Firehose of Data
April 28, 2026 · Rich Costello

I’ve always been intrigued by the phrase “drinking from a firehose”: that feeling of being overwhelmed, where there is too much coming too fast and you don’t know where to start. The same could be said of the Internet Archive API. It exposes a massive amount of data, spanning a vast array of information across a timeline of several hundred years.
I’ve always been a fan of the Internet Archive. It’s a multimedia library of culture and entertainment, from a vast collection of Grateful Dead and jam band shows to obscure movies, books and cultural nostalgia (I like the old TV Guides from the 70s). The Internet Archive claims to have over 1 trillion archived pages, hence the term “The Firehose of Data.”
When I started building the IchingPortal, working with the Internet Archive API was top of mind. However, unlike Reddit or Giphy, this integration was tricky. The API can be slow, especially with video, and keywords are often buried deep within complex data structures. My first attempt, the "Magazine Rack," failed to sync effectively with hexagram meanings, and other categories like classic TV and audio were too vague. In previous integrations, a meme or GIF was enough to capture the context of a given hexagram. This data was harder to work with, but I was determined to sculpt something out of it.
I kept digging, trying different categories and endpoints until I hit the cable news feeds. While Fox News and CNN provided the right "data-tainment" potential, a new hurdle emerged: the video wasn't in sync. The Archive serves news in 60-minute chunks, while I needed specific one-minute clips to match the transcripts.
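To make the mismatch concrete, here is a minimal sketch (not the IchingPortal's actual code) of how a keyword hit inside an hour-long broadcast could be mapped to a one-minute window. The names `ClipWindow` and `clip_window` are hypothetical, and the sketch assumes the offset of the hit, in seconds from the start of the broadcast, is already known.

```python
from dataclasses import dataclass

CHUNK_SECONDS = 60 * 60  # the Archive's 60-minute broadcast chunk
CLIP_SECONDS = 60        # the one-minute clip the transcript match needs


@dataclass(frozen=True)
class ClipWindow:
    """Hypothetical container for a clip's start and end offsets, in seconds."""
    start: int
    end: int


def clip_window(match_offset: float,
                chunk_seconds: int = CHUNK_SECONDS,
                clip_seconds: int = CLIP_SECONDS) -> ClipWindow:
    """Return the minute-aligned window that contains match_offset."""
    if not 0 <= match_offset < chunk_seconds:
        raise ValueError("match_offset must fall inside the broadcast chunk")
    # Snap down to the start of the minute that contains the match.
    start = int(match_offset // clip_seconds) * clip_seconds
    # Never run past the end of the chunk.
    end = min(start + clip_seconds, chunk_seconds)
    return ClipWindow(start, end)


# A keyword spoken 12 minutes 34 seconds into the hour lands in minute 12.
print(clip_window(754))  # ClipWindow(start=720, end=780)
```

Snapping to minute boundaries is only one possible choice; centering the window on the match would work just as well if the transcript timing is precise enough.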
With help from Claude AI, I discovered GDELT (Global Database of Events, Language, and Tone). This global tracker monitors broadcast transcripts in real time. By pairing GDELT’s keyword indexing with the Archive’s video library, I could finally sync specific clips to the IchingPortal’s hexagrams and line meanings.
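The pairing can be sketched roughly as two URL builders: one that asks GDELT for transcript matches and one that points the Archive's player at a short window of a broadcast. This is an illustration under assumptions, not the portal's real integration. The GDELT endpoint and its `query`, `mode`, `format` and `maxrecords` parameters, along with the Archive's `embed` URL with `start` and `end` offsets, are written as I understand them and should be checked against the current documentation. The function names and `EXAMPLE_SHOW_ID` are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint of the GDELT TV API; verify against GDELT's documentation.
GDELT_TV_API = "https://api.gdeltproject.org/api/v2/tv/tv"


def gdelt_clip_query(keyword: str, station: str, max_records: int = 5) -> str:
    """Build a URL asking GDELT for transcript clips that mention a keyword."""
    params = {
        "query": f'"{keyword}" station:{station}',  # assumed query syntax
        "mode": "clipgallery",                      # assumed clip-listing mode
        "format": "json",
        "maxrecords": max_records,
    }
    return f"{GDELT_TV_API}?{urlencode(params)}"


def archive_embed_url(identifier: str, start: int, end: int) -> str:
    """Build an Archive embed URL for one window of a broadcast (assumed URL form)."""
    return f"https://archive.org/embed/{identifier}?{urlencode({'start': start, 'end': end})}"


# Hypothetical keyword for a hexagram, plus a placeholder broadcast identifier.
print(gdelt_clip_query("perseverance", "CNN"))
print(archive_embed_url("EXAMPLE_SHOW_ID", 720, 780))
# https://archive.org/embed/EXAMPLE_SHOW_ID?start=720&end=780
```

In a flow like this, the identifier and offsets would come from the GDELT response instead of being hard-coded, and the slow responses mentioned in this post are a good reason to fetch with a generous timeout.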
Here are a few examples:
The results are often uncanny: the media narratives match the hexagram meanings with surprising accuracy. GDELT is currently migrating servers (its cable news video stopped updating in October 2024) and the API remains slow, but the integration works. I’ve added a progress bar to help with the load times, and even in this prototype phase, the "data-tainment" value is clear.
All in all, I think this integration has a lot of potential. Even as a prototype, it already does a great job of syncing ancient wisdom with modern data. It’s interesting to look at hexagram meanings through the filter of past news, but real-time feeds would be nice, and I can envision building a component that compares and contrasts the old clips with the new. I hope GDELT can restart its archiving in conjunction with the Internet Archive and begin capturing current news clips again. They have a good system set up. More to come!

