AI is changing faster than any field I've worked in. Keeping up has meant going down X, LinkedIn or Reddit rabbit holes rather than getting on with my real work. What I wanted was simple: a single email at 9:30 AM with the latest AI news, organized by category, deduplicated, and ready to read. RSS feeds seemed like the obvious solution—they've been around for decades, they're built for this exact use case. But as it turns out, in 2026, RSS feeds are more like architectural ruins: impressive from a distance, but full of unexpected gaps when you try to actually use them.
I started with what seemed straightforward: collect RSS feed URLs from major AI sources into an OPML file. TechCrunch has an AI section, VentureBeat covers AI extensively, MIT Technology Review publishes regularly. About 20 feeds total spanning news, research, and company blogs. This should have been trivial.
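The collection step really is the trivial part: OPML is just XML, and each feed lives in an outline element's xmlUrl attribute. Here's a minimal sketch of the loader, assuming a flat OPML file; the filename is a placeholder.

```python
import xml.etree.ElementTree as ET

import feedparser

# Sketch: pull every feed URL out of an OPML file and fetch it.
# "feeds.opml" is a placeholder name; .iter() visits outlines at any depth.
tree = ET.parse("feeds.opml")
feed_urls = [
    outline.get("xmlUrl")
    for outline in tree.iter("outline")
    if outline.get("xmlUrl")  # category outlines have no xmlUrl attribute
]

for url in feed_urls:
    parsed = feedparser.parse(url)
    print(f"{url}: {len(parsed.entries)} entries")
```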
The first issue surfaced immediately: feeds that should have worked simply didn't. VentureBeat's category feed returned a 308 redirect that Python's feedparser couldn't follow properly. I could fetch the redirect URL manually, but the automation broke. MIT Technology Review's topic-specific feed returned 404. These weren't obscure blogs—these were major tech publications with dedicated AI coverage.
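The 308 case at least has a workaround: fetch the feed with requests, which follows the redirect on its own, then hand the body to feedparser. A minimal sketch, with a placeholder URL standing in for whichever feed misbehaves:

```python
import feedparser
import requests

# Sketch of the redirect workaround: requests follows the 308 that trips up
# feedparser when it is given the URL directly; feedparser then parses the body.
# The URL below is a placeholder for whichever feed returns the redirect.
FEED_URL = "https://example.com/category/ai/feed/"

resp = requests.get(FEED_URL, timeout=10, headers={"User-Agent": "ai-digest/0.1"})
resp.raise_for_status()

parsed = feedparser.parse(resp.content)
for entry in parsed.entries[:5]:
    print(entry.title, entry.link)
```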
Then came the Anthropic problem. Anthropic is one of the most important AI companies right now—they're behind Claude, they publish significant safety research, they announce major partnerships. But they don't have an RSS feed. At all. Their news page exists at anthropic.com/news, but there's no feed URL, no RSS icon, nothing. I checked the page source, looked for alternate links, searched their documentation. No feed.
The final frustration was arXiv. The preprint server has RSS feeds for each category, but they publish weekdays only. They even set explicit <skipDays> tags for Saturday and Sunday in the feed XML. When I tested my script on a Sunday, arXiv returned zero entries. The feed wasn't broken—it was just empty, by design, on weekends.
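The fix here is less about code and more about expectations: an empty arXiv feed on a weekend is normal, not a failure. Something like the following check keeps a script from complaining; the feed URL pattern and the category are stand-ins for whichever ones you follow.

```python
from datetime import datetime, timezone

import feedparser

# Sketch: don't treat an empty arXiv feed as a failure on the days its
# skipDays block covers (Saturday and Sunday).
# The URL pattern and category are assumptions; swap in your own.
ARXIV_FEED = "https://rss.arxiv.org/rss/cs.AI"

parsed = feedparser.parse(ARXIV_FEED)
is_weekend = datetime.now(timezone.utc).weekday() >= 5  # 5 = Saturday, 6 = Sunday

if not parsed.entries and is_weekend:
    print("arXiv feed empty, but it's the weekend: expected, skipping")
elif not parsed.entries:
    print("arXiv feed empty on a weekday: worth investigating")
else:
    print(f"{len(parsed.entries)} new arXiv entries")
```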
I'd heard about RSSHub—an open-source project that generates RSS feeds for websites that don't offer them. I spun up a local instance in Docker and started testing routes for TechCrunch, Reddit, arXiv, Twitter. Out of dozens of routes, only one worked: Hacker News. Everything else returned 503 errors. The explanation from GitHub issues: many routes require expensive API keys now, some are deprecated due to website changes, others fail because sites actively block scrapers. RSSHub isn't broken—it's a victim of the modern web.
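If you want to reproduce the carnage, probing routes against a local instance (RSSHub listens on port 1200 by default) makes the failure rate obvious fast. A minimal sketch; the route paths below are placeholders for whatever you're testing.

```python
import requests

# Sketch: probe a handful of RSSHub routes against a local Docker instance
# and report which ones actually respond. Route paths are placeholders.
RSSHUB_BASE = "http://localhost:1200"
ROUTES = [
    "/hackernews/best",
    "/twitter/user/OpenAI",
    "/reddit/r/MachineLearning",
]

for route in ROUTES:
    try:
        resp = requests.get(RSSHUB_BASE + route, timeout=15)
    except requests.RequestException as exc:
        print(f"{route}: request failed ({exc})")
        continue
    usable = " (looks usable)" if resp.status_code == 200 else ""
    print(f"{route}: HTTP {resp.status_code}{usable}")
```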
The solution emerged from testing what actually worked. Native RSS feeds work reliably for some sources: TechCrunch's main feed, OpenAI's blog, Hugging Face, Google Research. These are organizations that still value open protocols—research institutions, tech companies building on open standards.

RSSHub worked perfectly for one specific use case: YouTube channels. The routes for DeepMind, OpenAI, LangChain, Two Minute Papers all worked flawlessly. No authentication required, consistent format, reliable updates. YouTube technically has RSS feeds built in, but RSSHub makes them easier to work with.

For Anthropic specifically, I wrote a simple BeautifulSoup scraper that parses their news page HTML and extracts article titles, links, and dates. It's fragile in the sense that it'll break if they redesign their site, but it works reliably for now. Sometimes you just need to scrape.
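The scraper itself is short. A minimal sketch of the approach, titles and links only; the CSS selector and User-Agent are illustrative guesses, since the real ones depend on whatever anthropic.com/news renders at the time you write it.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Sketch of the Anthropic news scraper. The selector below is an assumption
# about the page structure, and it is exactly the part that breaks whenever
# the site gets redesigned.
BASE = "https://www.anthropic.com"

resp = requests.get(f"{BASE}/news", timeout=10, headers={"User-Agent": "ai-digest/0.1"})
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

articles = []
seen_links = set()
for link in soup.select("a[href^='/news/']"):  # assumed: article links live under /news/
    title = link.get_text(strip=True)
    href = urljoin(BASE, link["href"])
    if title and href not in seen_links:
        seen_links.add(href)
        articles.append({"title": title, "link": href})

for article in articles[:10]:
    print(article["title"], article["link"])
```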
The final architecture uses all three approaches: native RSS where available, RSSHub for YouTube, custom scrapers for high-value sources without feeds. I'm using MongoDB to track processed articles with a 7-day deduplication window—if I've seen a link in the past week, skip it. A cron job runs the whole thing daily at 9:30 AM, assuming my laptop is awake.
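The dedup check is the only stateful part. Here's a minimal sketch of it with pymongo; the database and collection names are placeholders, and the TTL index is just one convenient way to enforce the seven-day window, since MongoDB then expires old entries on its own.

```python
from datetime import datetime, timezone

from pymongo import MongoClient

# Sketch of a 7-day dedup window. The TTL index drops documents older than
# seven days, so any link still in the collection was seen within the window.
# Database and collection names are placeholders.
client = MongoClient("mongodb://localhost:27017")
seen = client["ai_digest"]["seen_articles"]
seen.create_index("processed_at", expireAfterSeconds=7 * 24 * 3600)
seen.create_index("link", unique=True)

def is_new(link: str) -> bool:
    """Record the link; return False if it was already processed this week."""
    if seen.find_one({"link": link}):
        return False
    seen.insert_one({"link": link, "processed_at": datetime.now(timezone.utc)})
    return True

# Example: keep only articles that haven't already gone out in an email.
candidates = [{"link": "https://example.com/post"}]
fresh = [a for a in candidates if is_new(a["link"])]
```

The cron side is a single line, 30 9 * * * followed by the script, which is also why nothing arrives on mornings the laptop happens to be asleep.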
Every morning I get an email with four sections: AI News (TechCrunch, VentureBeat, Hacker News), AI Research (Google Research, Papers with Code), AI Company Blogs (OpenAI, DeepMind, Hugging Face, Anthropic), and YouTube (latest videos from channels that actually matter). All deduplicated, sorted by recency, ready to read before diving into the day. Setup took a Sunday afternoon of debugging. The code is messy—hard-coded paths, debug logging still in place—but it works, which, in the world of RSS feeds and scrapers in 2026, feels like a minor victory.