A couple of years ago, I went looking for an RSS reader. For those not familiar with the concept, an RSS reader is a piece of software that maintains a list of blogs, pulls their RSS feeds, and displays a list of articles in them. And, if the blog maintainer puts the full text of his articles in the feed, it lets you read them. A decent RSS reader also remembers which articles you’ve read, and either marks them accordingly or only shows you the new ones. For subscribing to blogs that are only occasionally updated (like this one), an RSS reader is almost a necessity: it does the boring work of repeatedly checking for new articles when there seldom are any.

My criteria were:

  1. Good integration with the web browser. I don’t want to flip back and forth between two different programs, one to read the RSS feed, and another to read the things on the web that it points to. I also want hyperlinks in the RSS feed to have the usual color indicators of whether or not I’ve already read them, which probably won’t work if the RSS reader is a separate program, unless someone makes an extraordinary effort at browser integration. Thus the RSS readers I’d tried previously were Firefox extensions. But I didn’t particularly like any of them, because I wanted:

  2. Something better than the usual three-panel view (one panel for a list of blogs, another for a list of articles in a blog, and a third for the article itself). For one thing, that layout requires a lot of clicking: you typically have to click on each blog, then on each article. I can go through blogs faster than that just using the normal browser features: middle-click on each blog in a list to open it in a new tab, then hit Control-W to close each tab when I’m done with it. Also, the three-panel layout wastes a lot of screen real estate: I’m going to spend 99% of my time reading the articles, yet those appear in only one of the three panels. That’s annoying even on a desktop-sized screen, let alone tablets or phones.

  3. No web-based services. I prefer to be in control.

What I fairly quickly landed on was mtve’s program RSSaggressor. This is a Perl script that takes a list of blogs (a plaintext file with the URL of one RSS or Atom feed per line), checks each, and spits out a long HTML file containing everything new in every blog. The user then views that HTML file in a web browser. This way, instead of all the mouse clicking one does with a three-panel reader, reading the updates is just a matter of scrolling. (Well, except if the blog chooses not to put the full text in the RSS feed; then it’s back to the ‘middle-click to open each article in a new tab’ procedure.) There are also no browser integration issues, aside from running the program in the first place and then opening its output HTML file in the browser. The original author runs the program as a cron job; I run it whenever I feel like checking what people have to say.

I’ve also made several changes, the most notable of which are:

1. It displays the author of each article, which is useful for multiple-author blogs.

2. It allows you to specify a condition that each article must meet to be added to the output HTML file. To use this, after each URL in the list you add a condition, which is distinguished from the URLs by being indented. For instance, to subscribe to the Volokh Conspiracy weblog, but only to posts by Orin Kerr or Eugene Volokh, not the twenty or so other “co-conspirators”, you can write:

http://volokh.com/feed/
        $author eq "Eugene Volokh" or
        $author eq "Orin Kerr"

The condition is written in Perl syntax; the code uses Perl’s eval() function to evaluate it. The accessible variables are $author, $title, $link (a hyperlink to the web version of the post), and $text (its full text, if you’re lucky). And since the condition can be arbitrary Perl code, you can also do other things with it, including changing any or all of those variables. For instance, Twitter had an RSS API, until they decided that offering RSS wasn’t predatory enough, but in it they didn’t publish hyperlinks as hyperlinks, just as plain text. To enable such hyperlinks (or at least nine tenths of them), you can write, for instance (fill in the blank with the screen name of the person to follow):

http://api.twitter.com/1/statuses/user_timeline.rss?screen_name=___
        $text =~ s((https?://[a-zA-Z0-9_/.+]*[a-zA-Z0-9]))
		   (<a href="$1">$1</a>)g ; 1

(If those last two lines look like gobbledygook, welcome to the world of regular expressions.)
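To make the mechanism concrete, here is a minimal sketch of how a feed list with indented conditions could be parsed and how a condition could be run through a string eval. It is not RSSaggressor's actual code: the function names parse_feed_list and article_passes, the hash keys, and the sample post are all made up for illustration, and the real program may organize this differently.

```perl
use strict;
use warnings;

# Split a feed list into entries: an unindented line starts a new feed,
# and indented lines are appended to the condition of the feed above them.
# (Hypothetical helper, not taken from RSSaggressor.)
sub parse_feed_list {
    my ($list) = @_;
    my @feeds;
    for my $line (split /\n/, $list) {
        next unless $line =~ /\S/;                  # skip blank lines
        if ($line =~ /^\s+(.*)/) {                  # indented: condition text
            $feeds[-1]{condition} .= "$1\n" if @feeds;
        } elsif ($line =~ /^(\S+)/) {               # unindented: feed URL
            push @feeds, { url => $1, condition => '' };
        }
    }
    return @feeds;
}

# Decide whether one article (a hash ref with author, title, link and text
# keys) passes a condition. The string eval runs in this lexical scope, so
# the condition can read and modify the four variables declared below.
sub article_passes {
    my ($condition, $article) = @_;
    return 1 unless $condition =~ /\S/;             # no condition: keep everything
    my ($author, $title, $link, $text) = @{$article}{qw(author title link text)};
    my $keep = eval $condition;
    if ($@) {
        warn "Skipping article, bad condition: $@";
        return 0;
    }
    # Copy back any changes the condition made to the variables.
    @{$article}{qw(author title link text)} = ($author, $title, $link, $text);
    return $keep ? 1 : 0;
}

my @feeds = parse_feed_list(<<'END');
http://volokh.com/feed/
        $author eq "Eugene Volokh" or
        $author eq "Orin Kerr"
END

# A made-up post, used only to exercise the condition.
my %post = (
    author => 'Orin Kerr',
    title  => 'A post',
    link   => 'http://example.com/1',
    text   => 'Body text',
);
print article_passes($feeds[0]{condition}, \%post) ? "keep\n" : "skip\n";
```

With the Volokh condition, the sample post by Orin Kerr is kept, and a post by any other author would be skipped. Because the condition is arbitrary Perl run by eval, a feed list should only ever contain conditions you wrote yourself or trust.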

This version of RSSaggressor, like the original, can be found on GitHub. If all my modifications seem to date from the last few days, that’s because I was using an older version of the program previously, and ported my changes to this newer version.

One odd thing about RSSaggressor is that it remembers whether you’ve read articles by storing their MD5 checksums. Thus if the author goes back and edits the article, you get to see it again in its entirety. This has good aspects (you get to see updates to articles you’ve already read) and bad ones (you get to see the whole article again just because the author corrected a typo). I’d prefer to see some sort of diff between the old and new articles, but haven’t found a good diff library that operates on HTML and plays well with Perl. (Suggestions are welcome; patches or pull requests are even more welcome.)
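The checksum behavior can be illustrated with a short sketch using the core Digest::MD5 module. This shows the general technique only, not RSSaggressor's actual code: the function name is_new_article and the in-memory %seen hash are assumptions, and exactly what the real program hashes and where it stores the checksums may differ.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Return 1 the first time a given article body is seen, 0 afterwards.
# $seen is a hash ref of checksums; a real program would save it to disk
# between runs. (Hypothetical helper for illustration.)
sub is_new_article {
    my ($seen, $article_html) = @_;
    # md5_hex works on bytes, so encode the text as UTF-8 first.
    my $sum = md5_hex(encode_utf8($article_html));
    return 0 if $seen->{$sum};
    $seen->{$sum} = 1;
    return 1;
}

my %seen;
for my $version ('<p>Teh article</p>', '<p>Teh article</p>', '<p>The article</p>') {
    print is_new_article(\%seen, $version) ? "show\n" : "skip\n";
}
```

The second copy is skipped because its checksum matches the first, while the third counts as new again: correcting a single typo changes the checksum, which is why an edited article reappears in full.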

This program hasn’t been packaged up for the casual user who doesn’t know how to pull programs from GitHub or install Perl modules (or, on some systems, install Perl itself). Those are not complicated things to do, but instructions would vary depending on the system. (As regards Perl modules, this version of the program might not need any extra ones: the ones it uses are pretty basic, and might already be there.)