Another RSS bandwidth reducing technique
I was reading this post by Tristan Louis the other day [via Regular Sucking Schedule]. One of the techniques Tristan describes for reducing RSS bandwidth is to make the autodiscovery version of your RSS feed be a summary feed, but then provide a full feed for power users which they have to find manually.
This led me to think: why not do something similar with poorly written RSS aggregators? If an RSS aggregator does not send an If-None-Match or an If-Modified-Since header, why not penalize it with an excerpt version of the feed? Well-behaved aggregators get the full feed.
Well, it turns out this is really easy to implement in WordPress. So easy, in fact, that I whipped it up real quick. It works, but it’s WordPress specific, a kludge, and doesn’t really fit into the whole WordPress ethos as far as variable names and configuration go, so I’m not going to put it up here. On the off chance you’re interested anyway, leave a comment or send me an e-mail and I’ll get back to you.
The only downside to my method is that all readers fetching my feed for the first time will get the truncated feed, but on subsequent fetches, the well behaved aggregators will get the full feed, so it will look like all of the old articles changed. But I’m willing to live with that.
I only have 10 or so people who subscribe to my RSS feed, so this is not a huge issue for me. But other blogs which get much more traffic than mine does might be interested in implementing something similar on their sites.
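For anyone who wants the gist without the WordPress kludge, here is a minimal sketch of the idea in Python. It is not the code running on this site, only an illustration: the function names, the 30-word cutoff, and the "[...]" marker are all made up for the example. It only checks whether either conditional header is present; it does not try to answer with a 304.

```python
def wants_full_feed(request_headers):
    """Return True if the client sent either conditional GET header."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    names = {name.lower() for name in request_headers}
    return "if-none-match" in names or "if-modified-since" in names


def make_excerpt(text, max_words=30):
    """Cut an entry body down to its first max_words words (hypothetical cutoff)."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " [...]"


def entry_body(request_headers, full_text):
    """Pick the full text for well-behaved clients, the excerpt for the rest."""
    if wants_full_feed(request_headers):
        return full_text
    return make_excerpt(full_text)
```

The feed template would call something like entry_body() once per entry, passing in whatever headers the web server saw on the request.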
December 20th, 2004 at 11:48 pm
Give Poor Aggregators Less
John Wilson has a great idea about dealing with bad aggregators: give them less. He has reconfigured his copy of Word Press, blogging software, to provide tiny summaries for aggregators that fail to offer up the right HTTP headers that demonstrate the…
December 21st, 2004 at 11:31 am
This is a great idea and something similar to what I was considering, given the recent articles that sparked my interest in bandwidth optimization of RSS. One thing I would do in such a case is hard-code a reference to a post/article about “well behaved RSS Aggregators” and why a headline-only feed is being served, etc.
I do have a couple of questions/concerns though. What actually governs whether the proper headers will be sent back? I mean, some aggregators store everything in what amounts to a database, but others I assume will “cache” the info more like a web-cache. So what governs how long this info persists so that when the user revisits (say after vacation) the If-Modified-Since and If-None-Match are again passed to retrieve the full RSS posts? Anyone know of a feature matrix of aggregators or RSS readers that support these features? Or a badly behaved RSS reader I can test with?
December 21st, 2004 at 12:19 pm
So you’re looking for programs to test with, if I understand you correctly. Hmm. I curled up in front of the fire with a copy of the HTTP/1.1 spec, crafted some test cases, and blasted them at the server by hand. Later I put in some logging code on my server to log what headers I’m getting from RSS clients themselves.
Alas, I don’t have a very big subscriber base to get stats from. The only example of a “bad” feed reader I’ve found so far is a surprising one: FeedOnFeeds 0.1.7 is well behaved, but FeedOnFeeds 0.1.8 is not! For some reason FOF 0.1.8 isn’t sending me any nice headers.
Here’s a workaround using Sharpreader. The first time Sharpreader hits a feed, it doesn’t pass any nice headers, it just grabs it. Paste the URL of your feed in Sharpreader and hit enter, and that’s the example of a “bad” feedreader. Then subscribe to the feed using SR. Add something to the feed and manually refresh it. Bam! There’s your example of a well behaved feed reader. Delete the subscribed feed from SR and start again. (Note that you need a feed you don’t care about to test this with, because you have to keep editing and adding things to it to get Sharpreader to grab it on the second try.)
Nothing really works as well as writing your own program to manually send headers though, because then you can really test every case.
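To give a rough idea of what such a program might look like, here is a small Python sketch using only the standard library. It is not the tool I used; the function names and the User-Agent string are invented for the example, and you would supply your own feed URL, ETag, and date.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers for one test case.

    Pass neither argument to imitate a "bad" reader, or one or both
    to imitate a well-behaved one.
    """
    headers = {"User-Agent": "header-tester/0.1 (hypothetical)"}
    if etag is not None:
        headers["If-None-Match"] = etag
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_status(url, headers):
    """Send a GET with the given headers; return (status code, body length)."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified (and other non-2xx codes) as HTTPError.
        return err.code, 0
```

Calling fetch_status() once with conditional_headers() and once with an ETag or date filled in, then comparing the body lengths, should show whether the server is handing out the excerpt or the full feed.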
December 21st, 2004 at 1:40 pm
All versions of FoF are well behaved, assuming they’ve been installed and configured correctly, in part because they build on Magpie, which ships with conditional GET and gzip encoding support.
December 21st, 2004 at 2:31 pm
Okay, let me rephrase that. These are all the headers I get from one of my RSS users:
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-encoding: gzip
Host: crazybutable.com
User-Agent: FeedOnFeeds/0.1.8 (+http://minutillo.com/steve/feedonfeeds/)
This is probably a problem with their setup, as all of my FOF 0.1.7 subscribers are sending me nice headers.
December 21st, 2004 at 3:05 pm
OK, so I can do some manual testing of the headers myself. But does anyone have any insight into what determines how long clients cache this sort of info?
December 22nd, 2004 at 7:48 pm
Ok, but this means that first-time visitors will not receive full text feeds. No matter how well behaved, an aggregator cannot send those headers on the first visit. Do you really want to penalize new readers like that? I like the idea, but maybe start doing it after a pattern of abuse.
December 22nd, 2004 at 8:20 pm
Yeah, I’ve thought about that, and I’m okay with it. The thing is, the very first time they get a new update, all of the old summary articles will change to full feed articles. Yes it’s a bit of a pain in the ass, but I think people can deal with it.
If an RSS aggregator wanted to be a smart ass, on the very first fetch it could pass in an If-Modified-Since header with a date like, oh, March 1, 1989 (the month and year that Tim Berners-Lee put forth the first paper outlining what would become today’s web). That would prove that the RSS reader was well behaved, as well as having a sense of humor.
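Purely as an illustration of that joke, this is roughly how a client written in Python could build such a header. The function name is made up, and midnight GMT is an arbitrary choice of time for the date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def first_fetch_headers():
    """Headers for a very first fetch, dated well before any feed existed."""
    # March 1, 1989 at midnight GMT; the time of day is an arbitrary choice.
    long_ago = datetime(1989, 3, 1, tzinfo=timezone.utc)
    # usegmt=True produces the "GMT" suffix that HTTP dates use.
    return {"If-Modified-Since": format_datetime(long_ago, usegmt=True)}
```

Any feed modified since 1989 would still come back in full, so the client loses nothing by sending it.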
December 29th, 2004 at 8:44 am
Yeah, I agree that serving limited content on the first-time hit is not so bad, and I would propose to programmatically include an entry explaining the issue and encouraging the user to use well-behaved RSS readers. My concern with doing this relates to the potential for the last-checked date and ETags timing out or being cleared from cache by the RSS client. Is this a concern at all? Does anyone have any insight into how RSS readers work in that regard?
Thanks for bearing with me - It’s great to come up with countermeasures for abuse, but I am always very cautious not to hinder the legitimate users in the process.