Boost, RSS Feeds, and Google Reader
19 Sep 2011 in Drupal
For a while now, I've struggled with an issue on this site: Google Reader would sometimes show posts it had already displayed. They came back as new, unread items, regardless of whether the "original" copy had been read. I'm sure this irritated many readers, and I tried several times to fix it, but the usual checks turned up nothing:
- The feed was successfully validated by the W3C Validator. Multiple times.
- Subscribing to the feed from scratch worked fine.
- Adding the feed to other RSS readers showed only one entry per item.
I set up a cron job to pull a copy of my RSS feed regularly and save each one, figuring I could then see whether anything changed between versions. At first, the saved versions showed no significant changes, other than new posts where expected.
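An archive job along these lines could be as small as the sketch below. It is only an illustration: the feed URL, the directory name, and the file naming scheme are placeholders I made up for the example, not the ones actually used on this site.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from urllib.request import urlopen

# Hypothetical values for illustration only.
FEED_URL = "http://example.com/rss.xml"
ARCHIVE_DIR = Path("rss-archive")


def snapshot_name(when: datetime, body: bytes) -> str:
    """Build a file name from a timestamp plus a short content hash.

    The hash makes byte-identical fetches easy to spot in a directory listing.
    """
    digest = hashlib.sha256(body).hexdigest()[:12]
    return f"feed-{when:%Y%m%dT%H%M%SZ}-{digest}.xml"


def save_snapshot(body: bytes, archive_dir: Path, when: datetime) -> Path:
    """Write one fetched copy of the feed into the archive directory."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_dir / snapshot_name(when, body)
    path.write_bytes(body)
    return path


def fetch(url: str = FEED_URL) -> bytes:
    """Download the feed exactly as a reader would see it."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def main() -> None:
    """Entry point a cron job would call once per run."""
    save_snapshot(fetch(), ARCHIVE_DIR, datetime.now(timezone.utc))
```

Keeping the raw bytes, rather than a parsed summary, matters here: the eventual culprit was a small difference inside an element that a summary could easily have thrown away.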
At one point, I got a clue from a fellow ALE-NW organizer that the feed was showing duplicate items. Looking at the view that generated the feed, I realized a relationship to tags was producing an extra copy of an entry for each tag. I deleted the offending relationship, and the number of entries dropped back down. I figured I had the Google Reader issue fixed.
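Duplicates of this kind, where one post appears several times in a single copy of the feed, are easy to detect mechanically. The sketch below counts guid values in one feed document; the sample feed is invented for the example and only mimics the shape of the guids this site emits.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample: node 149 appears twice, as if it carried two tags.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><guid isPermaLink="false">149 at http://systemoverlord.com</guid></item>
  <item><guid isPermaLink="false">149 at http://systemoverlord.com</guid></item>
  <item><guid isPermaLink="false">150 at http://systemoverlord.com</guid></item>
</channel></rss>"""


def duplicate_guids(feed_xml: str) -> dict:
    """Return {guid: count} for every guid that occurs more than once."""
    root = ET.fromstring(feed_xml)
    counts = Counter((guid.text or "").strip() for guid in root.iter("guid"))
    return {guid: n for guid, n in counts.items() if n > 1}


print(duplicate_guids(SAMPLE_FEED))
# → {'149 at http://systemoverlord.com': 2}
```

A check like this only catches repeats within one fetch. It says nothing about an item whose guid changes from one fetch to the next.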
This weekend, it reared its head again -- duplicate entries! I looked back at my cron-based RSS archive and discovered that there were differences in some of the files! As I looked at the differences, I felt like an idiot.
The first file contained:
<guid isPermaLink="false">149 at https://systemoverlord.com</guid>
Another file contained:
<guid isPermaLink="false">149 at http://tuxteam.com</guid>
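Spotting this by eye took a while, but it can be automated. The sketch below assumes every guid has the "NID at BASE" shape seen in the two lines above, splits each one, and reports node IDs that show up under more than one base URL across a set of archived feeds. The `wrap` helper and the two sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_guid(guid_text: str):
    """Split 'NID at BASE' into (node id, base URL)."""
    node_id, _, base = guid_text.strip().partition(" at ")
    return node_id, base


def bases_by_node(snapshots):
    """Map node id -> set of base URLs, keeping only nodes with several."""
    seen = defaultdict(set)
    for feed_xml in snapshots:
        for guid in ET.fromstring(feed_xml).iter("guid"):
            node_id, base = split_guid(guid.text or "")
            seen[node_id].add(base)
    return {node: bases for node, bases in seen.items() if len(bases) > 1}


def wrap(guid_text: str) -> str:
    """Hypothetical helper: a one-item feed around the given guid text."""
    return (
        '<rss version="2.0"><channel><item>'
        f'<guid isPermaLink="false">{guid_text}</guid>'
        "</item></channel></rss>"
    )


first = wrap("149 at https://systemoverlord.com")
second = wrap("149 at http://tuxteam.com")
print(bases_by_node([first, second]))  # node 149 is reported with both bases
```

Any node reported here is one a guid-matching reader could end up showing twice.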
I realized, as I read the differences, that Drupal derives its "base URL" from the hostname used to access the site unless $base_url is set explicitly. (I used to use tuxteam.com.) This isn't normally a problem, because an RSS reader accesses the feed via the same domain every time. Once you're running Boost, though, the cached copy of a file can carry a different domain than the one being requested. If the cached RSS feed expires and Boost builds a new one on an access from tuxteam.com, the subsequent systemoverlord.com access by Google Reader returns a feed with tuxteam.com-based URLs. The guids differ, so Google Reader believes they're different articles!
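To see why a changed guid resurfaces an old post, here is a toy model of a reader that remembers items purely by guid string. This is an assumption about how such a reader behaves, written for illustration; it is not Google Reader's actual logic, and `GuidKeyedReader` and `feed_for` are invented names.

```python
import xml.etree.ElementTree as ET


class GuidKeyedReader:
    """Toy reader that treats the guid string as an item's whole identity."""

    def __init__(self):
        self.seen = set()

    def poll(self, feed_xml: str):
        """Return the guids that would be shown as new, then remember them."""
        guids = [(g.text or "").strip() for g in ET.fromstring(feed_xml).iter("guid")]
        fresh = [g for g in guids if g not in self.seen]
        self.seen.update(fresh)
        return fresh


def feed_for(base_url: str) -> str:
    """Hypothetical helper: node 149's feed as rendered under a base URL."""
    return (
        '<rss version="2.0"><channel><item>'
        f'<guid isPermaLink="false">149 at {base_url}</guid>'
        "</item></channel></rss>"
    )


reader = GuidKeyedReader()
print(reader.poll(feed_for("http://systemoverlord.com")))
# → ['149 at http://systemoverlord.com']
print(reader.poll(feed_for("http://systemoverlord.com")))
# → []
print(reader.poll(feed_for("http://tuxteam.com")))
# → ['149 at http://tuxteam.com']
```

The third poll is the bug in miniature: same post, same node ID, but a cached copy built under the other hostname makes it look brand new.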
I've now set $base_url = 'http://systemoverlord.com'; in my settings.php. I believe this should finally, permanently, put the duplicate item bug to rest.