Ruby on Rails Feed/RSS Aggregator (35 lines)
Posted by Simon on December 07, 2008 at 03:17 AM
Categories: code, rails
I wrote myself a feed aggregator for my front page. And... voila! I'm finally satisfied enough with it to post it.
I run this as a standalone Rails app, separate from my weblog. You could do that (and redirect requests for / or /index.html with Apache, nginx, etc.), or you could integrate it into your own app. Up to you.
Features:
- Will aggregate ANY feed, no matter how badly mangled by the creators, using FeedTools (I also tried feed_normalizer and simple rss but they're not as good)
- Deals with slowness of downloading feeds, RSS, etc., and REXML by caching
- Deals with need to recache using elegant http/cron periodic system
- Displays the feeds in a Facebook-like news feed format, sorted by date
- You can easily re-label the feeds, add and renew feeds (in the code)
- Only 35 lines of controller code!
The heart of it is the controller, obviously. The best thing? It's only one page of code! Ruby rocks!
require 'feed_tools'
class PortalController < ApplicationController
  layout 'site'
  # Instructions: 1. Change @@secret. 2. Add a cron job to regularly call /?recache=yes&secret=XXXXXXX
  # This is a feed aggregator that uses FeedTools because it handles practically any feed.
  # But FeedTools is super slow in every way so this aggregator stops using it as soon as possible.
  # TODO add XML feed output
  @@secret = "change_this" # change this to protect your site from DoS attack
  # The array of feeds you want to aggregate. If you change this then manually delete the whole cache.
  @@uris = ['http://simonwoodside.com:8080/posts/rss', 'http://simonwoodside.com/comments/rss',
            'http://semacode.com/posts/rss',
            'http://api.flickr.com/services/feeds/photos_public.gne?id=20938094@N00&lang=en-us&format=rss_200',
            'http://api.flickr.com/services/feeds/activity.gne?user_id=20938094@N00']
  # A map between the "official" feed titles in the XML, and the titles you want to show when rendered.
  @@title_map = { "Simon Says" => "Simon Says:", "Simon Says: Comments" => "Simon Says comment:",
                  "Uploads from sbwoodside" => "Flickr picture:", "Semacode" => "Semacode blog post:",
                  'Comments on your photostream and/or sets' => 'Flickr comment:' }

  def index
    if params[:recache] and @@secret == params[:secret]
      cache_feeds
      expire_fragment(:controller => 'portal', :action => 'index') # next load of index will re-fragment cache
      render :text => "Done recaching feeds"
    else
      @aggregate = read_cache unless read_fragment({})
    end
  end

  private

  # This will replace cached feeds in the DB that have the same URI. Be careful not to tie up the DB connection:
  # all the feeds are downloaded first, and only then written to the DB.
  def cache_feeds
    puts "Caching feeds... (can be slow)"
    feeds = @@uris.map do |uri|
      feed = FeedTools::Feed.open( uri )
      { :uri => uri, :title => feed.title,
        :items => feed.items.map { |item| {:title => item.title, :published => item.published, :link => item.link} } }
    end
    feeds.each { |feed|
      cached = CachedFeed.find_or_initialize_by_uri( feed[:uri] )
      cached.parsed_feed = feed
      cached.save!
    }
  end

  # Make an array of hashes, each hash is { :feed_title, :feed_item }
  def read_cache
    @@uris.map { |uri|
      cached = CachedFeed.find_by_uri( uri )
      next [] unless cached # skip feeds that haven't been cached yet
      feed = cached.parsed_feed
      feed[:items].map { |item| {:feed_title => @@title_map[feed[:title]] || feed[:title], :feed_item => item} }
    }.flatten.sort_by { |item| item[:feed_item][:published] }.reverse
  end
end
It's actually pretty simple, but it took me a while to get the balance just right. What you need to do is set up a cron job or other recurring task that makes an HTTP request to http://mywebsite.com/?recache=yes&secret=XXXXXXXX every once in a while. You can use wget, curl, or whatever. You might want to recache every minute, every five minutes, every hour, whatever. Since it's done as part of the controller, there's no need to run backgroundRB, RubyCron, or any of the other nonsense at HowToRunBackgroundJobsInRails. Yay!
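If you'd rather have cron run a small Ruby script than wget or curl, something like this sketch should do it. The helper names (recache_uri, trigger_recache) are made up for illustration, and the host and secret are placeholders you'd replace with your own.

```ruby
require 'uri'
require 'net/http'

# Build the recache URL. Both arguments are placeholders: pass your own
# site and whatever value you put in @@secret.
def recache_uri(base_url, secret)
  uri = URI.parse(base_url)
  uri.path = '/' if uri.path.empty?
  # encode_www_form escapes the secret, in case it contains & or spaces
  uri.query = URI.encode_www_form('recache' => 'yes', 'secret' => secret)
  uri
end

# Request the URL and check for the text the controller renders on success.
def trigger_recache(base_url, secret)
  response = Net::HTTP.get_response(recache_uri(base_url, secret))
  response.is_a?(Net::HTTPSuccess) && response.body.include?('Done recaching feeds')
end

# Example call (hits the network, so it is left commented out):
# trigger_recache('http://mywebsite.com', 'XXXXXXXX')
```

Checking the response body for "Done recaching feeds" is optional, but it lets the script tell a real recache apart from a normal page load caused by a wrong secret.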
Here's the view:
<div id="feed-stream">
  <% cache do %>
    <%
    lastday = -1
    @aggregate.each do |item| %>
      <div class="item">
        <%
        mydate = item[:feed_item][:published].getlocal
        if mydate.yday != lastday
          %><div class="item_details"><p style="text-align:right"><%= mydate.strftime('%A, %B %e') %></p></div><%
          lastday = mydate.yday
        end
        %>
        <div class="item_content">
          <%= item[:feed_title] %>
          <a href="<%= item[:feed_item][:link] %>"><%= item[:feed_item][:title] %></a>
        </div>
      </div>
    <% end %>
  <% end %>
</div>
My cache is all Hashes. I don't cache the FeedTools object because I discovered that even after FeedTools has parsed your feed, accessing the supposedly "final" data is incredibly slow (like maybe 10x or 100x slower than a hash).
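Here's a rough standalone sketch of that idea, so you can play with it outside Rails. FakeFeed and FakeItem are stand-ins I made up for the parsed FeedTools objects (FeedTools isn't loaded here), and feed_to_hash and aggregate are illustrative names, not part of any library. The point is to copy the few fields the view needs into plain Hashes once, then do all the merging and sorting on those.

```ruby
# Stand-ins for parsed feed objects; assumed to respond to the same
# title / items / published / link methods the controller uses.
FakeItem = Struct.new(:title, :published, :link)
FakeFeed = Struct.new(:title, :items)

# Copy only the fields the view needs into plain Hashes.
def feed_to_hash(uri, feed)
  { :uri => uri, :title => feed.title,
    :items => feed.items.map { |item|
      { :title => item.title, :published => item.published, :link => item.link }
    } }
end

# Merge the cached feed hashes into one list, newest first.
def aggregate(feed_hashes, title_map = {})
  feed_hashes.map { |feed|
    label = title_map[feed[:title]] || feed[:title]
    feed[:items].map { |item| { :feed_title => label, :feed_item => item } }
  }.flatten.sort_by { |entry| entry[:feed_item][:published] }.reverse
end

blog   = FakeFeed.new('Simon Says',
                      [FakeItem.new('Old post', Time.utc(2008, 12, 1), 'http://example.com/1')])
photos = FakeFeed.new('Uploads from sbwoodside',
                      [FakeItem.new('New photo', Time.utc(2008, 12, 6), 'http://example.com/2')])

cached = [feed_to_hash('http://example.com/blog', blog),
          feed_to_hash('http://example.com/photos', photos)]
stream = aggregate(cached, { 'Simon Says' => 'Simon Says:' })
stream.each { |e| puts "#{e[:feed_title]} #{e[:feed_item][:title]}" }
```

Feeds without an entry in the title map just fall back to their own title, same as in the controller.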
Here's the model:
require 'feed_tools'
class CachedFeed < ActiveRecord::Base
  validates_presence_of :uri, :parsed_feed
  validates_uniqueness_of :uri
  serialize :parsed_feed, Hash # note that if this exceeds a certain KB size, it will likely fail (thinking it's a String)
end
And the migration:
class CreateCachedFeeds < ActiveRecord::Migration
  def self.up
    create_table :cached_feeds do |t|
      t.column :uri, :string, :limit => 2048
      t.column :parsed_feed, :text, :limit => 128.kilobytes # use for serialized object
      t.timestamps
    end
  end

  def self.down
    drop_table :cached_feeds
  end
end
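Since the model comment warns about oversized serialized hashes and the migration caps the column at 128 KB, it can be worth checking how big a feed hash gets before saving it. This is only a sketch: it assumes the column stores roughly what YAML.dump produces, and COLUMN_LIMIT, serialized_size and fits_in_column? are names I invented for the example.

```ruby
require 'yaml'

COLUMN_LIMIT = 128 * 1024 # bytes, same figure as 128.kilobytes in the migration

# Size of the hash once dumped to YAML (an approximation of what
# ends up in the text column).
def serialized_size(parsed_feed)
  YAML.dump(parsed_feed).bytesize
end

# True when the dumped hash should fit in the column.
def fits_in_column?(parsed_feed, limit = COLUMN_LIMIT)
  serialized_size(parsed_feed) <= limit
end

small = { :uri => 'http://example.com/rss', :title => 'Example', :items => [] }
big   = { :uri => 'http://example.com/rss', :title => 'Example',
          :items => [{ :title => 'x' * (COLUMN_LIMIT + 1) }] }

puts fits_in_column?(small)
puts fits_in_column?(big)
```

If a feed regularly comes out too big, trimming the number of items you keep per feed is probably simpler than raising the limit.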
Well, that's all you need. When I started out to make this I thought I'd find a simple example out there, but there wasn't anything. It turns out that there are a number of interesting challenges: picking a parser to deal with difficult feeds, XML, and malformed XML; dealing with caching; and dealing with background processing. Took me a while to get it all just right.
It powers my own front page. Consider it to be under the standard Ruby open source license. As the vending machine says: Share And Enjoy!
Hi there.
Well, I'm back. I was running this site on really ancient technology (AxKit, so 2001). Now I'm running it on modern technology, i.e. Rails 2. And doesn't it rock. Now I have a cool GUI editor to type into, I have easy programming in Ruby, and I have of course polished both my design and my CSS/XHTML skillz considerably in the meantime, hopefully making this all easier to look at and navigate.
So I'm running on SimpleLog here, but it's not "stock". Oh no. Stock SimpleLog right now doesn't run on Rails 2, but this one does. Also, I made it even MORE simple than it used to be:
- Support Rails 2.0 (no need to freeze an old rails)
- no themes—annoying to use anyway, and no one was publishing themes either
- replaced the editor/preview panel with WYM on Rails, which is by FAR the best WYSIWYG / GUI editor I've ever found, and the end of a long search for me
...and so on.