RSS Feed Aggregation
jamesjyu
created: 2006-03-03 12:52:28

I've written a rudimentary RSS feed reader in Perl, but I want to take advantage of ping services so that I don't need to poll every feed to see which ones have updated. I know that there are ping services like blo.gs and weblogs.com, but I'm unsure how to make use of them. Can anyone point me to a site explaining how this is done?

Also, if there already is a perl module that will do all this for me, by all means mention that.

Finally, I'm also trying to get the full text of a post, even when the RSS feed only carries a partial excerpt. Right now, I'm just running regexes against the actual HTML page, which is cumbersome: I need to know how the page is structured, and if the structure changes, I'm screwed. Anyone have a better idea for this?

Thanks,

James

Re: RSS Feed Aggregation
created: 2006-03-03 13:39:24
"Also, if there already is a perl module that will do all this for me, by all means mention that."
Did you look on http://search.cpan.org for "Ping"? Do any of those suit your needs instead of the third-party services you mentioned?
Re: RSS Feed Aggregation
created: 2006-03-03 13:59:40
The ping feeds are _not_ meant for people who only want to be notified of updates to a small subset of feeds. Your best bet is to use the Bloglines (WebService-Bloglines) or NewsGator APIs to pull updates to your subscriptions. RSS aggregation sucks a lot of bandwidth, so if you're not careful, you might find your IP blocked. Definitely try to use an existing aggregator (Plagger looks interesting). Pulling feeds efficiently, and respectfully, is more involved than simple HTTP fetching.

As for full content, look to see if a third party is already doing that for your feed. For instance, I subscribe to alterslash instead of slashdot.

Re^2: RSS Feed Aggregation
created: 2006-03-03 14:03:33

Writing your own is not that hard, though. A five-minute tutorial is available in HTTP Conditional Get for RSS Hackers, which is an absolute must-read for anyone writing feed aggregation code.
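
For the impatient, here is a minimal sketch of the idea, not a definitive implementation. It assumes HTTP::Tiny as the client (LWP or anything else that lets you set request headers would do as well), and the helper name fetch_if_changed and the layout of the %cache hash are made up for illustration. The point is simply to send back the ETag and Last-Modified values from the previous fetch, and to treat a 304 response as "nothing new".

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: fetch $url only if it changed since the last visit.
#   $http  - any object with an HTTP::Tiny-compatible get() method
#   $cache - hashref of url => { etag => ..., last_modified => ... }
#            (an assumed layout; persist it however you like)
# Returns the feed body, or nothing if the server answered 304 Not Modified.
sub fetch_if_changed {
    my ($http, $url, $cache) = @_;
    my $seen = $cache->{$url} || {};

    # Echo back the validators the server gave us last time, if any.
    my %headers;
    $headers{'If-None-Match'}     = $seen->{etag}
        if defined $seen->{etag};
    $headers{'If-Modified-Since'} = $seen->{last_modified}
        if defined $seen->{last_modified};

    my $res = $http->get($url, { headers => \%headers });

    # 304 means the feed is unchanged and no body was sent.
    return if $res->{status} == 304;

    die "Fetch of $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};

    # Remember the new validators for the next poll.
    # HTTP::Tiny lower-cases response header names.
    $cache->{$url} = {
        etag          => $res->{headers}{etag},
        last_modified => $res->{headers}{'last-modified'},
    };
    return $res->{content};
}

# Usage (the feed URL is a placeholder):
#   my %cache;
#   my $body = fetch_if_changed(HTTP::Tiny->new, 'http://example.com/index.rss', \%cache);
#   print defined $body ? "feed updated\n" : "not modified\n";
```

The cache has to survive between runs for this to save any bandwidth, so in a real reader it would be written to disk or a database rather than kept in a plain hash.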

Makeshifts last the longest.

Re^3: RSS Feed Aggregation
created: 2006-03-03 14:28:51
Thanks, I think I will give that a try, and will also look into Plagger. Although, if Plagger does the same thing as conditional gets, I don't see why I can't just hack it up in 5 minutes using conditional gets in my existing code.
Re^4: RSS Feed Aggregation
created: 2006-03-03 20:11:57

Thanks for your interest in Plagger. Plagger is a pluggable aggregation platform where you can plug components together, just like LEGO, to build your own aggregator.

As for smart GET (conditional GET), the latest version in our svn repository supports it through the default aggregator component (Aggregator::Simple). I'm planning to release the CPAN version this weekend.

--
Tatsuhiko Miyagawa
miyagawa@cpan.org

perlmonks.org content © perlmonks.org and Aristotle, davidrw, frenchtoast, jamesjyu, miyagawa
