Be Your Own Aggregator: Create HTML from RSS Feeds

Wednesday, August 23rd, 2006 at 22:52

Incorporating dynamic content from other sites into your own can be as easy as downloading MagpieRSS and customizing the script below. RSS works well as a data format on the web because it is both simple to create (which is why so many sites publish feeds) and simple to parse. With a script like this, you can turn your latest links from del.icio.us or ma.gnolia into a box on your sidebar, or put the top 10 headlines from each of CNN, BBC and Newsvine at the top of your page. If you want to be really slick, you can even use Tristan's post to create your own feed from a site that has no RSS, then use this script to turn that feed into HTML for your site. For example, you could create an RSS feed of the daily "most viewed" clips from ytmnd or YouTube and use this script to put them in your sidebar.

If I've got you convinced, let's move on to the code.

All the Hard Work Done for Us

The first step is to download MagpieRSS. It consists of a few PHP files; installation is as simple as copying them to your web server.

Parsing the Feed and Converting it to HTML

Here is the basic script:

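#!/usr/local/bin/php -q

<?php

// Fetch del.icio.us Links: /includes/fetch_delicious.php

// This PHP script fetches the most recent del.icio.us links and creates a static
// includes file out of them. It should be scheduled to run several times a day.

require_once "rss_fetch.inc"; // Include the MagpieRSS Library.

$url = 'http://site.com/rss/';
$output_file = dirname(__FILE__) . "/links.inc"; // Absolute path: cron does not run in the script's directory.
$limit = 10;

$rss = fetch_rss($url);
if (!$rss) {
	// fetch_rss() returns false on failure; keep the old links.inc instead of overwriting it.
	trigger_error("Could not fetch or parse feed: " . $url, E_USER_WARNING);
	exit(1);
}

$output =  "<div><h3>Feed from Site.com</h3>";
$output .= "<ul>";

$n = 0;
foreach($rss->items as $item) { // Parse each <item> section.

	if($n++ >= $limit) break; // Stop after $limit items.

	// Strip markup from the feed text, then escape it so characters like & or " can't break the HTML.
	$href = htmlspecialchars($item['link']);
	$title = htmlspecialchars(strip_tags($item['title']));
	$desc = isset($item['description']) ? htmlspecialchars(strip_tags($item['description'])) : '';
	$output .= "<li><a href=\"" . $href . "\">" . $title . "</a> " . $desc . "</li>";
}

// "at" is escaped because a and t are format characters in date().
$output .= "<li>Last updated on " . date('j M Y \a\t H:i', time()) . "</li>";
$output .= "</ul>";
$output .= "</div>";

// Write the output to a file.
$filehandle = fopen($output_file, 'w');
if(!$filehandle) {
	trigger_error("Could not open file for writing: " . $output_file, E_USER_WARNING);
	exit(1);
}

fwrite($filehandle, $output);
fclose($filehandle);

?>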

A brief explanation. Line 1 lets the script run as an independent executable, which matters later when we use cron to run it periodically throughout the day. Line 10 brings in the MagpieRSS code, and line 16 uses that code to fetch and parse the feed. fetch_rss() returns an object whose items property is an array with one entry per <item> in the feed, so we just loop through $rss->items with a foreach loop and build a list entry for each one. The last few lines write the HTML to a file. Then you just have to include that file in your pages; a PHP include such as <?php include "/path/to/links.inc"; ?> would do the trick.
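If links.inc is missing, for example before cron has run the script for the first time, a bare include prints a PHP warning in the middle of your page. Here is a minimal sketch of a more forgiving alternative. feed_box() is a hypothetical helper, not part of MagpieRSS, and the fallback text is only a placeholder.

```php
<?php
// Hypothetical helper: return the generated feed HTML, or a fallback
// snippet if the cron job has not produced a readable file yet.
function feed_box($path, $fallback = '<div><p>Links are temporarily unavailable.</p></div>')
{
    if (is_readable($path)) {
        $html = file_get_contents($path);
        if ($html !== false) {
            return $html;
        }
    }
    return $fallback;
}

// In a page template, something like:
// echo feed_box('/path/to/links.inc');
```

Because it reads the file with file_get_contents() and echoes it rather than using include, this treats links.inc strictly as data, so nothing in it is ever executed as PHP.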

This can be customized in many ways depending on the feed. I have it set up to parse my del.icio.us feed and convert it into a list of links, but you could just as easily output paragraphs of text or links to images.
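As one example of that kind of customization, here is a sketch of a renderer that outputs paragraphs instead of list entries. It assumes the same item structure the script above relies on: associative arrays with 'link', 'title' and 'description' keys, as found in $rss->items. render_items_as_paragraphs() is a hypothetical name.

```php
<?php
// Hypothetical alternative renderer: turns Magpie-style item arrays
// ('link', 'title', 'description' keys) into paragraphs.
function render_items_as_paragraphs(array $items, $limit = 10)
{
    $html = '';
    // array_slice() takes at most $limit items, so no manual counter is needed.
    foreach (array_slice($items, 0, $limit) as $item) {
        $link  = isset($item['link']) ? $item['link'] : '';
        $title = isset($item['title']) ? strip_tags($item['title']) : '';
        $desc  = isset($item['description']) ? strip_tags($item['description']) : '';

        // Escape everything that came from the feed before putting it in HTML.
        $html .= '<p><a href="' . htmlspecialchars($link) . '">'
               . htmlspecialchars($title) . '</a><br>'
               . htmlspecialchars($desc) . "</p>\n";
    }
    return $html;
}

// In the fetch script, this would replace the foreach loop and the <ul> markup:
// $output .= render_items_as_paragraphs($rss->items, $limit);
```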

Scheduling It

We could have this script run dynamically, creating new HTML every time someone visits your site, but that is a bad idea for several reasons:

  1. If your site gets even a modest amount of traffic, you could easily end up blocked, or at least throttled. You would be requesting the RSS feed once for every pageview, annoying the very people you depend on for the feed.
  2. It would slow down every page load by at least the round-trip time between your server and the feed's server.
  3. Fetching and parsing the entire feed (or feeds) on every pageview could overload your server, or at least earn you an upset email from your web host.
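If your host doesn't give you cron access, one common compromise is to let page views trigger a refresh, but only when the generated file is older than some maximum age. This is a different technique from the cron approach used in this article. It keeps requests to the feed's server to roughly one per interval instead of one per pageview. Here is a sketch of the staleness check; feed_cache_is_stale() is a hypothetical name.

```php
<?php
// Hypothetical helper for hosts without cron: report whether the
// generated HTML file is missing or older than $max_age seconds.
function feed_cache_is_stale($path, $max_age)
{
    clearstatcache(); // filemtime() results are cached within a request
    if (!file_exists($path)) {
        return true;
    }
    return (time() - filemtime($path)) > $max_age;
}

// A page might then do something like:
// if (feed_cache_is_stale('/path/to/links.inc', 3 * 60 * 60)) {
//     // regenerate links.inc here, then include it as usual
// }
```

The visitor who triggers a refresh still pays the round trip, and two simultaneous visitors can both trigger one. When cron is available, it remains the better option.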

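All of these problems can be solved by scheduling the script to run periodically. If you want it to stay really up to date, you could even have it run every 5 or 10 minutes. cron is the perfect tool for this; it is available on virtually every Unix server, and a lot of good web hosts will give you access to it. A good tutorial for cron can be found here. Since cron will call the script directly, it relies on the script's #! line: the file must be executable (chmod +x fetch_delicious.php), and PHP must actually live at /usr/local/bin/php on your server. Below is the crontab I use to run my scripts: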

0  11  *  *  *  /path/to/fetch_delicious.php  >/dev/null  2>&1
0  17  *  *  *  /path/to/fetch_delicious.php  >/dev/null  2>&1
0  23  *  *  *  /path/to/fetch_delicious.php  >/dev/null  2>&1

I find that running it three times a day is more than enough for my needs, but you could schedule it to run as often or as seldom as you like.
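If you run the script every few minutes, note that fopen($output_file, 'w') truncates links.inc before the new HTML is written. A visitor who loads a page at that moment can get an empty or partial box. Here is a minimal sketch of a safer write. It assumes a POSIX filesystem, where rename() replaces the target in a single step. write_file_atomically() is a hypothetical helper.

```php
<?php
// Hypothetical helper: write to a temporary file in the same directory,
// then rename() it over the old file, so readers see either the old or
// the new contents, never a half-written file.
function write_file_atomically($path, $contents)
{
    // Same directory keeps the temp file on the same filesystem, which rename() needs to be atomic.
    $tmp = tempnam(dirname($path), 'feed');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $contents) === false) {
        unlink($tmp);
        return false;
    }
    chmod($tmp, 0644); // tempnam() creates the file as 0600; let the web server read it
    if (!rename($tmp, $path)) {
        unlink($tmp);
        return false;
    }
    return true;
}

// In the fetch script, this would replace the fopen()/fwrite()/fclose() lines:
// if (!write_file_atomically($output_file, $output)) {
//     trigger_error("Could not write " . $output_file, E_USER_WARNING);
// }
```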

That's it; let me know how this works for you.

Comments

Rik Catlow: Friday, August 25th, 2006 at 9:33 #1

I've been using Magpie for my two aggro sites. I wrote a display script for the front end that takes arguments so you can display RSS feeds as ul, ol, or dl list elements and set the number of headlines.

Check 'em

http://www.dontmeetyourheroes.com
http://www.macscour.com

Malcolm: Monday, August 28th, 2006 at 10:50 #2

You can also use rss2html; it is a free script and quite easy to use:
http://www.feedforall.com/free-php-script.htm

Rodolpho: Tuesday, August 29th, 2006 at 7:33 #3

Do you think it's possible to build something like Popurls.com using this script?
