
RSS feed reader module

RSS feeds are simple XML files that in most cases contain article titles, links to the news articles or pages, and a short lead-in (the "description") for each article. Here is a simplified example of a typical RSS feed structure:
 
rss_example.xml
 
If you want to look at a full feed, you can download one from http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/scotland/rss.xml.
A standard browser may render the feed as a “news page”, so view the page source to see the raw XML. On Linux/Unix you can download the full feed using the wget utility:
 
wget http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/scotland/rss.xml
 
We want to present the user with a list of news items, using each news title as the visible link text and the link to the news item as the URL. In HTML it might look something like this:
 
<a href="http://news.bbc.co.uk/go/rss/-/1/hi/scotland/7615591.stm" >
   Scottish red kite numbers soaring</a>
<br />
<a href="http://news.bbc.co.uk/go/rss/-/1/hi/scotland/7615429.stm" >
   Labour launches new policy drive</a>
<br />
...
 
As you can see from the XML file, each news item is wrapped in an <item> element. Because we are mostly interested in the links, all we need to do is read the title and link child elements of each item and have our module output them in the proper way.

The RSS feed reader module

The code for the module is pretty simple. While there are quite a few RSS reader classes available we will write one here. Why? Because it is a simple task. We will use the built-in PHP XMLReader to extract the data we need.
 
RSSreader_module.php
 
Save this file to modules/RSSReader/RSSReader_module.php
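
As a rough, independent illustration of the XMLReader approach (a sketch, not the module file itself), the following collects the title and link of each <item>. The function name readRssItems and the returned array layout are assumptions made for this example, and the input is assumed to be a well-formed RSS 2.0 document.

```php
<?php
// Hypothetical helper for illustration only: returns one
// ['title' => ..., 'link' => ...] array per <item> in an RSS 2.0 document.
function readRssItems(string $xml): array
{
    $reader = new XMLReader();
    // Load the document from a string; returns false if it cannot be set up.
    if (!$reader->XML($xml)) {
        return [];
    }

    $items = [];
    $current = null; // non-null only while the cursor is inside an <item>

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            if ($reader->name === 'item') {
                $current = ['title' => '', 'link' => ''];
            } elseif ($current !== null
                && ($reader->name === 'title' || $reader->name === 'link')) {
                // readString() returns the text content of the current element.
                $current[$reader->name] = trim($reader->readString());
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT
            && $reader->name === 'item') {
            $items[] = $current;
            $current = null;
        }
    }

    $reader->close();
    return $items;
}
```

Channel-level title and link elements are skipped because they appear outside any <item>. The resulting array could be assigned to the template as the list of items to loop over.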

The template for the module

At its simplest, this template might look something like this:
 
rss.tpl
 
Save this to templates/rss.tpl 
 

Template inheritance

As you can see, we are extending another template here. The base template is stored in the main template directory and it looks like this:
 
small_module_base.tpl
 
You can easily see that the base template contains named blocks that we can override in templates that inherit from it. The rest of the parent template is used as is, which fits our 'small sidebar module' approach perfectly.
 
Note that we are looping through the items array using the foreach template control structure. Each item is accessed using the local variable rssitem, much the same way one would do it in PHP.
 
After saving the module, remove the Smarty comments surrounding the rssreader declaration in the sidebar template.
 
This is basically all you need to do. To make things prettier you can change the template XHTML and tune your CSS. You can, of course, add parameters that change the behaviour of the module or add data to the template. Some of the things that we might add are:
  • Date and/or time of the news item (parse as needed)
  • Tooltip for the link, with the text taken from the "description" field. There is an example of how to achieve this in the site news template and the CSS definition.

Getting a local copy of the feed

There is one thing, however, that I want to warn you about. As written, the server reads the RSS feed every time the page is accessed, so any problem reaching the feed shows up directly in our page output. In a production environment it is advisable to write a small scheduled job that stores a fresh copy of the XML locally every few minutes. A local file is also much faster to access. On Linux and *BSD based systems you can write a small cron job that looks something like this:

# m   h dom mon dow command 
 */5  *  *   *   *  /usr/bin/wget http_rss_url -O /tmp/bbcscotland.rss

This cron task fetches the RSS feed every 5 minutes and saves the result to the file specified. The module can then be told to use the local file. Please note that this is just a hint: make sure you understand what it does instead of just copying it. If you are using Windows, you can use the Windows scheduling tools for the same purpose. There is a wget utility for the Windows environment as well.

One more warning: our simple module has virtually no validity checking. The cron job should therefore be a bit more elaborate and check the validity of the feed before replacing the local copy. It is advisable to write a small script that does this job properly. If we move the validation code to the cron job, the module stays simpler and uses fewer resources.
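
A minimal sketch of such a script might look like the following. The function names isUsableFeed and refreshLocalFeed are hypothetical, the check uses SimpleXML rather than XMLReader for brevity, and "valid" is assumed to mean "well-formed RSS 2.0 with at least one item"; a real job may need stricter rules.

```php
<?php
// Hypothetical validity check: the XML must be well-formed and contain
// at least one /rss/channel/item element (RSS 2.0 layout assumed).
function isUsableFeed(string $xml): bool
{
    if ($xml === '') {
        return false;
    }
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false) {
        return false;
    }
    $found = $doc->xpath('/rss/channel/item');
    return is_array($found) && count($found) > 0;
}

// Fetches $source (a URL or a file path) and replaces $target only if the
// content passes the check, so a broken download never overwrites a good copy.
function refreshLocalFeed(string $source, string $target): bool
{
    $xml = @file_get_contents($source);
    if ($xml === false || !isUsableFeed($xml)) {
        return false; // keep the previous local copy untouched
    }
    // Write to a temporary file first, then rename it into place.
    $tmp = $target . '.tmp';
    if (file_put_contents($tmp, $xml) === false) {
        return false;
    }
    return rename($tmp, $target);
}
```

A small wrapper script that calls refreshLocalFeed() with the feed URL and the local path (for example /tmp/bbcscotland.rss) could then be run from cron with the PHP command-line binary in place of the wget command. Fetching a URL this way assumes that allow_url_fopen is enabled.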

Ideas for further improvement

Another - maybe even better - alternative might be to create a heap/memory type database table and have the cron job parse the feed and save it to the table. After a MySQL daemon restart the data would be gone, of course. This is why your module should check the number of rows in the table and, if it is empty, re-fetch the feed and fill the table. Not too complicated!
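
The row-count check could be sketched roughly as follows. The table name rss_items, its columns, the function name loadItems and the $fetchItems callback are all assumptions for this example; the PDO connection is assumed to throw exceptions on errors (the default since PHP 8).

```php
<?php
// Assumed MySQL schema (hypothetical names). MEMORY tables cannot hold
// TEXT columns, hence VARCHAR:
//   CREATE TABLE rss_items (title VARCHAR(255), link VARCHAR(255)) ENGINE=MEMORY;

// Returns the stored items, refilling the table first if it is empty
// (for example after a MySQL restart). $fetchItems is a hypothetical
// callback that returns an array of ['title' => ..., 'link' => ...] rows.
function loadItems(PDO $db, callable $fetchItems): array
{
    $count = (int) $db->query('SELECT COUNT(*) FROM rss_items')->fetchColumn();

    if ($count === 0) {
        $insert = $db->prepare('INSERT INTO rss_items (title, link) VALUES (?, ?)');
        foreach ($fetchItems() as $item) {
            $insert->execute([$item['title'], $item['link']]);
        }
    }

    return $db->query('SELECT title, link FROM rss_items')
              ->fetchAll(PDO::FETCH_ASSOC);
}
```

Passing the fetch step in as a callback keeps the sketch independent of how the feed is actually downloaded and parsed.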

Another obvious idea is to use Smarty caching for the RSS template.

Now, quickly on to the site news functionality.