Feeds For All With hAtom – Part 2: The Code

Recently I introduced the idea of adding an Atom feed to any document you want by using hAtom along with a local ‘proxy’ script to generate feeds for pages on your site that otherwise wouldn’t have them. The post seemed well received, but it didn’t feel complete to me without some code to allow people to quickly try it for themselves. So here’s the inevitable follow-up with an example PHP5 script to show how you can make the hAtom-to-Atom conversion transparent to a site visitor and add feeds to static pages, or to any pages that don’t already have a more typical Atom feed.

This certainly isn’t a robust script: it doesn’t use Tidy to clean up invalid or non-XHTML documents, and it doesn’t handle multiple domains or different request methods. It works for my purposes and I left it at that. But the core idea is simple enough that it shouldn’t take long to rewrite it for your needs, or in your language of choice if it’s different than mine.

The PHP Proxy Script

<?php
ini_set('display_errors', '0');
header('Content-type: application/atom+xml');
//
// configuration
$h2axsl = '/home/user/app/hAtom2Atom.xsl'; // path to hatom2atom.xsl
$domain = 'example.com'; // domain with the hatom content
$permalink_stub = 'hatom2atom.php';// public path to this file
//
// parse request
$requested = $_SERVER['REQUEST_URI'];
$requested = substr_replace($requested,'',0,strlen($permalink_stub)+2);
$requested = urldecode($requested); // deal with encoded #
$docurl = 'http://'.$domain.'/'.$requested;
//
// grab file contents
if (function_exists('curl_init')){
    // prefer cURL when it is available
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $docurl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    $file_contents = curl_exec($ch);
    curl_close($ch);
} else {
    // otherwise fall back to the URL wrappers
    $file_contents = file_get_contents($docurl);
}
//
// pass the file through the transform and send it out
$xsl = new DomDocument();
$xsl->load($h2axsl);
$inputdom = new DomDocument();
if ($inputdom->loadXML($file_contents)) {
    $proc = new XsltProcessor();
    $proc->importStylesheet($xsl);
    $proc->setParameter('', 'source-uri', $docurl);
    $newdom = $proc->transformToDoc($inputdom);
    print $newdom->saveXML();
}
?>

The script can be broken down into the following parts:

  1. Configuration of paths for support files and requests.
  2. Extracting the location of the desired XHTML document that contains the hAtom content. This script is written so that requests made to hatom2atom.php/some/page would output the Atom feed for the document at http://example.com/some/page.
  3. Grabbing the contents of that XHTML document. In my case I’m making an HTTP request so that the server builds and parses the page first. If your files are static and have no includes, you may want to use local file access instead.
  4. Performing the XSL transform and outputting the results.
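
For step 3, reading a static file straight from disk might look something like the sketch below. The function name read_local_document and the idea of passing in a document root are my own assumptions for illustration and are not part of the original script. The realpath() check is there so that a request containing '..' can't read files outside the document root.

```php
<?php
// Hypothetical helper (not in the original script): read a static document
// from disk instead of fetching it over HTTP. Returns false on failure.
function read_local_document($docroot, $requested)
{
    // Drop any fragment id; it is not part of the file name.
    $hash = strpos($requested, '#');
    $path = ($hash === false) ? $requested : substr($requested, 0, $hash);

    $root = realpath($docroot);
    $full = realpath($docroot . '/' . ltrim($path, '/'));

    // realpath() resolves '..' segments, so a path that escapes the
    // document root no longer starts with it and is refused.
    if ($root === false || $full === false
        || strpos($full, $root . DIRECTORY_SEPARATOR) !== 0
        || !is_file($full)) {
        return false;
    }
    return file_get_contents($full);
}

// In the proxy this would replace the cURL / file_get_contents() step.
// The document root below is a made-up example path:
// $file_contents = read_local_document('/home/user/public_html', $requested);
```

Note that $docurl, with its fragment, should still be passed to the transform as source-uri; only the way the bytes are fetched changes.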

You can copy and paste the above script, or download it as a .zip.
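
The script hands whatever it fetched directly to loadXML(), which fails on anything that isn’t well-formed XML. If your pages aren’t clean XHTML, a Tidy pass before parsing is one way to cope. This is only a sketch: the to_xhtml name is hypothetical, it assumes PHP’s tidy extension is installed and that your documents are UTF-8, and it simply passes the input through when Tidy isn’t available.

```php
<?php
// Hypothetical helper (not in the original script): try to turn arbitrary
// HTML into well-formed XHTML before handing it to loadXML().
function to_xhtml($html)
{
    if (!function_exists('tidy_repair_string')) {
        return $html; // tidy extension not installed: pass through untouched
    }
    $config = array(
        'output-xhtml'     => true, // emit XHTML rather than HTML
        'numeric-entities' => true, // &nbsp; becomes &#160;, which XML parsers accept
    );
    $clean = tidy_repair_string($html, $config, 'utf8');
    return ($clean === false) ? $html : $clean;
}

// In the proxy this would sit between the fetch and the parse:
// $file_contents = to_xhtml($file_contents);
```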

The XSL Transform

All the magic really happens via XSL and the trusty hatom2atom transform. So download both hAtom2Atom.xsl and uri.xsl and put them into the same directory somewhere on your server. When you do, update the following line in the configuration portion of the script to the actual location of hAtom2Atom.xsl:

$h2axsl = '/home/user/app/hAtom2Atom.xsl'; // path to hatom2atom.xsl
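
Because the script turns display_errors off, a wrong path here fails silently. While setting things up, a quick check like the one below can confirm that both stylesheets are where the script expects them. The function name missing_stylesheets is hypothetical, and the check assumes uri.xsl sits next to hAtom2Atom.xsl as described above.

```php
<?php
// Hypothetical setup check (not in the original script): list which of the
// two stylesheets cannot be read from the configured location.
function missing_stylesheets($h2axsl)
{
    // uri.xsl is expected in the same directory as hAtom2Atom.xsl
    $needed = array($h2axsl, dirname($h2axsl) . '/uri.xsl');
    $missing = array();
    foreach ($needed as $file) {
        if (!is_readable($file)) {
            $missing[] = $file;
        }
    }
    return $missing;
}

// Example: print anything that is missing, one path per line.
foreach (missing_stylesheets('/home/user/app/hAtom2Atom.xsl') as $file) {
    echo "missing: $file\n";
}
```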

Usage & The hAtom Document

Now that the proxy script is in place you should expose the feed to your visitors by including it in the head of the document or as links in the content. For example, if you have some hAtom content in http://example.com/page_with_hatom.html you can add the following link element in the head of that document:

<link rel="alternate" href="/hatom2atom.php/page_with_hatom.html" type="application/atom+xml" title="This Page's Atom feed" />

Any visitor with a browser that picks up on the link element would then identify the page as having a feed. If you want to be more specific about the location of the feed on the page, or you’re dealing with a document with multiple hAtom feeds, you can specify the fragment ID for the root of the feed like so:

<link rel="alternate" href="/hatom2atom.php/page_with_hatom.html%23some_fragment" type="application/atom+xml" title="This Page's Atom feed" />

Notice that I URL-encoded the # (as %23) when using a fragment ID. User agents generally don’t send the fragment ID as part of the request, so encoding it makes sure the information reaches the proxy, and the script decodes it from there.
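
If you generate these link elements from PHP, the encoding can be left to rawurlencode(). The sketch below is just an illustration: feed_href is a made-up helper name, and it assumes the proxy lives at the site root as in the examples above.

```php
<?php
// Hypothetical helper: build the href for a feed <link>, encoding the
// optional fragment id so it survives the trip to the server.
function feed_href($stub, $page, $fragment = '')
{
    $href = '/' . $stub . '/' . ltrim($page, '/');
    if ($fragment !== '') {
        // rawurlencode() turns the leading '#' into '%23'
        $href .= rawurlencode('#' . $fragment);
    }
    return $href;
}

$href = feed_href('hatom2atom.php', 'page_with_hatom.html', 'some_fragment');
echo $href, "\n";            // /hatom2atom.php/page_with_hatom.html%23some_fragment
// On the proxy side, urldecode() restores the '#':
echo urldecode($href), "\n"; // /hatom2atom.php/page_with_hatom.html#some_fragment
```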

Prettying up the Request

There are lots of ways to manage requests on your server, from MVC-like dispatchers to mod_rewrite to nothing at all. For this script I’ve just added the following rewrite rule to the list of rules I already had. It lets me pretty the URL up a bit and makes it a bit more universal and future-proof.

RewriteRule ^hatom2atom/(.*)$ hatom2atom.php/$1 [L]

But remember: if you do change the location of the proxy script, change the configuration line or the URL extraction code in the script to match.
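
Under Apache, $_SERVER['REQUEST_URI'] still holds the URL the visitor asked for (/hatom2atom/...) after an internal rewrite like this one, so a fixed-length cut sized for hatom2atom.php would remove too many characters. One way around that is to strip whichever prefix is actually present. This is a sketch with a hypothetical function name, not what the original script does:

```php
<?php
// Hypothetical helper: strip whichever known proxy prefix the request
// starts with, so both the pretty and the plain URL forms work.
function strip_proxy_prefix($requestUri, array $stubs)
{
    foreach ($stubs as $stub) {
        $prefix = '/' . $stub . '/';
        if (strpos($requestUri, $prefix) === 0) {
            return substr($requestUri, strlen($prefix));
        }
    }
    return null; // not a request for the proxy
}

$stubs = array('hatom2atom.php', 'hatom2atom');
echo strip_proxy_prefix('/hatom2atom/some/page', $stubs), "\n";     // some/page
echo strip_proxy_prefix('/hatom2atom.php/some/page', $stubs), "\n"; // some/page
```

The result would take the place of the substr_replace() line in the script, with urldecode() still applied afterwards.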

The Results

With the script installed and the feed linked up for users to find, you now have Atom feeds for your readers to subscribe to, generated from hAtom markup in a way that’s totally transparent to them. You can see this method in action in pages like:
