Thursday, February 21, 2008

Google Calendar, RSS, ATOM, XML, XSL, XPATH, Namespaces and Firefox Bugs

I'm not quite sure how such a simple request spiralled out of control, plunging into buckets of alphabet soup . . . but I need to write something down. By Monday morning I'll have completely forgotten what the hell I was doing today. :-)

It all started simply enough: Updating a rusty old business process.

The old process:
1. Business user creates/updates a timetable spreadsheet.
2. Business user emails spreadsheet to web team.
3. Web team converts the spreadsheet to HTML, then cuts, pastes, styles and generally beautifies the content.
4. Web page is updated with new Timetable HTML.

This, of course, is a very '90s way of running a website, so we were discussing alternative ways of updating and automating the process.

We settled on the idea of creating a Google Calendar for the Timetable. The interface is simple for the business users, not much different than their Outlook calendar. The owners can set up their recurring appointments, and when they need to reschedule something around an event or shift a time, it's instantly live.

On the web team side, we just needed to pull down the RSS XML from the public address of the calendar, style it with XSL, and output the same HTML table view that the users know and love.

The new process:
1. Business users maintain the Google Calendar.
2. RSS, XSLT and CSS magically produce an HTML timetable for the site.
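For comparison, here is a rough Python sketch of step 2 done server-side with the standard library's xml.etree.ElementTree (the standard library has no XSLT engine). The sample feed, its entries, and the choice of title and summary as the two table columns are hypothetical; a real Google Calendar feed carries more elements and its own extension namespaces.

```python
import xml.etree.ElementTree as ET
from html import escape

ATOM = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM}  # prefix -> namespace URI, used by find/findall/findtext

# Hypothetical sample feed; a real Google Calendar feed has many more elements.
SAMPLE_FEED = (
    f'<feed xmlns="{ATOM}"><title>Timetable</title>'
    "<entry><title>Beginners Class</title><summary>Monday 9:00-10:00</summary></entry>"
    "<entry><title>Advanced Class</title><summary>Wednesday 18:00-19:30</summary></entry>"
    "</feed>"
)


def feed_to_table(feed_xml: str) -> str:
    """Render each Atom entry as one <tr> of an HTML timetable table."""
    root = ET.fromstring(feed_xml)
    rows = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        summary = entry.findtext("atom:summary", default="", namespaces=NS)
        # Escape the text so a stray '<' or '&' can't break the generated HTML.
        rows.append(f"<tr><td>{escape(title)}</td><td>{escape(summary)}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(feed_to_table(SAMPLE_FEED))
```

The CSS from step 2 would then style the table; only the feed-to-markup part is shown here.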

And so it began . . .
I set up a public Google Calendar, entered a few test events, and downloaded the Atom feed XML.
I posted the Atom XML on my local web server.
I opened the XML in my browsers . . . Firefox showed me the feed subscription page, IE6 showed me raw XML . . . so far so good.
I created a rudimentary XSLT and injected a stylesheet reference at the top of the XML file:

<?xml-stylesheet title="XSL_formatting" type="text/xsl" href="wtf.xsl"?>

This displayed my "Hello XSLT" in IE6, but Firefox still shows me the default subscription page. I figure I did something dumb, or not quite standards compliant, and set off happily developing and testing in IE6 (back to this later).
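Since the xml-stylesheet processing instruction has to sit before the root element (and after the XML declaration, if there is one), adding it to a downloaded feed is a small string edit. A Python sketch; add_stylesheet_pi is a hypothetical helper name, and the regex only recognises a conventional <?xml ...?> declaration.

```python
import re


def add_stylesheet_pi(xml_text: str, href: str = "wtf.xsl") -> str:
    """Insert an xml-stylesheet processing instruction ahead of the root element.

    If an XML declaration is present it must stay first, so the PI goes
    right after it; otherwise the PI is placed at the very top.
    """
    pi = f'<?xml-stylesheet title="XSL_formatting" type="text/xsl" href="{href}"?>'
    # '<?xml' followed by whitespace: matches the declaration, not '<?xml-stylesheet'.
    match = re.match(r"\s*<\?xml\s[^?]*\?>", xml_text)
    if match:
        end = match.end()
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text


if __name__ == "__main__":
    print(add_stylesheet_pi('<?xml version="1.0"?>\n<feed xmlns="http://www.w3.org/2005/Atom"/>'))
```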

Now I started putting some XPath pattern matching into my XSLT. This didn't go so well.

An RSS or Atom document is pretty simple: it's really just a 'channel' or 'feed' element filled with either 'item' or 'entry' elements.

So I was surprised when none of my XPath worked. No combination of

/, //Entry, /Entry, Feed/Entry, /Feed or //Feed

selected any content.

did work, and when I did
<xsl:value-of select="name(.)"/>

on the current node, I got back "Entry" and "Feed".
Grrr . . .

Eventually, I decided that it had something to do with namespaces, particularly the default namespace.

<feed xmlns='http://www.w3.org/2005/Atom'>
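That default namespace is the crux. In XPath 1.0 an unprefixed name in a path such as //entry only matches elements that are in no namespace, while name(.) simply reports the name as written in the source, so the two disagree. Python's ElementTree makes the hidden part visible by spelling each tag as {namespace-uri}localname. A small sketch; split_tag is a hypothetical helper, not part of the library.

```python
import xml.etree.ElementTree as ET


def split_tag(tag: str) -> tuple[str, str]:
    """Split an ElementTree tag like '{uri}entry' into (uri, local_name)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag  # element in no namespace


doc = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>Class</title></entry></feed>'
)
print(doc.tag)               # {http://www.w3.org/2005/Atom}feed
print(split_tag(doc.tag))    # ('http://www.w3.org/2005/Atom', 'feed')
print(doc.findall("entry"))  # [] : a bare name means "no namespace"
```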

Google turned up all sorts of noise about how to pull out namespaces using XSLT. I was distracted for a while trying to extract the default namespace programmatically and tack it on to my XPath patterns. In the end, it turned out to be quite simple: I just needed to declare a prefix for the default namespace's URI in my XSLT file:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom">

This page pretty much solved it for me:
XPath and Default Namespace Handling

Embarrassingly, an article from back in 2001 would also have shown me the light.

Of course, all of this probably would have been easier if I'd used a tool other than Notepad as my XML/XSLT/XPath IDE. :-)
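The same fix, sketched in Python's ElementTree rather than XSLT: bind any prefix you like to the Atom namespace URI and use it in the path. The prefix is local to the query; only the URI has to match the document's default namespace. The two-entry feed is made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Same idea as xmlns:atom="..." in the stylesheet: map a prefix to the URI.
NS = {"atom": ATOM}

feed = ET.fromstring(
    f'<feed xmlns="{ATOM}">'
    "<entry><title>Monday class</title></entry>"
    "<entry><title>Friday class</title></entry>"
    "</feed>"
)

entries = feed.findall("atom:entry", NS)  # children of feed, like /atom:feed/atom:entry
titles = [e.findtext("atom:title", namespaces=NS) for e in entries]
print(titles)  # ['Monday class', 'Friday class']

# Descendant search, like //atom:entry in XPath; any prefix name would do.
print(len(feed.findall(".//x:entry", {"x": ATOM})))  # 2
```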

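In any case, a very simple little XSL pulls the contents of every entry node out of an Atom feed:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom">
  <!-- Build the HTML shell, then process every Atom entry -->
  <xsl:template match="/">
    <html><head><title>Google Calendar Test 2</title></head>
    <body><xsl:apply-templates select="//atom:entry"/></body></html>
  </xsl:template>
  <!-- Print each entry's element name, then its text content unescaped -->
  <xsl:template match="atom:entry">
    <h2><xsl:value-of select="name(.)"/></h2>
    <xsl:value-of select="." disable-output-escaping="yes"/>
  </xsl:template>
</xsl:stylesheet>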

This gave me an ugly but accurate display of the timetable info we were after . . . at least in IE.

I popped over to Firefox to take a look at it, and discovered that Firefox still displayed the same old default subscription page. I fiddled with pathing and file names for just a bit before deciding that it was something stranger than me fat-fingering a path.

It turns out that Firefox does indeed ignore your stylesheet if "<rss" or "<feed" appears in the first 512 bytes of your XML file!

Firefox 2.0 breaks client-side XSL for RSS and Atom feeds

Bugzilla Bug 338621 – Feed View overrides XSLT stylesheet defined in XML document
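To see why the stylesheet never got a chance, here is a rough Python approximation of the heuristic as the bug comments describe it: look for "<rss" or "<feed" in the first 512 bytes. This is a simplification for illustration; the real Firefox sniffer had other rules (RDF feeds, for instance) that this sketch ignores, and looks_like_feed is a hypothetical name.

```python
SNIFF_WINDOW = 512  # bytes inspected, per the bug discussion


def looks_like_feed(body: bytes, window: int = SNIFF_WINDOW) -> bool:
    """Simplified, case-sensitive approximation of the feed-sniffing rule."""
    head = body[:window]
    return b"<rss" in head or b"<feed" in head


atom = b'<?xml version="1.0"?>\n<feed xmlns="http://www.w3.org/2005/Atom"></feed>'
print(looks_like_feed(atom))                     # True: Feed View wins
print(looks_like_feed(b" " * 600 + b"<feed/>"))  # False: tag is past the window
```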

Ah well, I did get a good laugh out of one of the workarounds in the comments:

"The emerging workaround for this problem (which isn't new to us, since we're
using the same heuristic that IE7 betas have been using for months) is to put
in a comment ranting about the evils of sniffing web content and overriding the
desires of web developers which is long enough to move "<rss" or "<feed" out of
the first 512 bytes, since that's all we sniff."
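The rant-comment workaround is easy to mechanise. A Python sketch, assuming the root element is the first "<rss" or "<feed" in the text and that the 512-byte window from the quote above is accurate; pad_past_sniffer is a hypothetical name. The comment goes after the prolog so any XML declaration stays first, and the filler avoids "--", which XML comments may not contain.

```python
SNIFF_WINDOW = 512
RANT = "Sniffing web content overrides the stylesheet I asked for. "


def pad_past_sniffer(xml_text: str, window: int = SNIFF_WINDOW) -> str:
    """Insert a comment before the root so '<rss'/'<feed' starts past `window` bytes."""
    found = [p for p in (xml_text.find("<rss"), xml_text.find("<feed")) if p >= 0]
    if not found:
        return xml_text  # nothing for the sniffer to find
    root_at = min(found)
    prolog, rest = xml_text[:root_at], xml_text[root_at:]
    needed = window - len(prolog.encode("utf-8"))
    if needed <= 0:
        return xml_text  # root already sits outside the window
    # ASCII filler of exactly `needed` characters; the comment wrapper adds more.
    body = (RANT * (needed // len(RANT) + 1))[:needed]
    return prolog + "<!-- " + body + " -->\n" + rest


if __name__ == "__main__":
    padded = pad_past_sniffer('<?xml version="1.0"?>\n<feed xmlns="http://www.w3.org/2005/Atom"/>')
    print(padded.find("<feed"))
```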

Then, just to kick some more sand in my face, I caught this out of the corner of my eye as I was closing up my browser windows for the day:

"The Firefox pitfall

All would be good in the land of RSS and Atom if Firefox had support for the disable-output-escaping feature in XSLT but it does not.

disable-output-escaping is an obscure feature in XSLT that serves only one purpose: it processes tags that appear in other tags, such as CDATA sections. And, RSS and Atom make heavy use of CDATA sections to embed HTML code.

With disable-output-escaping, you should be able to lift the HTML tags from the feed and insert them right into the HTML page...but for Firefox. Firefox essentially ignores the instruction so it ends up displaying the raw HTML code.

There's been some debate in the Firefox community as to whether this behavior was standard compliant or not. Nevertheless it is a problem and one for which you need a solution."

Working XML: Serve friendlier RSS and Atom feeds
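A small Python illustration of what disable-output-escaping changes. The feed carries its HTML escaped; the XML parser unescapes it once, so the text node holds real markup. Written out normally, that text is escaped again and the browser shows literal tags; with escaping disabled, it is written as-is. The entry is made up, and emit_text is a hypothetical stand-in for an XSLT processor's output step, not how any particular processor is implemented.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical entry whose HTML content is escaped, as feeds usually carry it.
entry = ET.fromstring(
    f'<entry xmlns="{ATOM}"><content type="html">'
    "&lt;b&gt;Monday&lt;/b&gt; 9:00-10:00</content></entry>"
)
# The parser has unescaped it once: the text now holds real markup.
markup = entry.findtext(f"{{{ATOM}}}content")


def emit_text(text: str, disable_output_escaping: bool = False) -> str:
    """Roughly mimic how a text node is written into HTML output."""
    return text if disable_output_escaping else escape(text)


print(emit_text(markup))                                # &lt;b&gt;Monday&lt;/b&gt; 9:00-10:00
print(emit_text(markup, disable_output_escaping=True))  # <b>Monday</b> 9:00-10:00
```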

Then, to top it all off, I realized that there was no way we would have been able to inject that initial stylesheet declaration into Google's Atom XML anyway. We've no choice but to use either some server side script or some string parsing to help with the transformation. So we don't really need to worry about XML Namespaces or XML element names for that matter. Hmmm . . .
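If the transformation moves server-side, one common shortcut (an option, not the only one) is to strip namespaces after parsing so plain names like "entry" work again. A Python sketch with ElementTree; the feed and its "ext" extension namespace are made up, and strip_namespaces is a hypothetical helper. The trade-off: two namespaces that share a local name become indistinguishable.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root: ET.Element) -> ET.Element:
    """Rewrite every '{uri}name' element tag to plain 'name', in place.

    Only element tags are touched; namespaced attributes keep their
    '{uri}name' keys.
    """
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


# Hypothetical feed mixing Atom with a made-up extension namespace.
feed = strip_namespaces(ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom" xmlns:ext="urn:example:ext">'
    '<entry><title>Monday class</title><ext:when start="09:00"/></entry>'
    "</feed>"
))
print([e.findtext("title") for e in feed.findall("entry")])  # ['Monday class']
print(feed.find("entry/when").get("start"))                  # 09:00
```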

Ah well, I'm sure I'll be pissed about all this on Monday, but for now I've had enough. Hopefully I'll get around to posting some catchup stuff by Monday!
Have a good weekend!

File Under: Technology
