
21 September 2005

Ampersandectomy

For a while now, a few of us at www.att.com have been trying to crack the case of getting ampersands to appear properly encoded as &amp; once our Content Management System slices and dices said content and makes beautiful julienned fries. (I mean files.) Try as we might, there always seem to be a few that magically transform into a bare & despite our best efforts. "It's not unlike playing whack-a-mole ... y'see."

So this has got to be a CMS gaffe, right? This isn't rocket science. All we're looking for is well-formed, valid markup at the end of the day. If we enter plain text, we can rest easy knowing & becomes &amp; (or &#38;) at markup generation time. Or, if we enter some XHTML outright (not exactly what I'd call "content" in the purest sense but let's not pick nits - besides that's another topic anyway) the CMS editor kindly performs its own sanity checks before passing it on.

So that's it. We're covered!

Alas, not quite, as we learned on Friday. The culprit, it turns out, is not the CMS per se. Rather, it's the XSLT processor used by the CMS.

"Must be some super-special proprietary closed system! It figures." Wait - actually, no. It's not. Hold on to your hats, it's (drum roll) ... the Apache Software Foundation's very own ... Xalan! (Surprised? Well, I sure was.)

Here's the problem in a nutshell:

Let's suppose I have markup that includes URI attribute values containing two or more query string parameters. Something like <a href="http://my.web.server/file?param1=a&amp;param2=b">...</a>. Notice that I used &amp; to separate each name/value pair. Now I realize I could go the more ecologically friendly route and use the lone semicolon as a separator (which, truth be told, I'd actually prefer). Let's just say for the sake of argument that I need to peacefully coexist with both kinds of separators. Mostly because, for the sake of reality, I actually do need to peacefully coexist with ; and &amp;.
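
To make the escaping concrete, here's a minimal sketch in Python (not anything from our CMS; the URL is the made-up one from above). It assumes nothing beyond the standard library's ElementTree, and simply shows that a serializer left to its own devices writes the in-memory & as &amp; inside an attribute value, and that parsing the markup gives the raw & back.

```python
import xml.etree.ElementTree as ET

# Build the anchor in memory; the attribute holds a *raw* ampersand here.
raw_href = "http://my.web.server/file?param1=a&param2=b"
link = ET.Element("a", {"href": raw_href})
link.text = "..."

# Serializing escapes the ampersand so the markup stays well-formed.
markup = ET.tostring(link, encoding="unicode")
print(markup)

# Parsing the markup again recovers the original, unescaped value.
round_tripped = ET.fromstring(markup).get("href")
print(round_tripped == raw_href)
```

That round trip is all we're asking of the XSLT step: escaped going in, escaped coming out.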

The next ingredient is the XSLT itself, wherein we can set the output method. Should we set it to plain text? Nah, too easy. Let's pick something more sensible, perhaps an output method of xhtml, or even html. Sure! Let's pick html, why don't we. Aside from that, all we do is pass along the markup from point a to point b.

Now, if we were to pass all of this through Xalan, what do you suppose would happen? Should we expect to see those escaped ampersands left as-is in the final output? Presuming we've set up our template properly, I'd say yes we should.

But it is not to be. Instead, Xalan manages to convert each of them to a lone &!

Wait, it gets better. Take that same exact attribute value - the one with the ampersands in it - and assign it to title, or alt, or any other non-URI attribute.

Same result, right? Nope. It emerges unscathed! Really. No fooling.
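
If you want to play whack-a-mole with a mallet rather than by eye, something like the following rough Python sketch can flag which attributes in a chunk of generated markup contain a bare ampersand. The function name and the sample markup are my own inventions for illustration, and the attribute matcher is deliberately naive (double-quoted values only), so treat it as a spot check and not a validator.

```python
import re

# An ampersand is "bare" unless it starts a named, decimal, or hex reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")
# Deliberately simple attribute matcher: name="value" with double quotes only.
ATTR = re.compile(r'([\w:-]+)\s*=\s*"([^"]*)"')

def attributes_with_bare_ampersands(markup):
    """Return the names of attributes whose values contain an unescaped &."""
    return [name for name, value in ATTR.findall(markup) if BARE_AMP.search(value)]

# Hypothetical output resembling the behavior described above:
# the URI attribute lost its escaping, the title attribute kept it.
sample = ('<a href="/file?param1=a&param2=b" '
          'title="/file?param1=a&amp;param2=b">...</a>')
print(attributes_with_bare_ampersands(sample))
```

Run against the sample, only the href attribute is reported, which matches the URI-versus-non-URI split described above.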

Bug XALANJ-611 tells all. I won't repeat the discussion here. All I can say is it spells out the situation rather well, plus it is flagged as a "major bug" and is still present in the latest release from what I can tell.

Right. On to plan B. (There's always a plan B.) What about libxslt? Works fine. Saxon? A-OK. XMLSpy? Nooooo problem.

But can I use any of these with my CMS? Sadly, no. The CMS is shipped with Xalan goodness baked right in. In fact, I'll even go so far as to reveal that it has been modified by the vendor. "Ah-ha, so it is the vendor's fault!" Nope. Thanks to Marc Liyanage's TestXSLT, I've verified I can still replicate the unwanted behavior using a fresh install of Xalan, even the latest release. Besides, there's already a bug filed, plus we wouldn't want to break the warranty seal and hack the poor CMS to pieces now, would we?

Ah well. So much for Plan B. [Pauses, looks up to the heavens, shakes both fists in the air, cries out, camera POV from above.] "Ahhhhhhhhh!"

I wonder if there is a chance this can be patched without a lot of muss and fuss? Is it actually low-hanging fruit? How 'bout it, friendly neighborhood Xalan Developers? This one's been around since late October 2001, and is currently reopened. "Can we fix it?"

Meanwhile, there is a (potentially) happy ending here. I conveniently skipped xhtml output mode in the above scenario. What happens if we try it? No problem at all (to Xalan's credit)! Very well then. For html output mode, we know a surprise is in store.

As for www.att.com, it's now clear that our XSLT has been set to html output mode all this time. Whoops. I recommended that we switch to xhtml output mode and kick the tires (and hold the xml declaration for now, thank you). We are, after all, generating what should be well-formed and valid xhtml. Hopefully there are no other gotchas in store by going this route. Right?

Oh, c'mon, you know there will be gotchas.

So far, we've only found one. Turns out we still have a rather unhealthy dose of historical content that uses named anchors in conjunction with fragment IDs, like so: <a name="id"></a>. (Yes, this is considered verboten nowadays. No, we're unable to go on a search and replace mission for the time being.) At any rate, because the element contains no text node to display (or contains only whitespace), it emerges simplified as <a name="id" />.

Pray tell, what browser do you suppose has a problem applying CSS to this sanely? (Starts with an I ... ends with an R ...)
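
For what it's worth, this kind of collapse isn't peculiar to one XSLT processor; it's ordinary XML serializer behavior. As a small illustration (Python's standard ElementTree standing in for the real pipeline, which is an assumption on my part), the default output shortens the childless element, while an explicit option keeps the open/close pair:

```python
import xml.etree.ElementTree as ET

anchor = ET.fromstring('<a name="id"></a>')

# Default XML-style serialization collapses the childless element.
collapsed = ET.tostring(anchor, encoding="unicode")

# Asking for explicit end tags keeps the open/close pair intact.
expanded = ET.tostring(anchor, encoding="unicode", short_empty_elements=False)

print(collapsed)
print(expanded)
```

Both forms are equivalent as XML; the trouble only starts when a browser treats the result as tag soup.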

Now what? We keep that distillation from occurring. For now, I've got the XSLT forcing a single space into the element when there is neither a text node nor a nested element to be found, like so:


<xsl:template match="a[not(normalize-space(.)) and not(*)]">
  <xsl:copy>
   <xsl:copy-of select="@*"/>
   <xsl:text><![CDATA[ ]]></xsl:text>
  </xsl:copy>
</xsl:template>

We're going to kick the tires on this for a bit, allowing it to be introduced with content updates and, if there are no other surprises in between now and then, apply it across the board.

UPDATE: Yes, there are surprises! How about empty <textarea></textarea> or <script></script> blocks? The latter case is especially common when including page behaviors.

For Plan B, then, we're still looking for the "empty" case as before, only this time we let through the ten XHTML 1.0 Strict elements that are expected to be empty outright. The rest get the one space treatment from before. Here's the XSLT I've submitted for this go-round:


<xsl:template match="*[not(normalize-space(.)) and not(*)]">
  <xsl:choose>
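    <!-- Test the element itself (self::), not its siblings via the parent. -->
    <xsl:when test="self::area or self::base or self::br or self::col or self::hr or self::img or self::input or self::link or self::meta or self::param">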
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates/>
      </xsl:copy>
    </xsl:when>
    <xsl:otherwise>
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:text><![CDATA[ ]]></xsl:text>
      </xsl:copy>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Perhaps this can be simplified/refactored somewhat. In any event, so far we're getting far better results with this revision.
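
For anyone who'd rather reason about the rule outside of XSLT, here's a rough Python equivalent of the same idea. It's a sketch under a few assumptions: the input has no namespace prefixes, the sample document is made up, and the function name is mine. The ten element names are the XHTML 1.0 Strict empty elements listed in the template above.

```python
import xml.etree.ElementTree as ET

# The ten XHTML 1.0 Strict elements that are declared empty.
EMPTY_ELEMENTS = {"area", "base", "br", "col", "hr",
                  "img", "input", "link", "meta", "param"}

def pad_empty_elements(root):
    """Give every childless, text-free element (except the ten above) one space."""
    for element in root.iter():
        has_text = bool((element.text or "").strip())
        if len(element) == 0 and not has_text and element.tag not in EMPTY_ELEMENTS:
            element.text = " "
    return root

# Hypothetical sample: an empty named anchor, a br, and an empty script block.
doc = ET.fromstring('<div><a name="id"></a><br/><script src="x.js"></script></div>')
padded = ET.tostring(pad_empty_elements(doc), encoding="unicode")
print(padded)
```

The anchor and the script block each pick up a single space and keep their end tags, while the br is left alone to collapse as it should.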

9 September 2005

SimpleComments and SpamLookup

I've recently learned that Adam Kalsey's excellent SimpleComments plugin for Movable Type is not aware of the new "junk status" flag in Movable Type 3.2 (part of the spiffy new SpamLookup suite). It's completely understandishable since the latest SimpleComments was released prior to MT 3.2.

(Thought balloon: "Wait a sec. Didn't I just finally get around to installing SimpleComments after upgrading to MT 3.2? D'oh!")

Once I discovered all of my comment and trackback spam was still appearing in my freshly SimpleCommented comments, I did a bit-o-hacking and came up with what I believe to be a simple and effective plugin mod. Until the next official version of SimpleComments is released, this might do the trick for you. Keep in mind: No warranties expressed or implied, yadda yadda yadda. You know the drill ...

First, you may download the patched version (MIT License applies).

WARNING: The following code snippet is now obsolete, and is (now) provided here for entertainment purposes only. Use the download link above to obtain the correct mod. Again, do not use the following code. Please read the rest of this entry all the way through ... it contains a surprise PLOT TWIST! (Thank you.)

Next, here's the behind-the-scenes tour. After stowing away a safety backup of SimpleComments.pl (located in the MT plugins directory), I added the following:


    # If we're on MT 3.2 or higher, filter out that junk!
    use constant JUNK => -1;
    if (MT->version_number >= 3.2) {
      @comments = grep { $_->junk_status() != JUNK } @comments;
      @pings    = grep { $_->junk_status() != JUNK } @pings;
    }

... and I added it immediately before this line:


    my @allComments = (@comments, @pings);

Then I tested rebuilding a really old post just to make sure all was well. (It was.) Then I rebuilt my site. Bye-bye junk, you ask? YES!

UPDATE: Cameron Bulock weighs in with a different approach! Outstanding. We're comparing notes in the meantime. Comments most welcome.

'NUTHER UPDATE: In the MT libraries for both Comment and TBPing I see a constant. For JUNK. Set to -1. A good sign, no? I've updated this entry (and the revised plugin) to reflect this constant.

THE PLOT TWIST: My hat's off to Cameron! My technique checks the junk_status flag to determine which trackbacks pass through. That's all well and good, but there's a wee li'l flaw in the logic: It's now possible to moderate trackbacks! In other words, I should be checking the "visible" flag (which is now influenced by the junk_status flag). This is what MT 3.2's comment and trackback counters do, and it's also what Cameron did. Thus, the technique of checking for the "visible" flag is actually the more correct one. The only icing on the cake to add is a version check and shift a few lines around. Here are the new changes, shown this time as a diff:


76a81,82
>               $terms{visible} = 1
>                       if MT->version_number >= 3.2 && $moderate;
82a89,91
>                       if MT->version_number >= 3.2 && $moderate;
>         @pings = MT::TBPing->load(\%terms, \%args);
>               $terms{visible} = 1
85d93
<         @pings = MT::TBPing->load({ blog_id => $blog_id }, \%args);
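
To see why the "visible" check wins, here's a toy model in Python (the plugin itself is Perl; this is only an illustration). The Ping class, its fields, and the sample records are hypothetical stand-ins, not MT's real objects, and the non-junk value of 1 is an assumption; only the JUNK value of -1 comes from the MT libraries mentioned above.

```python
from dataclasses import dataclass

JUNK = -1  # same value as the JUNK constant described above

@dataclass
class Ping:
    """Hypothetical stand-in for a trackback record; not MT's real class."""
    excerpt: str
    junk_status: int  # assumption: anything other than JUNK means "not junk"
    visible: bool

pings = [
    Ping("approved", junk_status=1, visible=True),
    Ping("spam", junk_status=JUNK, visible=False),
    Ping("awaiting moderation", junk_status=1, visible=False),
]

# First attempt: drop junk only. The moderated ping slips through.
not_junk = [p.excerpt for p in pings if p.junk_status != JUNK]

# Corrected approach: publish only what is flagged visible.
visible = [p.excerpt for p in pings if p.visible]

print(not_junk)
print(visible)
```

The junk-only filter still publishes the trackback that is waiting on moderation; the visible filter does not.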

The aforementioned download link points to this improved version. If you're not sure which one you have, your best bet is to download it again for good measure. (In the latest version, both my name and Cameron's appear in the revision comments.)

Just to be absolutely clear, this is not a mod for the PHP port. Just the Perl version. Have fun!

5 September 2005

Hurricanes and Lemonade

I happened across this archived broadcast while catching up on my blogroll this morning:

New Orleans' Hurricane Risk - All Things Considered, September 20, 2002: "When scientists consider the possibility of a major storm hitting the U.S. Gulf Coast, they say the ramifications could be devastating -- especially for the city of New Orleans. If a Category 5 hurricane were to strike Louisiana, tens of thousands of lives could be lost. Hear how state and federal officials are working to prevent that scenario."

And here we are, just over a week after Katrina.

Our next-door neighbor's four-year-old granddaughter did a wonderful thing this past Sunday. At her request, and with help from her family, she set up a lemonade and cookie stand to raise money for hurricane relief. My wife happened to notice the activity outside and informed my five-year-old son, who almost tripped over himself getting his sneakers on so he could go outside and help her.

I watched as the two of them started waving and calling out to the passersby. Sure enough, little by little, cars started to pull over, people got out, picked their tasty treats, and when they asked "how much?" the kids replied "whatever you'd like."

They raised nearly $500 in three hours.

100% of it is going to the Red Cross via a matching donation program, so that makes for an almost $1,000 donation. Wow.

One of the passersby was from the Somerset County chapter of the American Red Cross. She explained how she was on her way to train volunteers who were about to leave for two weeks in Louisiana.

I remember volunteering to help after 9/11. Three times. The second time was a twelve-hour overnight shift managing a canteen station for rescue workers, doctors, clergy, police, firefighters, you name it. This particular canteen was extra-busy, as it was adjacent to an outdoor morgue, complete with private bereavement tents, arranged along East 30th Street, just off 1st Avenue.

I also recall the staggering, knee-weakening amount of donations we sorted through and organized at the U.S. Navy dry docks in Bayonne a few weeks later. That was a nine-hour shift, and far more physically demanding by comparison.

As intense as these experiences were for me, I can't begin to imagine what those volunteers are going to encounter. Two weeks!

Perhaps our Red Cross visitor told them all about the lemonade stand. Would be a mighty nice thought to take with them into Louisiana.

UPDATE: I was wrong. It wasn't almost $400 as I originally reported. It was $500! Also, on Tuesday, El Diario published an article about the lemonade stand.

3 September 2005

CSS Pseudo-Selectors

Recently, I was quite happy to learn about Shaun Inman's contribution to CSS maintenance sanity, Cascading Style Sheets Server-Side Constants. At the same time, I also share Eric Meyer's frustration over the frequent knee-jerk "use a preprocessor" responses to proposals such as (and not limited to) CSS constants.

Case in point: A few weeks ago, I was perusing the www-style mailing list archives and read a post by Emrah Baskaya all about CSS parent pseudo-containers. I highly recommend reading Emrah's post and the entire thread that flows from it.

I too have been wrestling with keeping CSS maintainable, beyond the basics like consistent formatting and ease of reading. Sometimes I'll have no choice but to repeat (repeat) myself ad nauseam in the CSS with the same selectors over and over and over again. Not due to poor markup - at least I don't believe that's the case. Rather, if I must have something nested a few levels deep and it must be assigned a class vs. an id, well, "the needs of the many outweigh the needs of the few."

Can we keep it simple in the midst of all this deep nesting? If I had my druthers I'd say we could, and we should. Here's how I'd do it within the scope of CSS alone.

Suppose you have CSS akin to:


   .class .nested-class .another-nested-class dl { ... }
   .class .nested-class .another-nested-class dl.subclass { ... }
   .class .nested-class .another-nested-class dl dt { ... }
   .class .nested-class .another-nested-class dl.subclass dt { ... }
   .class .nested-class .another-nested-class dl dd { ... }
   .class .nested-class .another-nested-class dl.subclass dd { ... }
   .class .nested-class .another-nested-class dl dd.img img { ... }
   .class .nested-class .another-nested-class dl.subclass dd.img img { ... }

Let's also suppose that, for whatever reason, we require this level of nesting. Unfortunately it's very repetitive! It would be great if we could refactor the CSS and eliminate it. Well, how about this?


   .class .nested-class .another-nested-class (
      dl { ... }
      dl.subclass { ... }
      dl dt { ... }
      dl.subclass dt { ... }
      dl dd { ... }
      dl.subclass dd { ... }
      dl dd.img img { ... }
      dl.subclass dd.img img { ... }
   )

Here we've defined a scope for .class .nested-class .another-nested-class. All the CSS within parentheses is then applied within that scope.
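
To pin down what I mean by "applied within that scope," here's a tiny Python sketch of the intended expansion. It is emphatically not a proposal to preprocess (that's the knee-jerk answer I'm grumbling about); it only spells out the semantics, and the function name is made up for the occasion.

```python
def expand_scope(scope, inner_selectors):
    """Prefix each inner selector with the shared scope selector."""
    return [f"{scope} {selector}" for selector in inner_selectors]

scope = ".class .nested-class .another-nested-class"
inner = ["dl", "dl.subclass", "dl dt", "dl dd.img img"]

expanded = expand_scope(scope, inner)
for full_selector in expanded:
    print(full_selector + " { ... }")
```

In other words, each rule inside the parentheses behaves exactly as if the long-hand selector had been written out in front of it.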

Taking this in a slightly different direction, let's only specify selectors within parentheses. Now you have a handy "pseudo-selector" shorthand. We've just defined a constant for selectors!


   .pseudo-class ( .class .nested-class .another-nested-class )

   .pseudo-class dl { ... }
   .pseudo-class dl.subclass { ... }
   .pseudo-class dl dt { ... }
   .pseudo-class dl.subclass dt { ... }
   .pseudo-class dl dd { ... }
   .pseudo-class dl.subclass dd { ... }
   .pseudo-class dl dd.img img { ... }
   .pseudo-class dl.subclass dd.img img { ... }

I admit this last example is a little repetitive, but the overarching idea here is to reduce the bloat and make cleaner (and leaner) CSS that's still self-contained but also easier to manage.
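
The shorthand flavor boils down to simple substitution. Here's a minimal Python sketch of how I picture it resolving, assuming an alias only ever appears as a whole whitespace-separated token (so something like .pseudo-class.extra would not expand). The function and the alias table are hypothetical, purely for illustration.

```python
import re

def expand_aliases(selector, aliases):
    """Replace each alias token in a selector with the selectors it stands for."""
    def substitute(match):
        token = match.group(0)
        return aliases.get(token, token)
    # A token here is a run of non-space characters, e.g. ".pseudo-class".
    return re.sub(r"\S+", substitute, selector)

aliases = {".pseudo-class": ".class .nested-class .another-nested-class"}
resolved = expand_aliases(".pseudo-class dl.subclass dd", aliases)
print(resolved)
```

Selectors that don't mention the alias pass through untouched, so ordinary rules in the same style sheet are unaffected.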

I floated this idea by Emrah a few weeks ago. He was encouraged by it, to be sure, but he also offered a healthy dose of reality, searching for more related proposals within the archives ... and their responses:

Grouping Selectors (nearly the same concept)
An example comment
Macros for assigning bulk properties (sound familiar?)
An example comment

Bottom line: "... if one of the CSS draft editors reject it, the chances are low the proposal be accepted by the [CSS Working Group]."

Sigh.

Of course, now that I know there's CSS SSC, I can always hope for "plan B." In fact, I'd even welcome an Apache server-side module at this point. OK, 'nuff said. Discuss.