Tuesday, 31 January 2012

Spot the defect - XML string manipulation

So I have been a bit lazy about posting recently; time to look at a defect found in some production code:

            String details = XPathReader.getValueAt(detailPath, xml);
            if (details != null && details.length() > 0) {
                String mappedDetails = getFormattedDetails(details, accId, pay);
                logger.info("Details [" + details + "] replaced with [" + mappedDetails + "]");
                xml = xml.replace("Details=\"" + details + "\"", "Details=\"" + mappedDetails + "\"");
            }
            return xml;

At first glance there is nothing particularly wrong with this code: the XPathReader class extracts a value from a specific path in the XML, some mapping is done on the value, and the original value is replaced with the updated details. Unfortunately there is a sneaky defect in this code. The log message printed for the problematic input is:

INFO  [MappingSessionBean] Details [Book##VALUE&DEV] replaced with [Book VALUE 111111111]

Take a moment to see if you are able to spot the problem... Not yet apparent? The input XML (with some details replaced):

<RequestXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <Header Sending_System="SOURCE" Message_Identifier="Update"
            Message_Status="New" Message_Name="Nostro Update" />
      <AccountingInfo BICCode="XXXXXXXX" CurrencyCode="USD"
            TradingArea="Book" Amount="12345.67/NEW/11111111/444444.44"
            ValueDate="2011-11-11+0200" Details="Book##VALUE&amp;DEV"
            TransactionReference="111111111" AccountType="PRINCIPAL" />
</RequestXML>

Still not clear? Compare the Details attribute in the input XML with the value in the log message:

Details [Book##VALUE&DEV]

The Details value in the XML has the ampersand escaped as "&amp;", while the call to XPathReader.getValueAt() returns the attribute value with the escaping removed. The getFormattedDetails method merrily goes about its job, enriching the details string to "Book VALUE 111111111", but then the replace call silently does nothing:

        xml = xml.replace("Details=\"" + details + "\"", "Details=\"" + mappedDetails + "\"");

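The details variable contains "Book##VALUE&DEV", while the raw XML text contains "Book##VALUE&amp;DEV", so the search string never matches and no replacement is performed. Note that the log message still claims the value was replaced. Given that this code is due to be replaced shortly, the quick and easy fix is to use the Apache commons-lang StringEscapeUtils.escapeXml() method on both the search string (so it matches the raw XML) and the replacement (so the result stays well-formed). Thus the replace call becomes: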

        xml = xml.replace("Details=\"" + StringEscapeUtils.escapeXml(details) + "\"", "Details=\""
                + StringEscapeUtils.escapeXml(mappedDetails) + "\"");
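
To see the defect and the fix side by side without pulling in commons-lang, here is a minimal sketch. The class name, the replaceRaw/replaceEscaped helpers and the local escapeXml method are all hypothetical; the local escapeXml is only a stand-in that handles the five predefined XML entities, so the real StringEscapeUtils.escapeXml() may behave differently for other characters.

```java
public class ReplaceDefectDemo {

    // Hypothetical stand-in for StringEscapeUtils.escapeXml():
    // escapes only the five predefined XML entities.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must run first, or later entities get double-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // The original, defective replace: searches for the unescaped value.
    static String replaceRaw(String xml, String details, String mappedDetails) {
        return xml.replace("Details=\"" + details + "\"",
                "Details=\"" + mappedDetails + "\"");
    }

    // The quick fix: escape both values before touching the raw XML text.
    static String replaceEscaped(String xml, String details, String mappedDetails) {
        return xml.replace("Details=\"" + escapeXml(details) + "\"",
                "Details=\"" + escapeXml(mappedDetails) + "\"");
    }

    public static void main(String[] args) {
        String xml = "<AccountingInfo Details=\"Book##VALUE&amp;DEV\" />";
        String details = "Book##VALUE&DEV";        // unescaped, as an XPath lookup returns it
        String mappedDetails = "Book VALUE 111111111";

        // No match, so the XML comes back unchanged.
        System.out.println(replaceRaw(xml, details, mappedDetails).equals(xml));
        // The escaped search string matches the raw text.
        System.out.println(replaceEscaped(xml, details, mappedDetails));
    }
}
```

Even with the escaping in place this is still text matching: it depends on the attribute being written with double quotes and with exactly the same escaping that the helper produces.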

The alternative (and more correct solution) would be to convert the XML to an object using your favourite XML library and then perform the data manipulation on the object before converting back to XML.
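
As a rough sketch of that alternative, the following uses only the JDK's built-in DOM and Transformer APIs. The class and method names are hypothetical, the mapper parameter stands in for getFormattedDetails, and the code assumes the value lives in the Details attribute of the first AccountingInfo element, as in the sample message above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomDetailsUpdater {

    // Parses the XML, rewrites the Details attribute via the mapper, and serialises it again.
    // The parser unescapes on the way in and the serialiser escapes on the way out,
    // so the mapper only ever sees and returns plain text.
    static String updateDetails(String xml, UnaryOperator<String> mapper) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Assumption: the first AccountingInfo element carries the attribute.
            Element info = (Element) doc.getElementsByTagName("AccountingInfo").item(0);
            if (info != null) {
                String details = info.getAttribute("Details"); // "" when absent
                if (details.length() > 0) {
                    info.setAttribute("Details", mapper.apply(details));
                }
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped to keep the sketch short; real code would handle parse errors explicitly.
            throw new IllegalStateException("Could not update Details attribute", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<RequestXML><AccountingInfo TradingArea=\"Book\" "
                + "Details=\"Book##VALUE&amp;DEV\" /></RequestXML>";
        System.out.println(updateDetails(xml, details -> "Book VALUE 111111111"));
    }
}
```

The serialised output may differ from the input in whitespace and attribute order, which matters only if something downstream compares the XML as text.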