atxgeek: Revisited: JSON bigger than XML???

A little over a year ago I stumbled across a Google code sample comparing JSON and XML that caught my eye for a reason completely unrelated to Google's intended subject: the sample JSON-encoded data was larger than its XML equivalent! This represented an unlikely exception to the generally accepted view that JSON is the leaner data format. Still, I was recently wondering whether others have made any serious effort to point out that XML can be leaner than JSON...

Background: JSON bigger than XML???
To be fair, the XML data sample from Google I stumbled across and wrote about a year ago was not significantly smaller than its JSON equivalent -- only about 5% smaller. In this case I consider a 5% difference a wash. What made it notable is that the accepted rule of thumb seems to be that JSON always yields a smaller footprint than XML. The idea that JSON is consistently and significantly leaner than XML is so core to the thinking surrounding JSON's use that the header text on the XML page of the JSON.ORG website is "JSON: The Fat-Free Alternative to XML".

I was not surprised to run across an example showing it is possible for certain instances of XML to be the same size or smaller than their JSON equivalents. What impressed me was the fact that the Google samples made no mention of the sizes of the competing data sets. This was not some contrived example put together to show that an XML data set can be smaller than JSON. Further, the example utilized a classic "fat" XML implementation with each piece of data defined by opening and closing tags rather than shoving data into tag attributes in order to avoid the added "tag tax". In other words, there was opportunity for the example XML to have an even smaller footprint than it already did.
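To make the "tag tax" concrete, here is a minimal Python sketch that serializes the same data three ways: compact JSON, element-style ("fat") XML, and attribute-style XML. The records and the `people`/`person` names are invented for illustration and are not the Google sample, so the exact sizes it reports say nothing about that sample.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical records, invented for illustration (not the Google sample).
records = [
    {"name": "Alice", "city": "Austin"},
    {"name": "Bob", "city": "Dallas"},
]

def to_json(rows):
    # Compact separators strip the optional whitespace json.dumps adds by default.
    return json.dumps({"people": rows}, separators=(",", ":"))

def to_element_xml(rows):
    # "Fat" style: every value gets its own opening and closing tag.
    root = ET.Element("people")
    for row in rows:
        person = ET.SubElement(root, "person")
        for key, value in row.items():
            ET.SubElement(person, key).text = value
    return ET.tostring(root, encoding="unicode")

def to_attribute_xml(rows):
    # Attribute style: values live in attributes, so each key is written only once.
    root = ET.Element("people")
    for row in rows:
        ET.SubElement(root, "person", dict(row))
    return ET.tostring(root, encoding="unicode")

sizes = {
    "json": len(to_json(records)),
    "element_xml": len(to_element_xml(records)),
    "attribute_xml": len(to_attribute_xml(records)),
}
for name, size in sizes.items():
    print(f"{name}: {size} characters")
```

Which format comes out smallest depends on the key names, the value lengths and the nesting, so run it against data shaped like your own before drawing conclusions.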

Size Does Not Equal Performance
In my casual online reading I ran across numerous performance tests performed by various bloggers comparing XML and JSON. In many cases JSON handily beat XML. In others the reverse was true. Size appeared to matter less than the structures of tested data and the encoders / parsers used.

Consider the nice write-up by Punit Ganshani back in November 2011. Punit briefly compared various .NET serializers by providing implementation code, sample rendered data and test results of serialization performance for each serialization option. Of special note were the comparisons of XmlSerializer, DataContractSerializer and DataContractJsonSerializer.

Punit serialized an array of two objects with each object containing two attributes (a total of four data elements). Processing by XmlSerializer yielded an XML data set of 321 characters. The DataContractSerializer yielded a data set of 295 characters. The DataContractJsonSerializer yielded a data set of only 79 characters! Interestingly, though, the 79-character JSON data set required 57% more time to produce than the 295-character XML set produced by the DataContractSerializer.

Serializer	Data Set Size (characters)	Serialization Time for 200K Records (ms)
XmlSerializer	321	1142
DataContractSerializer	295	539
DataContractJsonSerializer	79	847
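
The trade-off in these figures can be worked out directly. The short Python sketch below only does arithmetic on the numbers reported in the table; the `percent_change` helper is a name introduced here for illustration.

```python
# Figures copied from the table: (characters, milliseconds for 200K records).
results = {
    "XmlSerializer": (321, 1142),
    "DataContractSerializer": (295, 539),
    "DataContractJsonSerializer": (79, 847),
}

def percent_change(new, old):
    # Relative change of `new` against the baseline `old`, in percent.
    return (new - old) / old * 100

json_size, json_ms = results["DataContractJsonSerializer"]
dcs_size, dcs_ms = results["DataContractSerializer"]

extra_time = percent_change(json_ms, dcs_ms)         # extra time JSON needed
size_saving = -percent_change(json_size, dcs_size)   # characters JSON saved

print(f"{extra_time:.0f}% more time, {size_saving:.0f}% fewer characters")
# prints "57% more time, 73% fewer characters"
```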

While we would expect a longer serialization time to be offset by a shorter transfer time, we still need to consider de-serialization time. JSON usually has a clear advantage with regard to moving serialized data sets between points, but that advantage can be eroded by the performance of the encoder/decoder involved.
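
Since both halves of the trip matter, one way to check a given encoder/decoder pair is to time a full serialize-then-parse round trip. The sketch below does this with Python's standard `json` and `xml.etree.ElementTree` modules, not the .NET serializers Punit tested. The payload and the `rows`/`row` names are invented for illustration, and the timings will vary by machine and library version.

```python
import json
import timeit
import xml.etree.ElementTree as ET

# Hypothetical payload, invented for illustration.
rows = [{"id": str(i), "name": f"item{i}"} for i in range(100)]

def json_round_trip():
    # Serialize to a JSON string, then parse it back.
    return json.loads(json.dumps(rows))

def xml_round_trip():
    # Build element-style XML, serialize it, then parse it back into dicts.
    root = ET.Element("rows")
    for row in rows:
        item = ET.SubElement(root, "row")
        for key, value in row.items():
            ET.SubElement(item, key).text = value
    parsed = ET.fromstring(ET.tostring(root))
    return [{child.tag: child.text for child in item} for item in parsed]

timings = {
    "json": timeit.timeit(json_round_trip, number=200),
    "xml": timeit.timeit(xml_round_trip, number=200),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 200 round trips")
```

Treat the output as a measurement of these two libraries on this one data shape, not as a verdict on the formats themselves.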

The Relevant Factors Are Not Always in Your Control
Developers are often forced to deal with someone else's data sources (encoders and data structures) or face the need to support multiple types and implementations of data targets (parsers). Such was the case in Eddie Webb's 2010 experiment comparing JSON and XML data feeds from Netflix. Eddie noted that data sets with small pieces of data favor JSON due to the added cost of opening and closing tags in XML. However, as the size of each piece of data grows, the relative cost of the enclosing tags shrinks. This was an excellent observation: the typical JSON "size advantage" depends in part on the typical size of the data elements to be serialized. JSON is not a one-size-fits-all (data sets) performance enhancement over XML.
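
Eddie's observation about element size is easy to reproduce with a single field. The Python sketch below compares the length of one XML element against the equivalent one-key JSON object as the value grows. The `description` key and the filler values are assumptions made for illustration, not data from the Netflix feeds.

```python
import json

def json_field(key, value):
    # One-key JSON object with no optional whitespace.
    return json.dumps({key: value}, separators=(",", ":"))

def xml_field(key, value):
    # One element-style XML field; the filler value needs no escaping.
    return f"<{key}>{value}</{key}>"

def xml_to_json_ratio(key, payload_len):
    # How many times larger the XML field is than the JSON field.
    value = "x" * payload_len
    return len(xml_field(key, value)) / len(json_field(key, value))

payload_lengths = (1, 10, 100, 1000)
ratios = [xml_to_json_ratio("description", n) for n in payload_lengths]
for n, ratio in zip(payload_lengths, ratios):
    print(f"value length {n}: XML is {ratio:.3f}x the size of JSON")
```

The fixed cost of the tags is the same in every row, so the ratio falls toward 1 as the value gets longer.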

The end result of Eddie's tests was that he stuck with XML as the better-performing format. The combination of the sizes of tested data sets, data structures and decoder performance all played a part in his declaring XML the winner. Notably he employed a SAX parser for XML (great for processing data as it is being loaded/read) rather than a DOM parser (great for skipping around an already-loaded data set). Most web browsers have built-in DOM parsers, so Eddie's SAX-parser scenario was not typical of the way your common web-browsing end user would most likely consume XML.

Summary
The XML vs. JSON debate won't end anytime soon. As long as developers are forced to choose between the two, here are a few thoughts to keep in mind:

(1) JSON is typically smaller than XML... but not always.
(2) Smaller data sets are transferred more quickly but transfer speed isn't the whole story.
(3) Data sets composed of larger data elements can nullify the JSON "size advantage".

Or, more succinctly put: Don't assume that JSON is going to perform better than XML.

atxgeek

July 31, 2012
