Improving Javascript XML Node Finding Performance by 2000%

In my work, I’m parsing web services all of the time. Most of the time they’re XML, which does not make the best use of bandwidth or CPU time compared to JSON; however, if it’s all that you’re given, you can certainly get by. I’ve been looking into ways to speed up XML document traversal with jQuery after the current best-practice method was removed.

The basic way to find certain nodes in an XML web service is to use the .find() method. This is used heavily by the SPServices jQuery helper (which is, in general, a great library).

$(xData.responseXML).find("[nodeName='z:row']").each(function() {
// Do stuff
});

That’s absolutely fine – it’s going to find the attribute nodeName with a value of z:row. However, since jQuery 1.7, this method does not work. I raised this regression in the jQuery bug tracker and was encouraged to find a solution: another selector that worked in all browsers. Unfortunately, at the time I couldn’t come up with anything better than this:

$(xData.responseXML).find("z\\:row, row").each(function() {
// Do stuff
});

The “z\\:row” selector works in IE and Firefox, and the “row” selector works in Chrome and Safari (I’m unable to test in Opera here, sorry). This was flagged as the solution to the problem and they wouldn’t be making any fixes to the jQuery core.
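For anyone puzzled by the double backslash in that selector: the JavaScript string literal uses up one backslash, and the remaining one tells the selector engine to treat the colon as part of the element name rather than the start of a pseudo-class. A minimal sketch in plain JavaScript (no jQuery needed; the variable names are just for illustration):

```javascript
// The source code contains two backslashes, but the string itself holds only one.
var selector = "z\\:row, row";

// Split the selector list into its two alternatives.
var alternatives = selector.split(/\s*,\s*/);

console.log(selector);        // the selector engine sees: z\:row, row
console.log(alternatives[0]); // the escaped, namespaced form
console.log(alternatives[1]); // the bare local name
```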

After a few weeks of using this method, I noticed that the site had been slowing down, especially in IE, and I thought this new selector was the cause. So, I looked into the performance numbers using jsPerf and I raised a bug too. My first test was to see what the current solution was doing, and whether jQuery 1.7 had made things worse.
Test case: http://jsperf.com/node-vs-double-select/4

So, performance in Chrome is identical for each of the selectors (and it’s the same in Firefox and Safari) but IE drops nearly half of its operations because it has to perform that second selector.

It’s still not very high performance though, and so I looked for other solutions.

Dmethvin suggested:

Did you try the custom plugin in the ticket? If you’re having performance issues that should be much faster.

The plugin he’s referring to is this:

jQuery.fn.filterNode = function(name){
   // Keep only the elements in the current set whose nodeName matches exactly
   return this.filter(function(){
      return this.nodeName === name;
   });
};

This filters the current set by each element’s nodeName, comparing it against the name that you gave it. The issue with this is that .filter() does not traverse down the tree; it stays at the level of the set of objects that it was given. Therefore, a quick solution was this:

$(xData.responseXML).children().children().children().children().children().children().children().filterNode('z:row').each(function() {
// Do stuff
});

jsPerf Test: http://jsperf.com/node-vs-double-select/1

Wow, that’s about 50 times faster. Even IE beats Chrome when doing this operation. The simple reason is that it’s got a smaller set of objects to go through and it’s comparing a single attribute rather than parsing the text of the XML to try and find the namespaced element.
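To make the "stays at its own level" behaviour concrete, here is a rough sketch using plain objects in place of XML nodes. The tree shape and the helper names (filterByName, childrenOf) are made up for illustration and are not jQuery APIs; they only mimic what .filter() and a single .children() call do.

```javascript
// Plain objects standing in for XML nodes (illustrative shape only).
var doc = {
  nodeName: 'xml',
  children: [{
    nodeName: 'rs:data',
    children: [
      { nodeName: 'z:row', children: [] },
      { nodeName: 'z:row', children: [] }
    ]
  }]
};

// Like .filter(): tests only the nodes in the set it is handed.
function filterByName(nodes, name) {
  return nodes.filter(function (node) {
    return node.nodeName === name;
  });
}

// Like one .children() call: steps exactly one level down from each node.
function childrenOf(nodes) {
  return nodes.reduce(function (out, node) {
    return out.concat(node.children);
  }, []);
}

var atRoot = filterByName([doc], 'z:row');                          // 0 matches: wrong level
var oneDown = filterByName(childrenOf([doc]), 'z:row');             // 0 matches: still too shallow
var twoDown = filterByName(childrenOf(childrenOf([doc])), 'z:row'); // 2 matches

console.log(atRoot.length, oneDown.length, twoDown.length);
```

The filter only pays off once you have stepped down exactly as many levels as the rows are nested, which is why the depth has to be known in advance.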

Still, I wasn’t satisfied as in order to achieve that performance, I had to know how deep I was going to be going in order to retrieve the set. So, back to the bug and another suggestion by dmethvin:

If you’re going that deep, use a filter function passed to .find(). How does that fare?

After a few attempts, a colleague of mine came up with this beauty:

$.fn.filterNode = function(name) {
   return this.find('*').filter(function() {
      return this.nodeName === name;
   });
};

jsPerf test: http://jsperf.com/node-vs-double-select/3

[Graph: jsPerf results] Using .find('*').filter() increased performance to 200x faster than the original .find('z:row') selector.

I mean, wow, that’s incredible. On the graph, those tiny slivers of colour are the original selectors, and the bars that reach only 20% of the way up are the previous massive performance increase from using .filter() on its own. It should also be noted that IE8 performance using this selector increased in jQuery 1.7 compared to jQuery 1.6.

Side-note: IE10’s JavaScript performance is almost equal to that of Google Chrome. In comparison, IE9 (not shown) is about half of that.

The reason for this massive increase is that it’s backed by native selectors. A .find('*') will translate into element.querySelectorAll('*'), which is very fast when compared to chaining a long run of .children() calls.
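For the curious, roughly the same idea can be written without jQuery at all. This is only a sketch and I haven’t benchmarked it: filterNodeNative is a made-up name, and root is assumed to be an XML Document or Element in a browser that supports querySelectorAll.

```javascript
// Hypothetical helper, not part of jQuery or SPServices.
// `root` is assumed to expose the standard DOM querySelectorAll method.
function filterNodeNative(root, name) {
  // Grab every descendant element in one native call.
  var all = root.querySelectorAll('*');
  // The result is array-like, so borrow Array.prototype.filter to test nodeName.
  return Array.prototype.filter.call(all, function (node) {
    return node.nodeName === name;
  });
}
```

In a completefunc callback this would be called as filterNodeNative(xData.responseXML, 'z:row'), which returns a plain array of matching elements rather than a jQuery object.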

Summary
Dealing with large amounts of data from web services needs to be fast. Using a simple .find() on the node name no longer works, so alternatives have been investigated. The fastest method, using a short one-line plug-in, improves performance by up to 2000% compared to the old methodology.

I’ll be notifying the SPServices group of this post, and hopefully they can improve the performance of their library.

Steve Workman

Steve Workman is the UI Tech Lead for yell.com at hibu. He is also an organiser for London Web Standards and an occasional public speaker, talking about web performance and web standards.


26 Comments

  1. Tom Daly said:

    Great find, got stumped today building something w/ spservices and jquery 1.7. I was scratching my head till i saw your discussion. Thanks for posting!

  2. Marc Anderson said:

    Steve:

    The “SPServices group” (me) is impressed with your work on this. I’m going to look at getting this into the next release, if only for the speed increase.

    The harder question may be what we can tell all the folks out there who are using the .find("[nodeName='z:row']") method. Obviously, switching all their code to the .find('*').filter() method will be preferable if they want to improve performance, but there’s a lot of code out there that people won’t want to *have* to change just because 1.7 is going to break it. Any thoughts on how I can help in SPServices by converting their .find("[nodeName='z:row']") calls to .find('*').filter() calls? Obviously, I can’t help with anything outside an SPServices call, but maybe as a stop-gap I could translate their calls for them, and improve their performance to boot.

    I’ll post this back on the thread on the SPServices Discussions as well.

    BTW, when you say “(which is, in general, a great library)”, I’m all ears if you can suggest improvements. Obviously you’ve come up with a very important one here.

    Thanks,
    M.

  3. Steve Workman said:

    Hi Marc, thanks for the kind words.

    Overriding calls like that will be quite tricky. In jQuery, the .find() method is part of the Sizzle selector library – you’d have to find the method for that, look at the selector to try and match “[nodeName='z:row']” and execute .find(‘*’).filter() instead, returning that result. However, I don’t know what else you might break because of that, so I’d be very wary of doing it.

    Also, whilst it’s a 200x performance increase in this one function, for small lists it’s only going to take off 5-6ms of processing time. For larger lists it’ll make more difference, but proportionally the time taken to download the extra data will still make the request feel slow. Developers will get far more performance out of enabling gzip and caching AJAX request results than by using this function.

    As far as upgrade paths go, I’d do what you’ve done before and split the library on the new jQuery version, making 0.7 a 1.7+ only library. This then lets you take advantage of the .on()/.off() delegate shorthand and any other 1.7+ improvements that will come around.

    I’d also recommend that everyone reads this slide set from Addy Osmani (jQuery team member) on javascript performance – it’s what set me off on this path to increasing performance, that and a web page that took 15 seconds to load (now down to 3, limited by network performance).

  4. Paul T said:

    Thank you. Great solution and awesome to see that SPServices has already embraced this into a method in the library.

  5. Christophe said:

    The title of the article is misleading. You are not improving JavaScript performance, you are merely removing useless abstraction layers to get closer to plain JavaScript.

    Have you tried using getElementsByTagName directly?

  6. Steve Workman said:

    Hi Christophe,

    Yes, if you put it like that the title is misleading, but a title of “Ditch jQuery, use native calls and write a lot more code yourself” is a different topic. You’ll also have to deal with a lot more browser inconsistencies so I’m willing to call jQuery “javascript” for now.

    No, I’ve not tried the getElementsByTagName accessor. Perhaps you could add a test case to the jsPerf.com test and let us all know how it goes.

  7. Christophe said:

    “Ditch jQuery” doesn’t make much sense either.

    How about just using the best option for the job at hand? Why would you discard native JavaScript methods as soon as you start using jQuery?

    I am not familiar with jsPerf.com, I’ll need to take a closer look. You are correct about browser inconsistencies, and you might have to write something like this:
    getElementsByTagName("z:row")||getElementsByTagName("row")

  8. tester said:

    I just wanna note that I struggled with this writing some jQuery for a sharepoint site. What I found working with jQ 1.7.1 in IE, Firefox and Chrome browsers was this syntax: .find('"z\\:row"') in other words, double quotation marks inside single quotation marks, with escaped colon. Haven’t tested for speed though.

  9. Steve Workman said:

    Sounds interesting. Add it to the jsperf tests and see what you get. The links are still open for editing so see what you can do.

  10. tester said:

    Testing with your code, I now realise that my method actually returns all nodes (works the same as .find(‘*’)). Should’ve tested more thoroughly before posting.

  11. Robert said:

    Very interesting article. Thanks for doing the work and sharing with the community.

  12. TK said:

    Hey Steve,

    Any ideas on how to improve performance when I’m trying to .find() via attributes? Let me give you an example.

    So I was using $(XMLData).find('row[id=3317][orderindex=1]').attr('value') to find the particular row above. As you might imagine, this is fine for a couple rows, but when you get to around 300+ it grinds to a halt.

    Any ideas on how to make the find faster? I tried your method above but finding the nodes itself by name isn’t the problem, it’s finding the right node by name with the right attribute.

  13. TK said:

    Sorry the last post ate my XML.

    [?xml version ='1.0' encoding='utf-8'?][result]
    [row orderindex="1" id="3317" nthvalue="162"/]
    [row orderindex="1" id="5000" nthvalue="150"/]
    [/result]

  14. Steve Workman said:

    Hi TK,

    I’ve had a quick play with your problem. My first thought was to be able to create a DOM tree and parse using getElementById, but this isn’t possible (StackOverflow reference).

    In short, jQuery.find is slow, really slow. Using filterNode to find the top-level nodes and then looping over that to find the right properties is ~32% faster. I’ve made a JSPerf test for this case, please have a look and add some more test cases so you can compare and contrast. It may also help to have a larger XML document to go over so you can see the effects of a large dataset.

    Good luck!

    Steve

  15. Robert Kendall said:

    Your solution works well in Firefox and for small documents in Chrome, but when I try to use it on large XML documents in Chrome (over 4MB) I get an “Uncaught RangeError: Maximum call stack size exceeded”.

  16. Dilan said:

    This doesn’t work for Safari iOS4.3.3 – iPhone browser but worked in Safari Default (Automatically Chosen) browser. Any idea to fix this issue for Mobile browsers ???
    My tested code is,
    function getList(){
      $().SPServices({
        operation: "GetListCollection",
        webURL: $(this).attr("WebFullUrl"),
        async: false,
        completefunc: function(xData, Status) {
          $(xData.responseXML).find("List").each(function(){
            listTitle = $(this).attr("Title");
            listItem = "" + listTitle + ""
            $("select.list-selector").append(listItem);
          });
        }
      });
    }

  17. Jasper Valero said:

    Excellent work! Just ran into the issue with namespaces working with MRSS. This worked like a charm!
