XML parsing slowly

Posted on

Question :

I’m currently trying to tune a somewhat problematic stored procedure and I’ve noticed that majority of cost associated with running it comes from parsing parsing some XML into a temp table. Here is the SQL in question:

    RemarkText VARCHAR(2500) NOT NULL,
    ListingID INT NOT NULL
    T.c.value('@RemarkTypeID[1]','tinyint') AS RemarkTypeID,
    T.c.value('@RemarkText[1]','varchar(2500)') AS RemarkText,
FROM @RemarksXml.nodes('/Remarks[1]/Remark') AS T(c)

Also, so you have some idea of the structure of the XML being parsed:

    <Remark xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" RemarkID="211767" DateTimeCreated="1991-11-19T10:43:50" DateTimeModified="1992-01-01T10:43:50" RemarkTypeID="8" RemarkText="aaaWasher,Dryer,Refriger,Microwav" />
    <Remark xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" RemarkID="211768" DateTimeCreated="1991-11-19T10:43:50" DateTimeModified="1992-01-01T10:43:50" RemarkTypeID="3" RemarkText="aaaell Maintained 3Bd 1.5Ba Semi-Detached Home In Cambridge Gardens. Garage W As Converted Into A Nice Family Room The Outside Is Maintence Free. Includes All Appliances,Security System,Storage Shed,Ceiling Fans And New Carpet.This Home Is Very Well Pricded. Seller Will Consider Fha Or Va Along With Conventional Finacing. This Is A Great Buy." />

Is there anything I can do to improve the performance of parsing these XML nodes and setting up this temp table?

Answer :

You could try OPENXML. Now I don’t normally recommend OPENXML as it has a number of well-known memory issues (basically it can take 1/8th of your buffer pool depending on the size of XML)(!!TODO add link). However legend has it, it is faster for larger pieces of XML, so it’s worth a try in a dev/test environment and if you know the memory issues, and you get the performance, it’s up to you to decide which you need most. Something like this:


EXEC sp_xml_preparedocument @handle OUTPUT, @RemarksXml

SELECT RemarkTypeID, RemarkText, @ListingID
FROM OPENXML( @handle, '/Remarks[1]/Remark', 1 ) 
    RemarkTypeID    TINYINT,
    RemarkText      VARCHAR(2500)

EXEC sp_xml_removedocument @handle

Remember to always call sp_xml_removedocument. I am guessing your real XML is much larger, can you give us an idea how many Remark elements it will have and what the size is in KB / MB. I will come back to the post later and set up a test rig to compare performance based on your stats.

UPDATE: According to your example script, your XML can only have a max of 256 Remark elements with a max length of 2500. Having created some sample XML to meet these criteria and tested it, there is little in performance between the two techniques, and both finish in sub-second. In this case I would choose the XML datatype and methods. Can you provide the other information requested please?

There are two main things you can do to improve xml parsing performance:

  • Make the xml variable/column typed, which means declaring an xsd schema on it. This will force a validation of the xml, which in and of itself will take a little time, but parsing speed will improve.
  • Index an xml column (does not apply to xml variables). You can index xml columns in a number of different ways, depending on your requirements. This may give you excellent performance gains for more complex queries and lookups in xml blobs.
  • If your query is part of a much larger query, I seem to remember that xml operations don’t parallelize, so you might do wisely to break up a large query and do the xml parsing in a separate query from the rest of the work.

Leave a Reply

Your email address will not be published.