From: Johnny Persson on
Hi,

we are having some performance issues with XML shredding.

At this point we are extracting data from XML files from nearly 60 different
companies - and therefore 60 different XML structures. The total amount
of XML is about 350 MB, and we are trying to extract the data as fast as
possible.

Our current system extracts, transforms and loads the data in about five
minutes. We would, however, like to get that down to about one minute.

We use the "nodes/CROSS APPLY" technique to shred the XML files into our
internal format.

This is how we shred the data.
------------------------------

1) Load the XML into a temporary table (#XmlTable)
2) Create an XML index on it
3) Run a query like the one below

INSERT INTO #TransformedData
SELECT
    T0.T.value('asasd', 'asdadd'),  -- paths and types are placeholders here
    T1.T.value('asasd', 'asdadd')
FROM
    #XmlTable
CROSS APPLY
    data.nodes('asd') AS T0(T)
CROSS APPLY
    T0.T.nodes('level1') AS T1(T)

DROP TABLE #XmlTable

4) Pass the temporary table #TransformedData into the common/shared
transformation procedure

EXEC LookupData

-------------------------------
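
For reference, here is a minimal, self-contained version of the same
nodes/CROSS APPLY pattern. The element and attribute names (orders,
order, line) are invented for illustration, since the real paths are only
sketched as placeholders above:

DECLARE @XmlTable TABLE (data xml);

INSERT INTO @XmlTable (data) VALUES (N'
<orders>
  <order id="1">
    <line sku="A" qty="2"/>
    <line sku="B" qty="5"/>
  </order>
</orders>');

SELECT
    T0.T.value('@id',  'int')         AS OrderId,  -- one row per <order>
    T1.T.value('@sku', 'varchar(10)') AS Sku,      -- one row per <line>
    T1.T.value('@qty', 'int')         AS Qty
FROM @XmlTable AS x
CROSS APPLY x.data.nodes('/orders/order') AS T0(T)
CROSS APPLY T0.T.nodes('line') AS T1(T);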

This is very I/O intensive and it makes the system slow. Are there any
other good ways to parse the XML inside SQL Server? Or should we move the
shredding outside the SQL environment into, for instance, a C# method
which bulk loads the data?

Regards,
Johnny
From: Uri Dimant on
Johnny

http://sqlblog.com/blogs/adam_machanic/archive/2010/01/12/t-sql-tuesday-002-is-it-xml-or-not.aspx


http://sqlcat.com/msdnmirror/archive/2010/03/01/performance-tips-of-using-xml-data-in-sql-server.aspx



"Johnny Persson" <a(a)a.a> wrote in message
news:%23Ht3vLqvKHA.1796(a)TK2MSFTNGP02.phx.gbl...
> Hi,
>
> we are having some performance issues regarding xml shredding.
>
> At this point we are extracting data from xmls from nearly 60 different
> companies - and therefore 60 different xml structures. The total amount of
> xml is about 350MB and we are trying to extract the data as fast as
> possible.
>
> Our current system extracts, transforms and loads the data in about five
> minutes. We would however like to do this in about one minute to be
> pleased.
>
> We use the "nodes/cross apply"-technique to shred the xmls into our
> internal format.
>
> This is how we shred the data.
> ------------------------------
>
> 1) Load xml into a temporary table (#XmlTable)
> 2) Set an xml index
> 3) Query (like below)
>
> INSERT INTO #TransformedData
> SELECT
> T0.T.value('asasd', 'asdadd')
> T1.T.value('asasd', 'asdadd')
> FROM
> #XmlTable
> CROSS APPLY
> data.nodes('asd') AS T0(T)
> T0.T.nodes('level1') AS T1(T)
>
> DROP #XmlTable
>
> 4) Pass the temporary table #TransformedData into the common/shared
> transformation procedure
>
> EXEC LookupData
>
> -------------------------------
>
> This is very I/O intensive and it makes the system slow. Are there any
> other good ways to parse the xmls in the sql server? Should we perhaps
> move the shredding outside the SQL environment into, for instance, a C#
> method which bulk loads the data?
>
> Regards,
> Johnny


From: Stefan Hoffmann on
hi Johnny,

On 08.03.2010 10:37, Johnny Persson wrote:
> This is very I/O intensive and it makes the system slow. Are there any
> other good ways to parse the XML inside SQL Server? Or should we move the
> shredding outside the SQL environment into, for instance, a C# method
> which bulk loads the data?
You may take a look at 'Performing Bulk Load of XML Data (SQLXML 4.0)':

http://technet.microsoft.com/en-us/library/ms171993.aspx

Depending on your files, I would at least set up some simple C# samples to
compare performance. You might consider implementing it as a CLR stored
procedure if it performs better.

http://technet.microsoft.com/en-us/library/ms131094%28SQL.90%29.aspx
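
If you want a pure T-SQL baseline to compare against - a different route
than SQLXML Bulk Load - a sketch like this is a common way to pull a file
into an xml column (the path is just an example, and OPENROWSET(BULK ...)
needs the appropriate server permissions):

-- Load one raw XML file straight into the staging table.
INSERT INTO #XmlTable (data)
SELECT CAST(BulkColumn AS xml)
FROM OPENROWSET(BULK 'C:\feeds\company01.xml', SINGLE_BLOB) AS f;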


Kind regards
--> stefan <--
From: Erland Sommarskog on
Johnny Persson (a(a)a.a) writes:
> 1) Load the XML into a temporary table (#XmlTable)
> 2) Create an XML index on it
> 3) Run a query like the one below
>
> INSERT INTO #TransformedData
> SELECT
>     T0.T.value('asasd', 'asdadd'),
>     T1.T.value('asasd', 'asdadd')
> FROM
>     #XmlTable
> CROSS APPLY
>     data.nodes('asd') AS T0(T)
> CROSS APPLY
>     T0.T.nodes('level1') AS T1(T)

Assuming these are top-level nodes, change this to

data.nodes('/asd') AS T0(T)
T0.T.nodes('/level1') AS T1(T)

There can be a huge performance gain from using correct node addressing.
(Unfortunately, the syntax for the best addressing form can be quite
convoluted in some cases.)
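
As an illustration, the difference can look like this against the
placeholder names from the original post (the <name> element and the
text() step are my additions; text() is the part that tends to get
convoluted):

-- Relative addressing: the engine may have to search the whole document.
SELECT T.c.value('(name)[1]', 'nvarchar(100)')
FROM #XmlTable
CROSS APPLY data.nodes('asd/level1') AS T(c);

-- Fully specified path plus a text() step: typically much cheaper,
-- because the optimizer knows exactly which nodes to touch.
SELECT T.c.value('(name/text())[1]', 'nvarchar(100)')
FROM #XmlTable
CROSS APPLY data.nodes('/asd/level1') AS T(c);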



--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
From: Johnny Persson on
Thank you for your answer,

we use XML as the column data type, so the first article does not seem to
apply to us. It is, however, interesting how the query optimizer works -
or doesn't :)

The second article had a really interesting part about typed/untyped XML.
We have had no XML schema for any of the files, so I tried that.

I created an XML schema for one XML file (~5 MB) and compared the
performance of a stored procedure which uses the XML schema against one
which does not.
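
In case anyone wants to try the same thing, the typed column was set up
along these lines (the schema collection here is heavily simplified and
reuses the placeholder element names from my first post):

CREATE XML SCHEMA COLLECTION CompanyXmlSchema AS N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="asd">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="level1" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';

-- A permanent table, since the schema collection lives in the user database.
CREATE TABLE XmlStagingTyped
(
    data xml (DOCUMENT CompanyXmlSchema)  -- typed XML column
);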

The result is a bit strange to me. When we use a typed XML column, the
execution time is ~15 s and the subtree cost is 428. When we use the
normal, untyped XML column, the execution time is the same, but the
subtree cost is 866!

Do you have any thoughts on this result, and on why the execution time
isn't affected?

Regards,
Johnny

On 2010-03-08 11:06, Uri Dimant wrote:
> Johny
>
> http://sqlblog.com/blogs/adam_machanic/archive/2010/01/12/t-sql-tuesday-002-is-it-xml-or-not.aspx
>
>
> http://sqlcat.com/msdnmirror/archive/2010/03/01/performance-tips-of-using-xml-data-in-sql-server.aspx
>
> [snip]