wxXmlDocument [wxWindows]


From: "Stephan Rose" on 16 Feb 2007 21:28

Question regarding wxXmlDocument.

I was trying it out earlier today and it essentially works quite well.
It loaded the XML file 100% correctly, no problems there.

The only thing was... it takes several minutes to load the file, which is
about 25 megs in size.

While I know that processing the data in the entire file will easily take me
several minutes, having the wxXmlDocument load take that long is somewhat
problematic, as the call blocks any ability to show any kind of progress.

Is there any way to get the class to load the data on the fly as I iterate
through the child nodes?

Thanks,

Stephan

From: Yuri Borsky on 17 Feb 2007 03:41

"Stephan Rose" <kermos(a)somrek.net> wrote in message
000f01c7523b$4753f0a0$6402a8c0(a)stephan...
>
> The only thing was... it takes several minutes to load the file, which
> is about 25 megs in size.
>

You may want to look at TinyXML, a fast and simple XML parser. While it
lacks many useful features, such as DTD validation, it is actually much
faster.

> Is there any way to get the class to load the data on the fly as I iterate
> through the child nodes?

Is it actually possible with XML? I mean, this is exactly one of the big
problems of XML: to process any node you must start from the top-level
one, and you won't have that until the entire XML tree has been loaded and
verified against the DTD or schema.

Sometimes it is actually much faster to process XML as plain text, say, if
your task is to count the occurrences of a given node to gather some
statistics. Then you can just search for the corresponding string while
reading the XML text line by line. But most of the time, yes, you first load
_and verify_ the entire XML tree, and only then do you get to process it.


From: Francesco Montorsi on 17 Feb 2007 07:30

Stephan Rose wrote:
> Question regarding wxXmlDocument.
>
> I was trying it out earlier today and it essentially works quite well.
> It loaded the XML file 100% correctly, no problems there.
>
> The only thing was... it takes several minutes to load the file, which is
> about 25 megs in size.
>
> While I know that processing the data in the entire file will easily take me
> several minutes, having the wxXmlDocument load take that long is
> somewhat problematic, as the call blocks any ability to show any kind of
> progress.
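You should load the document from a secondary thread to avoid blocking
your GUI. You could then use wxGauge::Pulse() to show that loading is
still in progress (it moves the gauge in indeterminate mode, since the
actual amount of progress is not known).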
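A minimal sketch of the secondary-thread idea in standard C++ (C++14 or later), not wxWidgets code. `BackgroundLoad` is a hypothetical helper, and the `load` callable stands in for the blocking `wxXmlDocument::Load()` call. `poll()` is meant to be called periodically from the GUI thread, for example from a timer handler, and invokes a `pulse` callable (where you would call `wxGauge::Pulse()`) while the result is not ready, so every GUI call stays on the main thread. In a real wx application you might use `wxThread` and `wxTimer` instead.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper: runs a blocking load on a worker thread and lets the
// GUI thread check on it without blocking.
template <typename Result>
class BackgroundLoad {
public:
    // Starts `load` on its own thread right away (std::launch::async).
    template <typename Load>
    explicit BackgroundLoad(Load load)
        : pending_(std::async(std::launch::async, std::move(load))) {}

    // Call periodically from the GUI thread, e.g. from a timer handler.
    // Returns true once the result is ready; otherwise calls `pulse`
    // (the place for something like wxGauge::Pulse()) and returns false.
    template <typename Pulse>
    bool poll(Pulse&& pulse)
    {
        assert(pending_.valid());
        if (pending_.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
            return true;
        pulse();
        return false;
    }

    // Retrieves the loaded result, blocking if the load has not finished.
    // May be called only once.
    Result get()
    {
        assert(pending_.valid());
        return pending_.get();
    }

private:
    std::future<Result> pending_;
};
```

Because `poll()` never blocks, the event loop keeps running and the gauge can repaint between calls. Note that a future obtained from `std::async` waits for its task in its destructor, so destroying the helper before the load finishes would block the GUI thread.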

>
> Is there any way to get the class to load the data on the fly as I
> iterate through the child nodes?
You could take a look at libxml2 (http://xmlsoft.org). IIRC it does
support that feature, and the wxXml2 component at wxCode wraps it for
wxWidgets (although it does not wrap the load-on-the-fly feature).

HTH,
Francesco

---------------------------------------------------------------------
To unsubscribe, e-mail: wx-users-unsubscribe(a)lists.wxwidgets.org
For additional commands, e-mail: wx-users-help(a)lists.wxwidgets.org

From: John Ralls on 17 Feb 2007 12:15

On Feb 17, 2007, at 12:41 AM, Yuri Borsky wrote:

>
> "Stephan Rose" <kermos(a)somrek.net> wrote in message
> 000f01c7523b$4753f0a0$6402a8c0(a)stephan...
>
>> Is there any way to get the class to load the data on the fly as I
>> iterate through the child nodes?
>
> Is it actually possible with XML? I mean, this is exactly one of the big
> problems of XML: to process any node you must start from the top-level
> one, and you won't have that until the entire XML tree has been loaded and
> verified against the DTD or schema.
>
> Sometimes it is actually much faster to process XML as plain text, say, if
> your task is to count the occurrences of a given node to gather some
> statistics. Then you can just search for the corresponding string while
> reading the XML text line by line. But most of the time, yes, you first load
> _and verify_ the entire XML tree, and only then do you get to process it.

There are two flavors of XML parsers. Tree-based parsers, as you
describe here, are generally based on the Document Object Model, or
DOM. There are also event-based parsers, often, though not always,
based on SAX; expat (which is used internally by wxWidgets and is
part of its distribution) is an event-based parser that isn't based
on SAX. Some of the larger XML support libraries, like Xerces and
libxml2, provide both.

Event-based parsers notify the application (often via callbacks) of
the beginning, value, and end of each node as it occurs. It is the
application's job to keep track of where it is in the document's
tree. One oft-touted benefit is that they do not require the entire
document to be in memory, so very large documents can be handled by
an event-based parser where a tree-based parser would choke or swap
itself to a standstill.
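To make the callback model concrete, here is a toy event-based parser in C++. It is not expat (expat registers comparable handlers through functions such as `XML_SetElementHandler` and `XML_SetCharacterDataHandler`); `XmlEvents`, `parseXmlStream`, and `collectTextWithPaths` are hypothetical names, and the tokenizer deliberately ignores CDATA, entities, and '>' inside attribute values. It shows both points above: the parser only reports events, the application tracks its own position with a stack, and only the current tag or text run is held in memory.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Callbacks fired as the parser reads. Any of them may be left empty.
struct XmlEvents {
    std::function<void(const std::string&)> onStart; // element opened
    std::function<void(const std::string&)> onEnd;   // element closed
    std::function<void(const std::string&)> onText;  // character data
};

// Toy tokenizer: reads one character at a time, skips <?...?> and <!...>
// up to the first '>', and reports start, end, and text events.
void parseXmlStream(std::istream& in, const XmlEvents& ev)
{
    std::string text;
    char c;
    while (in.get(c)) {
        if (c != '<') { text += c; continue; }
        if (!text.empty() && ev.onText) ev.onText(text);
        text.clear();
        std::string tag;
        while (in.get(c) && c != '>') tag += c;
        if (tag.empty() || tag[0] == '?' || tag[0] == '!') continue;
        const bool closing = tag[0] == '/';
        const bool selfClosing = tag.back() == '/';
        const std::size_t begin = closing ? 1 : 0;
        std::size_t end = begin; // element name ends at whitespace or '/'
        while (end < tag.size() && tag[end] != '/' &&
               !std::isspace(static_cast<unsigned char>(tag[end])))
            ++end;
        const std::string name = tag.substr(begin, end - begin);
        if (!closing && ev.onStart) ev.onStart(name);
        if ((closing || selfClosing) && ev.onEnd) ev.onEnd(name);
    }
    if (!text.empty() && ev.onText) ev.onText(text);
}

// The application keeps track of where it is with its own stack of names
// and records each non-blank text run together with its element path.
std::vector<std::string> collectTextWithPaths(std::istream& in)
{
    std::vector<std::string> path, out;
    XmlEvents ev;
    ev.onStart = [&](const std::string& n) { path.push_back(n); };
    ev.onEnd = [&](const std::string& n) {
        assert(!path.empty() && path.back() == n); // tags must nest properly
        path.pop_back();
    };
    ev.onText = [&](const std::string& t) {
        if (t.find_first_not_of(" \t\r\n") == std::string::npos) return;
        std::string p;
        for (const auto& n : path) p += "/" + n;
        out.push_back(p + ": " + t);
    };
    parseXmlStream(in, ev);
    return out;
}
```

Passing a `std::ifstream` instead of a `std::istringstream` lets the same code walk a large file, and since control returns to the application at every event, a callback could also update a progress display.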

Unless you have tight control over the format of the incoming
documents, processing XML as plain text is a bad idea. The DTD or
schema may insert additional content via entities and default
attributes that plain text processing will not handle correctly.
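A small C++ illustration of the entity problem, using a hypothetical sample document. A parser that expands entities sees three `<item>` elements: the literal one plus the two produced by `&pair;`; the one inside the comment is not an element at all. A naive text scan reports four instead, because it matches the two inside the entity declaration and the one in the comment, and never sees the two that `&pair;` produces. `naiveCount` and `kSample` are illustrative names only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Deliberately naive: counts "<" + tag followed by '/', '>' or whitespace
// anywhere in the raw text, exactly as a plain-text scan would.
std::size_t naiveCount(const std::string& xml, const std::string& tag)
{
    assert(!tag.empty());
    const std::string needle = "<" + tag;
    std::size_t count = 0;
    for (std::size_t pos = xml.find(needle); pos != std::string::npos;
         pos = xml.find(needle, pos + 1)) {
        const std::size_t after = pos + needle.size();
        const unsigned char next = after < xml.size() ? xml[after] : ' ';
        if (next == '>' || next == '/' || std::isspace(next)) ++count;
    }
    return count;
}

// Hypothetical sample: an internal DTD entity and a comment both mislead
// the text scan. An entity-expanding parser finds 3 <item> elements.
const std::string kSample = R"(<?xml version="1.0"?>
<!DOCTYPE list [
  <!ENTITY pair "<item/><item/>">
]>
<list>
  <!-- <item/> was removed -->
  <item/>
  &pair;
</list>
)";
```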

Regards,
John Ralls


From: Yuri Borsky on 17 Feb 2007 14:43

John Ralls <jralls(a)ceridwen.fremont.ca.us> wrote in message
70B0189C-FEBD-4556-991E-6EF68DF18CB6(a)ceridwen.fremont.ca.us...

> There are two flavors of XML parsers. Tree-based parsers, as you
> describe here, are generally based on the Document Object Model, or
> DOM. There are also event-based parsers, often, though not always,
> based on SAX; expat (which is used internally by wxWidgets and is
> part of its distribution) is an event-based parser that isn't based
> on SAX. Some of the larger XML support libraries, like Xerces and
> libxml2, provide both.

Thanks for the insight. Now I wonder why wxWidgets uses an event-based
parser internally but exposes a tree-like interface :)
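As a sketch of how a tree-style API can sit on top of an event-based parser, the C++ below builds a tree from start, end, and text events using a stack of open nodes. `Node` and `TreeBuilder` are hypothetical and are not how wxXmlDocument or wxXmlNode are actually implemented. The sketch also shows the cost Stephan ran into: the finished tree, and therefore the result of a tree-style load call, only exists once the last end tag has been seen.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal DOM-like node (hypothetical; not wxXmlNode).
struct Node {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;
};

// Turns a stream of parser events into a tree held entirely in memory.
class TreeBuilder {
public:
    void startElement(const std::string& name)
    {
        auto node = std::make_unique<Node>();
        node->name = name;
        Node* raw = node.get();
        if (open_.empty()) {
            assert(!root_ && "only one root element is allowed");
            root_ = std::move(node);
        } else {
            open_.back()->children.push_back(std::move(node));
        }
        open_.push_back(raw); // the new node is now the current element
    }

    void endElement(const std::string& name)
    {
        assert(!open_.empty() && open_.back()->name == name);
        open_.pop_back();
    }

    void characters(const std::string& data)
    {
        // Append rather than assign: one run of text may arrive in pieces.
        if (!open_.empty()) open_.back()->text += data;
    }

    // Hands over the finished tree; valid only after every element is closed.
    std::unique_ptr<Node> release()
    {
        assert(open_.empty());
        return std::move(root_);
    }

private:
    std::unique_ptr<Node> root_;
    std::vector<Node*> open_; // path from the root to the current element
};
```

The `characters` handler appends because expat, for one, may deliver a single run of text through several callbacks.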

> Unless you have tight control over the format of the incoming
> documents, processing XML as plain text is a bad idea. The DTD or
> schema may insert additional content via entities and default
> attributes that plain text processing will not handle correctly.

That is true. However, sometimes the speed gain you get from raw text
processing makes it worth it, but only when appropriate and only when you
know what you are doing :)
