From: dmtr on
Is there any way to configure cElementTree to ignore the XML root
namespace? Default cElementTree (Python 2.6.4) appears to add the XML
root namespace URI to _every_ single tag. I know that I can strip the
URIs manually from every tag, but that is a rather idiotic thing to do
(performance-wise).
From: Stefan Behnel on
dmtr, 28.04.2010 03:42:
> Is there any way to configure cElementTree to ignore the XML root
> namespace? Default cElementTree (Python 2.6.4) appears to add the XML
> root namespace URI to _every_ single tag.

Certainly not in the serialised XML. Are you referring to the qualified
names it uses?

Stefan

From: dmtr on
I'm referring to xmlns/URI prefixes. Here's a code example:
from xml.etree.cElementTree import iterparse
from cStringIO import StringIO
xml = """<root xmlns="http://www.very_long_url.com"><child/></root>"""
for event, elem in iterparse(StringIO(xml)): print event, elem

The output is:
end <Element '{http://www.very_long_url.com}child' at 0xb7ddfa58>
end <Element '{http://www.very_long_url.com}root' at 0xb7ddfa40>


I don't want these "{http://www.very_long_url.com}" in front of my
tags.

They create a performance disaster on large files (first cElementTree
adds them, then I have to remove them in Python). Is there any way to
tell cElementTree not to mess with my tags? I need that in the
standard Python distribution, not in my custom cElementTree build...
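
For reference, the manual stripping described above might look roughly like the sketch below. The helper names local_name and iter_local_tags are made up for illustration, and the sketch assumes xml.etree.ElementTree with io.BytesIO rather than the cElementTree/cStringIO modules used in the post.

```python
from io import BytesIO
from xml.etree.ElementTree import iterparse

XML = b'<root xmlns="http://www.very_long_url.com"><child/></root>'


def local_name(tag):
    """Return the tag without a leading '{uri}' part, if there is one."""
    if tag.startswith("{"):
        return tag.split("}", 1)[1]
    return tag


def iter_local_tags(source):
    """Yield (event, local tag name) pairs, stripping the namespace in Python."""
    for event, elem in iterparse(source):
        yield event, local_name(elem.tag)


events = list(iter_local_tags(BytesIO(XML)))
print(events)  # [('end', 'child'), ('end', 'root')]
```

This is the per-tag string work that the question is trying to avoid; it also throws away the information needed to tell apart same-named elements from different namespaces.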
From: Stefan Behnel on
dmtr, 30.04.2010 04:57:
> I'm referring to xmlns/URI prefixes. Here's a code example:
> from xml.etree.cElementTree import iterparse
> from cStringIO import StringIO
> xml = """<root xmlns="http://www.very_long_url.com"><child/></root>"""
> for event, elem in iterparse(StringIO(xml)): print event, elem
>
> The output is:
> end <Element '{http://www.very_long_url.com}child' at 0xb7ddfa58>
> end <Element '{http://www.very_long_url.com}root' at 0xb7ddfa40>
>
>
> I don't want these "{http://www.very_long_url.com}" in front of my
> tags.
>
> They create performance disaster on large files

I seriously doubt that they do.


> (first cElementTree
> adds them, then I have to remove them in python).

I think that's your main mistake: don't remove them. Instead, use the fully
qualified names when comparing.

Stefan
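
A minimal sketch of what comparing against fully qualified names could look like. The names NS, CHILD, and count_children are hypothetical, the sample document has a second child added, and the sketch assumes xml.etree.ElementTree with io.BytesIO instead of cElementTree/cStringIO. It also shows the namespace map accepted by findall() as one way to avoid spelling out the URI in every path.

```python
from io import BytesIO
from xml.etree.ElementTree import fromstring, iterparse

NS = "http://www.very_long_url.com"
CHILD = "{%s}child" % NS  # qualified name, built once up front

XML = (b'<root xmlns="http://www.very_long_url.com">'
       b'<child/><child/></root>')


def count_children(source):
    """Count child elements by comparing elem.tag to the qualified name."""
    count = 0
    for event, elem in iterparse(source):
        if elem.tag == CHILD:
            count += 1
    return count


n_iterparse = count_children(BytesIO(XML))

# Same lookup on a parsed tree, using a prefix-to-URI map in the path.
root = fromstring(XML)
n_findall = len(root.findall("ns:child", {"ns": NS}))

print(n_iterparse, n_findall)  # 2 2
```

The comparison itself is a plain string equality check against a constant, with no stripping or concatenation inside the loop.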

From: dmtr on
> I think that's your main mistake: don't remove them. Instead, use the fully
> qualified names when comparing.
>
> Stefan

Yes. That's what I'm forced to do: pre-calculate tags like tagChild
= "{%s}child" % uri and use them instead of "child". As a result the
code looks ugly and there is extra overhead from concatenating and
comparing these repeated, redundant prefixes. I don't understand why
cElementTree forces users to do that. So far I couldn't find any way
around that without rebuilding cElementTree from source.

Apparently somebody hard-coded the namespace_separator parameter in
cElementTree.c (what a dumb thing to do!!! It should have been a
parameter in the cElementTree.XMLParser() arguments):
===========
self->parser = EXPAT(ParserCreate_MM)(encoding, &memory_handler, "}");
===========

Simply replacing "}" with NULL gives me the desired tags without the
stinking URIs.
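
The effect of that separator can be observed from pure Python with the standard xml.parsers.expat module, whose ParserCreate() takes a namespace_separator argument. The sketch below is only an illustration of the underlying expat behaviour, not a way to reconfigure cElementTree; the helper name start_names is made up.

```python
import xml.parsers.expat

XML = b'<root xmlns="http://www.very_long_url.com"><child/></root>'


def start_names(namespace_separator):
    """Collect the element names expat reports for a given separator."""
    names = []
    parser = xml.parsers.expat.ParserCreate(
        namespace_separator=namespace_separator)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(XML, True)
    return names


# No separator: namespace processing is off, names come through as written.
plain = start_names(None)
# With "}": expat reports "uri}localname" for namespaced elements.
qualified = start_names("}")

print(plain)      # ['root', 'child']
print(qualified)  # ['http://www.very_long_url.com}root', 'http://www.very_long_url.com}child']
```

Note that with namespace processing switched off, prefixed elements keep their literal "prefix:tag" spelling and xmlns declarations show up as ordinary attributes, so same-named elements from different namespaces can no longer be told apart reliably.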