From: dmtr on 30 Apr 2010 19:10

Here's a link to the patch exposing this parameter:
http://bugs.python.org/issue8583
From: Stefan Behnel on 1 May 2010 02:27

dmtr, 30.04.2010 23:59:
>> I think that's your main mistake: don't remove them. Instead, use the fully
>> qualified names when comparing.
>
> Yes. That's what I'm forced to do. Pre-calculating tags like tagChild
> = "{%s}child" % uri and using them instead of "child".

Exactly. Keeps you from introducing typos in your code. And keeps you from having to deal with namespace-prefix mappings. Big features.

> As a result the
> code looks ugly and there is extra overhead concatenating/comparing
> these repeating and redundant prefixes.

The overhead is really small, though. In many cases, a pointer comparison will do.

> I don't understand why
> cElementTree forces users to do that. So far I couldn't find any way
> around that without rebuilding cElementTree from source.

Then don't do it.

> Apparently somebody hard-coded the namespace_separator parameter in
> the cElementTree.c (what a dumb thing to do!!!, it should have been a
> parameter in the cElementTree.XMLParser() arguments):
> ===========
> self->parser = EXPAT(ParserCreate_MM)(encoding,&memory_handler, "}");
> ===========
>
> Simply replacing "}" with NULL gives me desired tags without stinking
> URIs.

You should try to calm down and embrace this feature.

Stefan
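The "pre-calculate the qualified tag once and compare against it" approach discussed above can be sketched roughly as follows. This is only an illustration, not code from any poster: it assumes Python 3 and xml.etree.ElementTree (the thread itself is about cElementTree on Python 2.6), and the names NS, TAG_CHILD and count_children are made up for the example. The URI is the one from the sample document quoted in this thread.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Assumption: the namespace URI from the sample document in this thread.
NS = "http://www.very_long_url.com"

# Build the qualified tag once, so every later comparison reuses one string.
TAG_CHILD = "{%s}child" % NS


def count_children(xml_bytes):
    """Count <child> elements in the NS namespace while streaming the input."""
    count = 0
    # iterparse reports only "end" events by default.
    for event, elem in ET.iterparse(BytesIO(xml_bytes)):
        if elem.tag == TAG_CHILD:
            count += 1
        elem.clear()  # drop content we no longer need
    return count


doc = b'<root xmlns="http://www.very_long_url.com"><child/><child/></root>'
print(count_children(doc))  # prints 2
```

A misspelled constant name fails loudly with a NameError, whereas a misspelled literal such as "chlid" would just silently never match, which is presumably the typo protection Stefan has in mind.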
From: Carl Banks on 1 May 2010 06:17

On Apr 27, 6:42 pm, dmtr <dchich...(a)gmail.com> wrote:
> Is there any way to configure cElementTree to ignore the XML root
> namespace? Default cElementTree (Python 2.6.4) appears to add the XML
> root namespace URI to _every_ single tag. I know that I can strip
> URIs manually, from every tag, but it is a rather idiotic thing to do
> (performance wise).

Perhaps upgrade to lxml. Not sure if it gives you control over namespace expansion, but if it doesn't it should at least be faster.

For this and some other reasons, I find ElementTree not quite as handy when processing files from another source as when I'm saving and retrieving my own data.

Carl Banks
From: Carl Banks on 1 May 2010 06:33

On Apr 29, 10:12 pm, Stefan Behnel <stefan...(a)behnel.de> wrote:
> dmtr, 30.04.2010 04:57:
>
> > I'm referring to xmlns/URI prefixes. Here's a code example:
> > from xml.etree.cElementTree import iterparse
> > from cStringIO import StringIO
> > xml = """<root xmlns="http://www.very_long_url.com"><child/></root>"""
> > for event, elem in iterparse(StringIO(xml)): print event, elem
>
> > The output is:
> > end <Element '{http://www.very_long_url.com}child' at 0xb7ddfa58>
> > end <Element '{http://www.very_long_url.com}root' at 0xb7ddfa40>
>
> > I don't want these "{http://www.very_long_url.com}" in front of my
> > tags.
>
> > They create performance disaster on large files
>
> I seriously doubt that they do.

I don't know what kind of XML files you deal with, but for me a large XML file is gigabyte-sized (obviously I don't use Element Tree for those).

Even for tens-of-megabyte files, string ops to expand tags with namespaces are going to be a pretty decent penalty--remember ElementTree does nothing lazily.

> > (first cElementTree
> > adds them, then I have to remove them in python).
>
> I think that's your main mistake: don't remove them. Instead, use the fully
> qualified names when comparing.

Unless you have multiple namespaces or are working with defined schema or something, it's useless boilerplate.

It'd be a nice feature if ElementTree could let users optionally ignore a namespace, unfortunately it doesn't have it.

Carl Banks
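Since there is no "ignore the namespace" parser option, the manual workaround the original poster describes (let the parser add the prefix, then strip it in Python) might look something like the sketch below. It is an assumption-laden illustration: Python 3 with xml.etree.ElementTree rather than cElementTree, and iterparse_local is a hypothetical helper, not part of any library. It only touches element tags (namespaced attribute names are left alone), and it will merge same-named tags from different namespaces, so it only makes sense for single-namespace documents.

```python
import xml.etree.ElementTree as ET
from io import BytesIO


def iterparse_local(source):
    """Yield (event, elem) pairs with any '{uri}' prefix removed from elem.tag.

    Hypothetical helper: this post-processes what the parser returns; it does
    not change how the underlying expat parser reports names.
    """
    for event, elem in ET.iterparse(source, events=("end",)):
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            # "{uri}local" -> "local"
            elem.tag = elem.tag.split("}", 1)[1]
        yield event, elem


doc = b'<root xmlns="http://www.very_long_url.com"><child/></root>'
tags = [elem.tag for _, elem in iterparse_local(BytesIO(doc))]
print(tags)  # prints ['child', 'root']
```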
From: Stefan Behnel on 1 May 2010 08:34

Carl Banks, 01.05.2010 12:33:
> On Apr 29, 10:12 pm, Stefan Behnel wrote:
>> dmtr, 30.04.2010 04:57:
>>> I don't want these "{http://www.very_long_url.com}" in front of my
>>> tags. They create performance disaster on large files
>>
>> I seriously doubt that they do.
>
> I don't know what kind of XML files you deal with, but for me a large
> XML file is gigabyte-sized (obviously I don't use Element Tree for
> those).

Why not? I used cElementTree for files of that size (1-1.5GB unpacked) a couple of times, and it was never a problem.

> Even for tens-of-megabyte files, string ops to expand tags with
> namespaces are going to be a pretty decent penalty--remember
> ElementTree does nothing lazily.

So? Did you run a profiler on it to know that there is a penalty due to the string concatenation? cElementTree's parser (expat) and its tree builder are blazingly fast, especially the iterparse() implementation.

http://codespeak.net/lxml/performance.html#parsing-and-serialising
http://codespeak.net/lxml/performance.html#a-longer-example
http://effbot.org/zone/celementtree.htm#benchmarks

>>> (first cElementTree adds them, then I have to remove them in python).
>>
>> I think that's your main mistake: don't remove them. Instead, use the fully
>> qualified names when comparing.
>
> Unless you have multiple namespaces or are working with defined schema
> or something, it's useless boilerplate.
>
> It'd be a nice feature if ElementTree could let users optionally
> ignore a namespace, unfortunately it doesn't have it.

I agree that that would make for a nice parser option, e.g. when dealing with HTML and XHTML in the same code.

Stefan
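One rough way to act on the "did you measure it?" question is to time both approaches on a synthetic document. The sketch below is not a benchmark from the thread and proves nothing about real files: it assumes Python 3 and xml.etree.ElementTree, and the document shape, the element count and the function names are all arbitrary choices made for illustration. Timings will vary by machine and Python version, so none are shown.

```python
import time
import xml.etree.ElementTree as ET
from io import BytesIO

NS = "http://www.very_long_url.com"  # URI from the thread's sample document
TAG_CHILD = "{%s}child" % NS


def make_doc(n):
    """Build a synthetic document with n empty <child/> elements."""
    return ('<root xmlns="%s">%s</root>' % (NS, "<child/>" * n)).encode("utf-8")


def count_qualified(data):
    """Compare each tag against the precomputed qualified name."""
    return sum(1 for _, e in ET.iterparse(BytesIO(data)) if e.tag == TAG_CHILD)


def count_stripped(data):
    """Strip the '{uri}' prefix from every tag, then compare the local name."""
    return sum(1 for _, e in ET.iterparse(BytesIO(data))
               if e.tag.split("}", 1)[-1] == "child")


def timed(func, data):
    """Return (result, elapsed seconds) for a single call of func(data)."""
    start = time.perf_counter()
    result = func(data)
    return result, time.perf_counter() - start


data = make_doc(20000)
for func in (count_qualified, count_stripped):
    result, seconds = timed(func, data)
    print("%s: %d children in %.4f s" % (func.__name__, result, seconds))
```

For anything beyond a quick sanity check, running the real workload under cProfile on the real files would be more telling than a micro-timing like this.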