From: Adam Tauno Williams on
On Sun, 2010-05-16 at 02:37 +0200, Martin v. Loewis wrote:
> > ??? The namespaces are embedded in the document. Personally I find it
> > odd I have to tell xpath about the namespace of the document it is a
> > $*&@(*& method of.
> How so? Why do you say it's a "method", and why do you say "of"?
> Usually, xpath expressions are *not* part of the document they operate
> on, but part of the code that performs the operation.

from lxml import etree

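doc = etree.parse(data)  # data: a filename, URL, or file-like object
doc.xpath(....)          # xpath() is called as a method of the parsed document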

> Consequently,
> the namespace prefixes in the xpath expression do *not* occur in the
> document (other than by chance), but are defined by whoever writes the
> xpath expression. That is typically somebody different from the one
> writing the document

Maybe true technically, but false in practice. If I receive XML data
from source XYZ or service XYZ, the use of namespaces and their prefixes
is extremely consistent [in practice] and very customary (for example,
I've never seen the DSML namespace abbreviated as anything other than
"dsml", and I rarely see WebDAV propfind XML use a namespace prefix other
than "D"). The odds that a customer's or vendor's ERP will generate
different namespaces and abbreviations between requests are ludicrously
remote [I don't recall ever seeing it happen].

And if the xpath fails to produce normal [or any] output, the workflow
will either do nothing or abend, which will draw the attention of an
administrator.

> - if you would always write them together, you
> wouldn't need xpath in the first place, but could produce the selection
> result right away.
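Martin's point, that the prefixes in a query are chosen by whoever writes the query rather than taken from the document, can be sketched as below. This is a minimal sketch, assuming two small hypothetical DSML documents that bind the same namespace URI to different prefixes. It uses the standard library's xml.etree.ElementTree so it runs without lxml; its `findall()` supports only a subset of XPath, while lxml's `xpath()` accepts a `namespaces` dict in the same way.

```python
import xml.etree.ElementTree as ET

# The namespace URI identifies the vocabulary; the prefix is only a local alias.
DSML_NS = "http://www.dsml.org/DSML"

# Hypothetical document using the customary "dsml" prefix.
DOC_A = """<dsml:dsml xmlns:dsml="http://www.dsml.org/DSML">
  <dsml:directory-entries>
    <dsml:entry dn="cn=alice"/>
    <dsml:entry dn="cn=bob"/>
  </dsml:directory-entries>
</dsml:dsml>"""

# Hypothetical document: same namespace URI, different prefix.
DOC_B = """<D:dsml xmlns:D="http://www.dsml.org/DSML">
  <D:directory-entries>
    <D:entry dn="cn=carol"/>
  </D:directory-entries>
</D:dsml>"""

# The query author picks the prefix ("q" here); it never has to match
# whatever prefix a given document happens to use.
QUERY_NAMESPACES = {"q": DSML_NS}

def entry_dns(xml_text):
    """Return the dn attribute of every DSML entry element in the document."""
    root = ET.fromstring(xml_text)
    return [e.get("dn") for e in root.findall(".//q:entry", QUERY_NAMESPACES)]

print(entry_dns(DOC_A))  # ['cn=alice', 'cn=bob']
print(entry_dns(DOC_B))  # ['cn=carol']
```

The same query works on both documents because matching is done on the namespace URI, not on the prefix text.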


From: Stefan Behnel on
Martin v. Loewis, 15.05.2010 23:37:
>> BTW, I'm still not sure I understand your problem. Could you provide
>> some more details?
>
> Wouldn't it be easier if you told the OP how to access the prefix
> mappings in lxml etree, or, if this was actually not possible, admitted
> that it is actually not possible?

Well, there's an "nsmap" property on each Element that provides the mapping
of prefixes to namespace URIs that form the scope of the Element. However,
while this is what the OP asked for, it is not what the OP wants, simply
because it doesn't solve the problem. Prefixes can get defined and
redefined arbitrarily often, so there is no such thing as a
prefix-namespace mapping "of the document". Example:

<x:tag xmlns:x="urn:uri1">
  <x:tag xmlns:x="urn:uri2">
    <x:tag xmlns:x="urn:uri3"/>
  </x:tag>
</x:tag>

Trying to infer a prefix-namespace mapping from that to push it into an
XPath evaluation is futile.
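Why no single mapping exists can be seen by parsing that example and looking at the names the parser actually stores. This sketch uses the standard library's xml.etree.ElementTree (an assumption, swapped in for lxml so it runs without extra packages); like lxml, it stores each tag in "{namespace-uri}localname" form, so the prefix is already resolved and discarded during parsing.

```python
import xml.etree.ElementTree as ET

# Stefan's nested example: the prefix "x" is bound to a different
# namespace URI at each nesting level.
NESTED = """<x:tag xmlns:x="urn:uri1">
  <x:tag xmlns:x="urn:uri2">
    <x:tag xmlns:x="urn:uri3"/>
  </x:tag>
</x:tag>"""

root = ET.fromstring(NESTED)

# root.iter() walks the root and all descendants in document order.
# All three elements are written "x:tag", yet they are three different names.
tags = [el.tag for el in root.iter()]
print(tags)  # ['{urn:uri1}tag', '{urn:uri2}tag', '{urn:uri3}tag']
```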

That's why I asked for more details in order to understand what the actual
problem is that the OP is trying to solve, because the approach that the OP
is apparently trying to follow is clearly misguided.

Stefan

From: Stefan Behnel on
Adam Tauno Williams, 16.05.2010 06:00:
> Given that XML documents can be very large, I'd rather avoid parsing
> the document [beyond what lxml/etree has already done] just to retrieve
> the namespaces and their prefixes.

In order to find out which prefixes are used in the document and which set
of namespace URIs each of them is mapped to, you need to traverse the
entire document and aggregate all namespace definitions on all Elements.
However, the result will be mostly useless, as a prefix is only meaningful
within the scope of its definition. It doesn't have any sensible meaning
for the entire document.
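Aggregating every namespace declaration, as described above, might look like the sketch below. It assumes the standard library's `xml.etree.ElementTree.iterparse` with "start-ns" events, which report each (prefix, uri) declaration as it is encountered; note that this is exactly the extra pass over the document that Adam wanted to avoid. With an already-parsed lxml tree, one could instead walk `root.iter()` and read each element's `nsmap`.

```python
import io
from collections import defaultdict
import xml.etree.ElementTree as ET

def prefix_bindings(source):
    """Map each namespace prefix to the set of URIs it is bound to anywhere.

    Requires a full pass over the document, since any element may declare
    or redeclare a prefix. The default namespace appears under the
    empty-string prefix.
    """
    bindings = defaultdict(set)
    # For "start-ns" events, iterparse yields (event, (prefix, uri)).
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        bindings[prefix].add(uri)
    return dict(bindings)

NESTED = b"""<x:tag xmlns:x="urn:uri1">
  <x:tag xmlns:x="urn:uri2">
    <x:tag xmlns:x="urn:uri3"/>
  </x:tag>
</x:tag>"""

result = prefix_bindings(io.BytesIO(NESTED))
print({p: sorted(uris) for p, uris in result.items()})
# {'x': ['urn:uri1', 'urn:uri2', 'urn:uri3']}
```

The result shows the problem directly: "x" maps to three URIs, so there is no one answer to hand to an XPath evaluation.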

Stefan

From: Stefan Behnel on
Adam Tauno Williams, 15.05.2010 23:04:
> On Sat, 2010-05-15 at 22:58 +0200, Stefan Behnel wrote:
>> Adam Tauno Williams, 15.05.2010 22:40:
>>> On Sat, 2010-05-15 at 22:29 +0200, Stefan Behnel wrote:
>>>> Adam Tauno Williams, 15.05.2010 20:37:
>>>>> Say I have an XML document that begins with:
>>>>> <?xml version="1.0" encoding="utf-8"?>
>>>>> <dsml:dsml xmlns:dsml="http://www.dsml.org/DSML">
>>>>> How can one access the namespaces defined in this node? I've done a fair
>>>>> amount of XML in Python, but haven't been able to uncover the call to
>>>>> enumerate the namespaces.
>>>>> Primarily I am using etree from lxml.
>>>> What do you need the namespaces for?
>>> One needs to know the defined namespace in order to perform xpath
>>> operations.
>> Well, yes, but unless you already know the namespace (URI), you can't know
>> what the tag you find signifies in the first place.
>> Unless, obviously, you are confusing namespaces with namespace prefixes.
>> But you don't need to know the prefixes for XPath.
>> Does this help?
>> http://codespeak.net/lxml/xpathxslt.html#namespaces-and-prefixes
>
> I know that.

I just remembered that there's also this:

http://codespeak.net/lxml/FAQ.html#how-can-i-find-out-which-namespace-prefixes-are-used-in-a-document

Stefan

From: Martin v. Loewis on
> Well, there's an "nsmap" property on each Element that provides the
> mapping of prefixes to namespace URIs that form the scope of the
> Element. However, while this is what the OP asked for, it is not what
> the OP wants, simply because it doesn't solve the problem.

Well, it solves the problem at hand: he gets some prefix mapping.

He probably could have used a hard-coded prefix mapping for the 20 or so
namespaces in his application instead (with a different set of flaws in
that approach).

> That's why I asked for more details in order to understand what the
> actual problem is that the OP is trying to solve, because the approach
> that the OP is apparently trying to follow is clearly misguided.

I completely agree. However, I recommend that we let him find out on his
own. I suspect he has some idiomatic usage of XML, perhaps with all
namespace prefixes defined in the root element. He'll find out that his
approach is flawed in the general case when he encounters a document that
doesn't follow that pattern.
It's probably pointless trying to convince him in the abstract.

Regards,
Martin