From: gerardob on

I am trying to read an XML file using minidom from the Python library xml.dom.

This is the xml file:
---------------------------------
<rm_structure>
<resources>
<resource>
AB
<Capacity>100</Capacity>
<NumberVirtualClasses>
2
</NumberVirtualClasses>
</resource>
</resources>
</rm_structure>
----------------------------------
This is the Python code:
--------------------------------
from xml.dom import minidom

doc = minidom.parse("example.xml")
resources_section = doc.getElementsByTagName('resources')
list_resources = resources_section[0].getElementsByTagName('resource')

for r in list_resources:
    # childNodes[0] is the text node between <resource> and <Capacity>
    name = r.childNodes[0].nodeValue
    print(name)
    print(len(name))
---------------------------------
The problem is that the nodeValue stored in the variable 'name' is not "AB"
(what I want); instead it is a string of length 8 that seems to include the
tabs and/or other whitespace.
How can I get the string "AB" without the other stuff?
Thanks.
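One way to see exactly which characters the text node holds is to print its repr(). The sketch below parses an inline string with minidom.parseString instead of reading example.xml, and it assumes tab indentation because the post does not show the file's real indentation. With this assumed layout the text node is 10 characters long rather than the 8 reported, so the exact count depends on how the actual file is indented.

```python
from xml.dom import minidom

# Hypothetical tab-indented version of example.xml; the post does not
# show the real indentation, so the exact whitespace is an assumption.
XML = (
    "<rm_structure>\n"
    "\t<resources>\n"
    "\t\t<resource>\n"
    "\t\t\tAB\n"
    "\t\t\t<Capacity>100</Capacity>\n"
    "\t\t</resource>\n"
    "\t</resources>\n"
    "</rm_structure>\n"
)

doc = minidom.parseString(XML)
resource = doc.getElementsByTagName("resource")[0]
name = resource.childNodes[0].nodeValue

# repr() makes the newline and tab characters around "AB" visible
print(repr(name))  # '\n\t\t\tAB\n\t\t\t'
print(len(name))   # 10
```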



--
View this message in context: http://old.nabble.com/string-manipulation.-tp29276755p29276755.html
Sent from the Python - python-list mailing list archive at Nabble.com.

From: Neil Cerutti on
On 2010-07-27, gerardob <gberbeglia(a)gmail.com> wrote:
> How can i get the string "AB" without the other stuff?

Check out the strip method of str:

name = r.childNodes[0].nodeValue.strip()
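A self-contained sketch of this suggestion, parsing an inline copy of the XML (with hypothetical space indentation) via minidom.parseString rather than reading example.xml. The "2" inside <NumberVirtualClasses> is surrounded by whitespace as well, so the same stripping applies there.

```python
from xml.dom import minidom

# Inline stand-in for example.xml; the indentation here is hypothetical
XML = """<rm_structure>
  <resources>
    <resource>
      AB
      <Capacity>100</Capacity>
      <NumberVirtualClasses>
        2
      </NumberVirtualClasses>
    </resource>
  </resources>
</rm_structure>
"""

doc = minidom.parseString(XML)
parsed = []
for r in doc.getElementsByTagName("resource"):
    # str.strip() with no argument removes leading and trailing whitespace
    # (spaces, tabs, newlines) and leaves the interior untouched
    name = r.childNodes[0].nodeValue.strip()
    # The number inside <NumberVirtualClasses> is padded with whitespace too
    nvc = r.getElementsByTagName("NumberVirtualClasses")[0]
    classes = int(nvc.firstChild.nodeValue.strip())
    parsed.append((name, classes))
    print(name, len(name), classes)  # AB 2 2
```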

--
Neil Cerutti
From: Mark Tolonen on

"gerardob" <gberbeglia(a)gmail.com> wrote in message
news:29276755.post(a)talk.nabble.com...
> How can i get the string "AB" without the other stuff?
> Thanks.

Whitespace in XML character data is significant, and minidom keeps it as part
of the text node. If the file were:

<rm_structure>
<resources>
<resource>AB<Capacity>100</Capacity>
<NumberVirtualClasses>2</NumberVirtualClasses>
</resource>
</resources>
</rm_structure>

You would just read 'AB'. If you don't control the XML file, then:

print(name.strip())

will remove leading and trailing whitespace.
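When you don't control the file, indexing childNodes[0] is also fragile: it assumes the name is the very first child node, so a comment placed before "AB" would be returned instead. A slightly more defensive sketch uses a hypothetical helper, own_text, that joins only the direct text children of an element and then strips the result, giving 'AB' for both the indented layout and the compact one above. Note that it concatenates all direct text, so mixed content such as A<b/>B would come back as 'AB'.

```python
from xml.dom import minidom, Node


def own_text(element):
    """Return the element's direct text content, stripped.

    Hypothetical helper: joins only TEXT_NODE / CDATA_SECTION_NODE
    children of `element`, ignoring child elements such as <Capacity>.
    """
    parts = [
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    ]
    return "".join(parts).strip()


# Indented layout (hypothetical indentation) and the compact layout above
indented = """<rm_structure>
  <resources>
    <resource>
      AB
      <Capacity>100</Capacity>
      <NumberVirtualClasses>
        2
      </NumberVirtualClasses>
    </resource>
  </resources>
</rm_structure>"""

compact = """<rm_structure>
<resources>
<resource>AB<Capacity>100</Capacity>
<NumberVirtualClasses>2</NumberVirtualClasses>
</resource>
</resources>
</rm_structure>"""

results = []
for source in (indented, compact):
    doc = minidom.parseString(source)
    resource = doc.getElementsByTagName("resource")[0]
    results.append(own_text(resource))

print(results)  # ['AB', 'AB']
```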

-Mark