HTML Parser which allows low-keyed local changes? [Python]


From: Robert on 31 Jan 2010 14:57

I tried lxml, but after walking and making changes in the element
tree, I'm forced to do a full serialization of the whole document
(etree.tostring(tree)) - which destroys the "human edited" format
of the original HTML code. It makes it rather unreadable.

Is there an existing HTML parser which supports tracking/writing
back particular changes in a cautious way by just making local
changes? Or at least one that tracks the tag start/end positions in the file?

Robert

From: Stefan Behnel on 1 Feb 2010 03:34

Robert, 31.01.2010 20:57:
> I tried lxml, but after walking and making changes in the element tree,
> I'm forced to do a full serialization of the whole document
> (etree.tostring(tree)) - which destroys the "human edited" format of the
> original HTML code. It makes it rather unreadable.

What do you mean? Could you give an example? lxml certainly does not
destroy anything it parsed, unless you tell it to do so.

Stefan

From: Nobody on 1 Feb 2010 22:09

On Sun, 31 Jan 2010 20:57:31 +0100, Robert wrote:

> I tried lxml, but after walking and making changes in the element
> tree, I'm forced to do a full serialization of the whole document
> (etree.tostring(tree)) - which destroys the "human edited" format
> of the original HTML code. It makes it rather unreadable.
>
> Is there an existing HTML parser which supports tracking/writing
> back particular changes in a cautious way by just making local
> changes? Or at least one that tracks the tag start/end positions in the file?

HTMLParser, sgmllib.SGMLParser and htmllib.HTMLParser all allow you to
retrieve the literal text of a start tag (but not an end tag).
Unfortunately, they're only tokenisers, not parsers, so you'll need to
handle minimisation (omitted end tags such as </p> or </li>) yourself.
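A minimal sketch of that approach in Python 3, where the old HTMLParser module lives on as `html.parser` (sgmllib and htmllib were dropped in Python 3). It assumes the whole document is fed in one go and relies on `getpos()` reporting the line and column of the tag currently being handled. `TagLocator`, `replace_start_tags`, `is_old_link` and `retarget` are hypothetical names for illustration, not part of any library. Since there is no end-tag counterpart to `get_starttag_text()`, end-tag spans are recovered by searching the source for the closing `>`, which assumes ordinary `</name>` end tags.

```python
from html.parser import HTMLParser


class TagLocator(HTMLParser):
    """Hypothetical helper: record absolute character spans of tags."""

    def __init__(self, source):
        super().__init__(convert_charrefs=True)
        self.source = source
        # Absolute offset where each line begins; getpos() lines are 1-based.
        self._line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self.start_tags = []  # (tag, attrs, start, end)
        self.end_tags = []    # (tag, start, end)

    def _abs_offset(self):
        # getpos() gives (line, column) of the tag currently being handled.
        line, col = self.getpos()
        return self._line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        start = self._abs_offset()
        literal = self.get_starttag_text()  # exact text as written in the file
        self.start_tags.append((tag, attrs, start, start + len(literal)))

    def handle_startendtag(self, tag, attrs):
        # <br/> and friends: record the start tag only, no separate end tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Assumption: a plain </name> tag, so the first '>' closes it.
        start = self._abs_offset()
        self.end_tags.append((tag, start, self.source.index(">", start) + 1))


def replace_start_tags(source, predicate, rewrite):
    """Rewrite only the start tags selected by predicate; copy the rest verbatim."""
    locator = TagLocator(source)
    locator.feed(source)
    locator.close()
    pieces, last = [], 0
    for tag, attrs, start, end in locator.start_tags:
        if predicate(tag, attrs):
            pieces.append(source[last:start])
            pieces.append(rewrite(tag, attrs, source[start:end]))
            last = end
    pieces.append(source[last:])
    return "".join(pieces)


doc = """<html>
  <body>
    <P CLASS=intro>Hello
    <a href="old.html"   title='x'>link</a>
  </body>
</html>
"""


def is_old_link(tag, attrs):
    # Tag names arrive lower-cased; attrs is a list of (name, value) pairs.
    return tag == "a" and ("href", "old.html") in attrs


def retarget(tag, attrs, literal):
    # Edit only the href value; spacing and quoting elsewhere survive.
    return literal.replace('"old.html"', '"new.html"', 1)


result = replace_start_tags(doc, is_old_link, retarget)
print(result)
```

Everything outside the one rewritten start tag, including the odd spacing, single quotes and upper-case `<P CLASS=intro>`, is copied through byte for byte. Note that the tokeniser reports no end event for that `<P>`, because the document never closes it; deciding where such an implicitly closed element ends is exactly the minimisation handling left to the caller.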
