From: Robert on
I tried lxml, but after walking the element tree and making changes,
I'm forced to do a full serialization of the whole document
(etree.tostring(tree)), which destroys the "human edited" formatting
of the original HTML code and makes it rather unreadable.

Is there an existing HTML parser which supports tracking/writing
back particular changes in a cautious way by just making local
changes? Or one that at least tracks the tag start/end positions in
the file?


Robert
From: Stefan Behnel on
Robert, 31.01.2010 20:57:
> I tried lxml, but after walking and making changes in the element tree,
> I'm forced to do a full serialization of the whole document
> (etree.tostring(tree)) - which destroys the "human edited" format of the
> original HTML code. makes it rather unreadable.

What do you mean? Could you give an example? lxml certainly does not
destroy anything it parsed, unless you tell it to do so.

Stefan
From: Nobody on
On Sun, 31 Jan 2010 20:57:31 +0100, Robert wrote:

> I tried lxml, but after walking and making changes in the element
> tree, I'm forced to do a full serialization of the whole document
> (etree.tostring(tree)) - which destroys the "human edited" format
> of the original HTML code.
> makes it rather unreadable.
>
> is there an existing HTML parser which supports tracking/writing
> back particular changes in a cautious way by just making local
> changes? or at least tracks the tag start/end positions in the file?

HTMLParser, sgmllib.SGMLParser and htmllib.HTMLParser all allow you to
retrieve the literal text of a start tag (but not an end tag) via
get_starttag_text(). Unfortunately, they're only tokenisers, not parsers:
they report tags as they appear in the text and build no tree, so you'll
need to handle tag minimisation (e.g. omitted end tags) yourself.
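
As a rough sketch of that approach (not a complete solution), the tokeniser
can record where each start tag begins and what its literal text is, so that
only that span gets rewritten. This assumes Python 3, where the class lives in
html.parser (sgmllib and htmllib are no longer in the standard library). The
names StartTagLocator and line_col_to_offset, and the sample markup, are made
up for illustration.

```python
from html.parser import HTMLParser


class StartTagLocator(HTMLParser):
    """Hypothetical helper: record position and literal text of start tags."""

    def __init__(self):
        super().__init__()
        self.tags = []  # entries: (tag, line, column, literal_text)

    def handle_starttag(self, tag, attrs):
        # getpos() gives (line, column); line is 1-based, column is 0-based.
        line, col = self.getpos()
        self.tags.append((tag, line, col, self.get_starttag_text()))


def line_col_to_offset(source, line, col):
    """Convert a (line, column) pair into an absolute string offset."""
    preceding = source.split("\n")[:line - 1]
    return sum(len(text) + 1 for text in preceding) + col


# Sample input, invented for this example.
source = '<html>\n  <body>\n    <p class="old">Hello</p>\n  </body>\n</html>\n'

locator = StartTagLocator()
locator.feed(source)
locator.close()

# Rewrite only the <p ...> start tag; the rest of the text is left untouched.
tag, line, col, literal = next(t for t in locator.tags if t[0] == "p")
start = line_col_to_offset(source, line, col)
patched = source[:start] + '<p class="new">' + source[start + len(literal):]
print(patched)
```

End tags are reported through handle_endtag(), where getpos() can be read the
same way, but there is no literal-text accessor for them, and matching start
tags to end tags is still left to the caller.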