From: afrinspray on
I posted a message titled "Best way to remove body/html tag from
HTML::Element tree" on Sep 6 2006.

Tad McClellan helped me out by referring me to
http://perlmonks.org/?node_id=554219 which explains using
XML::SAX::Writer. Everything was going well with the tag parsing until
I started giving the sax parser special characters for quotes:

Hopefully these characters make it through... it's converting:
& r s q u o ; (no spaces)
to:
â (a with a hat)

Thanks in advance....


Mike

From: Tad McClellan on
afrinspray <afrinspray(a)gmail.com> wrote:

> Tad McClellan helped me out by referring me to
> http://perlmonks.org/?node_id=554219


No I didn't.


--
Tad McClellan SGML consulting
tadmc(a)augustmail.com Perl programming
Fort Worth, Texas
From: afrinspray on
Sorry, that was Todd W.:
http://groups-beta.google.com/group/comp.lang.perl.misc/browse_thread/thread/a1d24b4eec251e80/

Anyway, does anyone have any ideas how I can get it to stop converting
& n b s p ; and other standard HTML entities to gibberish?


Mike


Tad McClellan wrote:
> afrinspray <afrinspray(a)gmail.com> wrote:
>
> > Tad McClellan helped me out by referring me to
> > http://perlmonks.org/?node_id=554219
>
>
> No I didn't.
>
>
> --
> Tad McClellan SGML consulting
> tadmc(a)augustmail.com Perl programming
> Fort Worth, Texas

From: afrinspray on
Ok, after some research I think I can narrow down the problem I'm
having. The module XML::Filter::SAX1toSAX2 is converting my HTML
entities (&nbsp;, &#8217;, etc.) to weird characters.

I changed the XML::SAX::Machines Pipeline in my code from this:

    my $machine = Pipeline(
        'XML::Filter::SAX1toSAX2' =>
        'XML::Filter::BufferText' =>
        'XML::Filter::HtmlTagStripper' =>
        $writer
    );

to this:

    my $machine = Pipeline(
        'XML::Filter::SAX1toSAX2' =>
        \*STDOUT
    );

and it's still converting the entities to gibberish. Is there another
SAX1toSAX2-like module out there? Can anyone think of a replacement?
If I remove the SAX1toSAX2 call from the Pipeline, there's no output.

Also, on a side note, I previously decoded the input using
MIME::Decoder...

Any help would be greatly appreciated.

Mike


afrinspray wrote:
> Sorry that was Todd W.:
> http://groups-beta.google.com/group/comp.lang.perl.misc/browse_thread/thread/a1d24b4eec251e80/
>
> Anyway, does anyone have any ideas how I can get it to stop convert & n
> b s p ; and other standard HTML entities to gibberish?
>
>
> Mike
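
One possible explanation for the "a with a hat", offered as a guess and
not confirmed anywhere in this thread: the entity is being decoded to a
real Unicode character, written out as UTF-8, and then displayed by
something that assumes ISO-8859-1. The sketch below uses only the core
Encode module and does not touch the SAX pipeline. The helper names
hex_bytes and code_points are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: show a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# Hypothetical helper: show a character string as U+XXXX code points.
sub code_points {
    my ($chars) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $chars;
}

# U+2019 RIGHT SINGLE QUOTATION MARK is the character that
# &rsquo; and &#8217; both name.
my $rsquo = "\x{2019}";

# Assumption: the writer emits UTF-8, so one character becomes three bytes.
my $utf8_bytes = encode('UTF-8', $rsquo);
print 'UTF-8 bytes:        ', hex_bytes($utf8_bytes), "\n";

# Assumption: the viewer reads those bytes as ISO-8859-1, one character
# per byte. The first byte, 0xE2, is "a with circumflex" in that charset.
my $misread = decode('ISO-8859-1', $utf8_bytes);
print 'Read as Latin-1:    ', code_points($misread), "\n";

# Reading the same bytes as UTF-8 gives back the original character.
my $roundtrip = decode('UTF-8', $utf8_bytes);
print 'Read as UTF-8:      ', code_points($roundtrip), "\n";
```

If that is what is happening here, the output bytes are not actually
corrupted, and the thing to check would be which encoding the terminal,
browser, or mail client is using to display them.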
