From: afrinspray on 26 Oct 2006 15:08
I posted a message titled "Best way to remove body/html tag from HTML::Element tree" on Sep 6 2006. Tad McClellan helped me out by referring me to http://perlmonks.org/?node_id=554219 which explains using XML::SAX::Writer.

Everything was going well with the tag parsing until I started giving the SAX parser special characters for quotes. Hopefully these characters make it through... it's converting:

& r s q u o ; (no spaces)

to:

â (a with a hat)

Thanks in advance....

Mike
From: Tad McClellan on 26 Oct 2006 17:02
afrinspray <afrinspray(a)gmail.com> wrote:
> Tad McClellan helped me out by referring me to
> http://perlmonks.org/?node_id=554219

No I didn't.

--
Tad McClellan    SGML consulting
tadmc(a)augustmail.com    Perl programming
Fort Worth, Texas
From: afrinspray on 26 Oct 2006 18:57
Sorry, that was Todd W.:
http://groups-beta.google.com/group/comp.lang.perl.misc/browse_thread/thread/a1d24b4eec251e80/

Anyway, does anyone have any ideas how I can get it to stop converting & n b s p ; and other standard HTML entities to gibberish?

Mike

Tad McClellan wrote:
> afrinspray <afrinspray(a)gmail.com> wrote:
> > Tad McClellan helped me out by referring me to
> > http://perlmonks.org/?node_id=554219
>
> No I didn't.
>
> --
> Tad McClellan    SGML consulting
> tadmc(a)augustmail.com    Perl programming
> Fort Worth, Texas
From: afrinspray on 27 Oct 2006 14:16
Ok, after some research I think I can better narrow down the problem I'm having. The module XML::Filter::SAX1toSAX2 is converting my HTML entities (&rsquo; etc.) to weird characters. I changed the XML::SAX::Machines Pipeline in my code from this:

    my $machine = Pipeline(
        'XML::Filter::SAX1toSAX2' =>
        'XML::Filter::BufferText' =>
        'XML::Filter::HtmlTagStripper' =>
        $writer
    );

to

    my $machine = Pipeline(
        'XML::Filter::SAX1toSAX2' =>
        \*STDOUT
    );

and it's still converting the entities to gibberish.

Is there another SAX1toSAX2-like module out there? Can anyone think of a replacement? If I remove the SAX1toSAX2 call from the Pipeline, there's no output. Also, on a side note, I previously decoded the input using MIME::Decoder...

Any help would be greatly appreciated.

Mike

afrinspray wrote:
> Sorry that was Todd W.:
> http://groups-beta.google.com/group/comp.lang.perl.misc/browse_thread/thread/a1d24b4eec251e80/
>
> Anyway, does anyone have any ideas how I can get it to stop convert & n
> b s p ; and other standard HTML entities to gibberish?
>
> Mike
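
A note on the symptom described above ("a with a hat"). This is only a sketch of one possible cause, not a diagnosis of XML::Filter::SAX1toSAX2: the entity &rsquo; stands for U+2019, whose UTF-8 encoding is three bytes, and the first of those bytes is 0xE2, which is "â" when the bytes are displayed as ISO-8859-1. The snippet uses only the core Encode module; the variable names are illustrative and nothing here comes from the pipeline in the post.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+2019 RIGHT SINGLE QUOTATION MARK, the character &rsquo; stands for.
my $char = "\x{2019}";

# Encode it as UTF-8 and list the resulting bytes in hex.
my $bytes = encode('UTF-8', $char);
my $hex   = join ' ', map { sprintf '%02X', ord } split //, $bytes;

# Assumption: something downstream reads those bytes as ISO-8859-1.
# Each byte then becomes a separate character; the first is U+00E2.
my $misread = decode('ISO-8859-1', $bytes);
my $first   = ord substr($misread, 0, 1);

print "UTF-8 bytes: $hex\n";
printf "first character when misread: U+%04X\n", $first;
```

If this is what is happening, the entity is being decoded correctly and the output is simply being viewed in a different encoding than it was written in, so checking the encoding the writer emits against the one the terminal or page declares would be a reasonable first step.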