From: shadidas on
Hi,

I once wrote some Tcl code that parses an XML file whose text elements
contain special characters like "<", which should normally not appear
unescaped in XML text. However, when I ran my parsing code on that
file on a Sun Solaris system, using the TclXML-3.2 package with only
the pure-Tcl installation rather than the C-compiled "libxml2"
installation, the parser reported no errors for these special
characters.

However, when I ran the same code on Linux, using either the TclXML or
the tDOM package, I got an error, which is fair enough since I already
know that this kind of special character shouldn't be used anyway.

So I figured that TclXML with the pure-Tcl installation may tolerate
such characters, while TclXML with libxml2 does not. I therefore tried
selecting the parser explicitly in my code, either tcl or libxml2,
using [dom::parse $file -parser { tcl | libxml2 }].

However, the "tcl" option didn't solve the problem, because the
interpreter did not recognize it. I read that it should work if both
the "tcl" and the "libxml2" parser classes are installed, so that one
can choose between them. So I installed TclXML with only the pure-Tcl
implementation by running "configure" and "make install". Then I
installed TclXML once more with libxml2 by running "configure",
"make", and "make install". Then I ran my code again with the parser
option "tcl", but still no luck.

So my question is: assuming I didn't do anything wrong in these steps,
why would the special characters be accepted in one case (TclXML with
the pure-Tcl installation on Solaris) and cause an error in the other
(TclXML with libxml2 on Linux)?

Thanks, and sorry for the long description.
From: Steve Ball on
Hi,

An unescaped '<' character would result in your XML being not well-
formed. If any processor parsed that document without reporting an
error then that's a bug.
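
If you control how the file is generated, the usual fix is to escape
the text before it goes into an element. Here is a minimal sketch in
plain Tcl, assuming nothing beyond the core "string map" command; the
proc name xmlEscapeText is hypothetical and not part of TclXML or
tDOM.

```tcl
# Hypothetical helper (not provided by TclXML or tDOM): escape the
# characters that should not appear literally in XML character data.
proc xmlEscapeText {text} {
    # string map scans the input once, so the "&" it emits as part of
    # "&lt;" is never escaped a second time.
    return [string map {& &amp; < &lt; > &gt;} $text]
}

set raw {if a < b && b > c}
set element "<cond>[xmlEscapeText $raw]</cond>"
puts $element
```

This should print <cond>if a &lt; b &amp;&amp; b &gt; c</cond>, which
any conforming parser will accept.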

HTH,
Steve Ball

From: Alexandre Ferrieux on
On Jul 14, 6:10 am, shadidas <shady.for...(a)gmail.com> wrote:
> So, my question would be, supposing I didn't make anything wrong in my
> whole steps, why would the special characters be understood one time
> (using tclxml with tcl installation on solaris), and make an error the
> other (using tclxml with libxml2 on linux)?

Obviously the older library was exceedingly tolerant, since XML
disallows a literal "<" in character data (a bare ">" is legal, though
usually escaped too). No sane XML parser can be blamed for choking on
_that_ :/

Out of curiosity, can you give a (minimized) example? Once the
special casing is identified, it should be easy to do some string
pre-processing (and tree post-processing) to emulate it with modern
parsers.
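
To give an idea of what I mean by string pre-processing, here is a
rough sketch. It assumes the stray "<" characters are never directly
followed by a letter, "_", ":", "/", "!" or "?", which may well not
hold for your data; the proc name escapeStrayLt is made up for the
example.

```tcl
# Hypothetical helper: escape every "<" that does not look like the
# start of markup. This is a heuristic, not an XML parser.
proc escapeStrayLt {xml} {
    # (?!...) is a negative lookahead, so only the "<" itself is matched.
    # In the substitution, "\&" stands for a literal "&".
    regsub -all {<(?![A-Za-z_:/!?])} $xml {\&lt;} fixed
    return $fixed
}

set broken {<expr>a < b and c <= d</expr>}
puts [escapeStrayLt $broken]
```

On that input it should print <expr>a &lt; b and c &lt;= d</expr>,
which can then be handed to a strict parser. Text such as "a<b" would
still be mistaken for a tag, hence the need for a real example.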

-Alex