From: Marc Guay on
> I wonder what the difference is between doing "new
> SimpleXMLElement" and calling simplexml_load_string which results in the
> libxml_use_internal_errors call being ineffective. Odd.

The documentation for "Dealing with XML errors" only mentions
simplexml_load_string() and this comment
http://ca3.php.net/manual/en/simplexml.examples-basic.php#93263 shows
that you're not the first person to run into this.
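
One difference I am fairly sure of, though it may or may not be the one
you hit: with internal errors switched on, simplexml_load_string()
reports failure by returning false, while the SimpleXMLElement
constructor still throws an Exception because it has no return value to
use. A minimal sketch, where the malformed $bad string and the variable
names are made up for illustration:

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical malformed input: <unclosed> is never closed.
$bad = '<root><unclosed></root>';

// simplexml_load_string() signals failure by returning false.
$viaFunction = simplexml_load_string($bad);
$functionErrors = libxml_get_errors();
libxml_clear_errors();

// The constructor cannot return false, so it throws an Exception instead,
// even though the individual libxml warnings are suppressed.
$constructorThrew = false;
try {
    new SimpleXMLElement($bad);
} catch (Exception $e) {
    $constructorThrew = true;
}
libxml_clear_errors();

// Print what libxml collected for the function call.
foreach ($functionErrors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
```

So code that uses "new SimpleXMLElement" would need a try/catch around
it in addition to the libxml_use_internal_errors(true) call.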

Marc
From: "Gary ." on
Okay. At least one of the problems with this so called HTML seems to
be that the body tag looks like
<BODY vlink=#ffffff ...>
and xml_parse complains that "> required" on that line (i.e. it is
claiming it can't find the end of the tag!).

I'm guessing that those attributes "must" be quoted in XML and
"should" be in HTML (but patently aren't)? Is there any way to get
xml_parse to ignore that? My element_handler functions never even get
a chance to see that line.

Regex to insert quotes or remove the attributes entirely, perhaps?
*gulp* I hope there's a better way than that.
From: Richard Quadling on
On 8 July 2010 16:15, Gary . <php-general(a)garydjones.name> wrote:
> Okay. At least one of the problems with this so called HTML seems to
> be that the body tag looks like
> <BODY vlink=#ffffff ...>
> and xml_parse complains that "> required" on that line (i.e. it is
> claiming it can't find the end of the tag!).
>
> I'm guessing that those attributes "must" be quoted in XML and
> "should" be in HTML (but patently aren't)? Is there any way to get
> xml_parse to ignore that? My element_handler functions never even get
> a chance to see that line.
>
> Regex to insert quotes or remove the attributes entirely, perhaps?
> *gulp* I hope there's a better way than that.

So. Essentially, you want to parse some plain text which may or may
not be well formed XML.

In short ... good luck.

How badly formed is the file going to be?

If it is things like missing ", then this could be managed with regex.
Essentially you are going to have to do the clean up that Tidy could
do for you.
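
If missing quotes really are the main problem, the regex clean-up could
look roughly like the sketch below. The function name
quote_unquoted_attributes() is made up for this example, and it assumes
that no quoted attribute value contains a literal ">" (it would cut the
tag short). It is not a replacement for Tidy.

```php
<?php
// Hypothetical helper: wrap unquoted attribute values in double quotes
// so that a strict XML parser such as xml_parse() can read the tag.
function quote_unquoted_attributes(string $html): string
{
    // Step 1: find each start tag ("<" followed by a letter, up to ">").
    return preg_replace_callback(
        '/<[a-zA-Z][^<>]*>/',
        function (array $tag): string {
            // Step 2: walk over name=value pairs inside the tag. Quoted
            // values are matched too, so their contents are skipped
            // rather than being mistaken for further attributes.
            return preg_replace_callback(
                '/(\s+[a-zA-Z_:][-\w:.]*\s*=\s*)("[^"]*"|\'[^\']*\'|[^\s"\'<>]+)/',
                function (array $attr): string {
                    $first = $attr[2][0];
                    if ($first === '"' || $first === "'") {
                        return $attr[0]; // already quoted, leave alone
                    }
                    return $attr[1] . '"' . $attr[2] . '"';
                },
                $tag[0]
            );
        },
        $html
    );
}

echo quote_unquoted_attributes('<BODY vlink=#ffffff text="#000000">'), "\n";
```

You would run the whole document through this before handing it to
xml_parse(). Other problems, such as unclosed tags or attributes without
any value, would still need separate handling.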
From: Marc Guay on
> And yes, I'd rather use DOM, but I can't.

Could you use this: http://simplehtmldom.sourceforge.net/?
From: Nisse Engström on
On Thu, 8 Jul 2010 17:15:02 +0200, "Gary ." wrote:

> Okay. At least one of the problems with this so called HTML seems to
> be that the body tag looks like
> <BODY vlink=#ffffff ...>
> and xml_parse complains that "> required" on that line (i.e. it is
> claiming it can't find the end of the tag!).
>
> I'm guessing that those attributes "must" be quoted in XML and
> "should" be in HTML (but patently aren't)?

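For that attribute value, it's a "must" in both cases: HTML 4 only
allows unquoted values made up of letters, digits, hyphens, periods,
underscores and colons, and "#" is none of those. And for strict
versions of (X)HTML, the attribute does not exist at all.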


/Nisse