From: John G Harris on
On Sun, 21 Feb 2010 at 01:06:20, in comp.lang.javascript, Thomas
'PointedEars' Lahn wrote:
>kangax wrote:
>
>> On 2/20/10 4:49 PM, David Mark wrote:
>>> But if it is inline script, you need to escape the slash so the second
>>> string value isn't mistaken for a closing H1 tag.
>>>
>>> document.write("<h1>" + x +"<\/h1>");
>>
>> Yeah, one of those things that standard and de-facto standard disagree
>> on.
>
>How did you get the idea that invalid markup would be a de-facto standard?
>A million flies can't be wrong?
>
>> I've tested the whole slew of browsers—ancient, mobile, desktop,
>> etc.—and none would close the script tag on discovery of ETAGO.
>
>You need to test more, and refine your tests. The issue is known to occur
>with "<script ...>...</script>" in particular, but it has been observed on
>other occasions as well.
<snip>

Here's what the HTML 4.01 standard says in section B.3.2 :

When script or style data is the content of an element (SCRIPT and
STYLE), the data begins immediately after the element start tag and
ends at the first ETAGO ("</") delimiter followed by a name start
character ([a-zA-Z]); note that this may not be the element's end
tag.
Authors should therefore escape "</" within the content. Escape
mechanisms are specific to each scripting or style sheet language.

so 'other occasions' should exist.
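
As a rough illustration of the rule quoted above (a sketch only; the names findScriptDataEnd and escapeEtago are made up for this example, and nothing here claims that any real browser parses this way), the B.3.2 end condition and the usual JavaScript escape could be modelled as:

```javascript
// Hypothetical helper: index at which a parser following HTML 4.01 B.3.2
// to the letter would end SCRIPT/STYLE data, i.e. the first "</" followed
// by a name start character [a-zA-Z]. Returns -1 if there is none.
function findScriptDataEnd(data) {
  return data.search(/<\/[a-zA-Z]/);
}

// Hypothetical helper: rewrite "</" as "<\/" in JavaScript source text.
// Assumption: every "</" in the source sits inside a string literal,
// where "\/" is just "/", so the string values do not change.
function escapeEtago(source) {
  return source.replace(/<\//g, "<\\/");
}

var unescaped = 'document.write("<h1>" + x + "</h1>");';
var escaped = escapeEtago(unescaped);

// The unescaped source contains an ETAGO in context (at "</h1>");
// the escaped source does not.
var endsEarly = findScriptDataEnd(unescaped) !== -1;
var safeAfterEscape = findScriptDataEnd(escaped) === -1;
```

The escape is specific to JavaScript string literals, which is the point of the last sentence of B.3.2; a style sheet would need its own mechanism.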

John
--
John Harris

From: Richard Cornford on
John G Harris wrote:
> On Sun, 21 Feb 2010, Thomas 'PointedEars' Lahn wrote:
>>kangax wrote:
<snip>
>>> I've tested the whole slew of browsers—ancient, mobile,
>>> desktop, etc.—and none would close the script tag on
>>> discovery of ETAGO.
>>
>> You need to test more, and refine your tests. The issue is
>> known to occur with "<script ...>...</script>" in particular,
>> but it has been observed on other occasions as well.
> <snip>
>
> Here's what the HTML 4.01 standard says in section B.3.2 :
>
> When script or style data is the content of an element (SCRIPT
> and STYLE), the data begins immediately after the element
> start tag and ends at the first ETAGO ("</") delimiter
> followed by a name start character ([a-zA-Z]); note that
> this may not be the element's end tag.
> Authors should therefore escape "</" within the content.
> Escape mechanisms are specific to each scripting or style
> sheet language.
>
> so 'other occasions' should exist.

It is interesting to note what SGML (ISO 8879) has to say on the
subject, i.e:-

| B.13.1.1 Character Data (PCDATA, CDATA, and RCDATA)
| ...
| If an element contains declared character data, it cannot contain
| anything else. The markup parser scans it only to locate an etago
| or net; other markup is ignored. #Only the correct end-tag (or that
| of an element in which this element is nested) will be recognised.
| ...

So HTML is much more restrictive than SGML, as SGML is going to verify
that etago is part of the "correct" end-tag before recognising it as
such. Of course, as virtually no browsers actually follow SGML rules, this
one occasion where they appear to be closer to SGML than HTML in their
behaviour is pretty meaningless (better attributed to coincidence than
anything else).

On the other hand, if testing exactly how browsers handle SCRIPT element
contents, it has got to be worth verifying that none act on that "or
that of an element in which this element is nested". If any do, you
then have an issue beyond "</script>", one which is context-related and so
may go unnoticed (only emerging occasionally to bite the odd victim who
accidentally wanders into a dangerous context).
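
To make the three candidate behaviours concrete, here is a small model of them (purely a sketch: findEnd, the rule names and the name-character set are assumptions made for this example, not anything taken from a spec or from a browser):

```javascript
// Hypothetical model of where SCRIPT content would end under three rules:
//   "html401"    - any "</" followed by a name start character
//   "sgmlAnnexB" - the SCRIPT end-tag, or the end-tag of an enclosing element
//   "scriptOnly" - the SCRIPT end-tag only
// ancestors: names of the elements the SCRIPT is nested in (lower case).
function findEnd(data, rule, ancestors) {
  // Assumed name characters: letters, digits, "." and "-".
  var re = /<\/([a-zA-Z][a-zA-Z0-9.-]*)/g;
  var m;
  while ((m = re.exec(data)) !== null) {
    var name = m[1].toLowerCase();
    if (rule === "html401") return m.index;
    if (name === "script") return m.index;
    if (rule === "sgmlAnnexB" && ancestors.indexOf(name) !== -1) {
      return m.index;
    }
  }
  return -1; // no terminating end-tag found
}

// One script body in which each rule stops at a different place.
var body = 'var a = "</h1>", b = "</div>", c = "</script>";';
var nesting = ["div", "body"];

var endHtml401 = findEnd(body, "html401", nesting);    // at "</h1>"
var endAnnexB  = findEnd(body, "sgmlAnnexB", nesting); // at "</div>"
var endScript  = findEnd(body, "scriptOnly", nesting); // at "</script>"
```

A browser test along these lines would need the SCRIPT element placed inside the element whose end-tag appears in the string (the DIV here); otherwise the "sgmlAnnexB" case is never exercised.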

Personally, I am happy to play along with the HTML specs, even if they
are probably more restrictive than observed environments require.

Richard.

From: Eric Bednarz on
"Richard Cornford" <Richard(a)litotes.demon.co.uk> writes:

> It is interesting to note what SGML (ISO 8879) has to say on the
> subject, i.e:-
>
> | B.13.1.1 Character Data (PCDATA, CDATA, and RCDATA)
> | ...
> | If an element contains declared character data, it cannot contain
> | anything else. The markup parser scans it only to locate an etago
> | or net; other markup is ignored. #Only the correct end-tag (or that
> | of an element in which this element is nested) will be recognised.
> | ...

That sounds nice, but is in conflict with clause 7.6, which states:

| The content of an element declared to be character data or replaceable
| character data is terminated only by an ETAGO delimiter-in-context
| (which need not open a valid end-tag) or a valid NET.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(where the context of '-in-context' is a name start character, or GRPO
if CONCUR in the SGML declaration is set to YES followed by a number)

This is also how it is implemented in (o)nsgmls and why CDATA is
widely considered BrokenAsDesigned.

From: joy99 on
On Feb 22, 12:38 am, Eric Bednarz <bedn...(a)fahr-zur-hoelle.org> wrote:
> "Richard Cornford" <Rich...(a)litotes.demon.co.uk> writes:
> > It is interesting to note what SGML (ISO 8879) has to say on the
> > subject, i.e:-
>
> > | B.13.1.1 Character Data (PCDATA, CDATA, and RCDATA)
> > | ...
> > | If an element contains declared character data, it cannot contain
> > | anything else. The markup parser scans it only to locate an etago
> > | or net; other markup is ignored. #Only the correct end-tag (or that
> > | of an element in which this element is nested) will be recognised.
> > | ...
>
> That sounds nice, but is in conflict with clause 7.6, which states:
>
> | The content of an element declared to be character data or replaceable
> | character data is terminated only by an ETAGO delimiter-in-context
> | (which need not open a valid end-tag) or a valid NET.
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> (where the context of ‘-in-context’ is a name start character, or GRPO
> if CONCUR in the SGML declaration is set to YES followed by a number)
>
> This is also how it is implemented in (o)nsgmls and why CDATA is
> widely considered BrokenAsDesigned.

Dear Group,

Thank you all for kindly taking your valuable time to guide a new
learner.

Wishing you all A Happy Day Ahead,
Best Regards,
Subhabrata.