From: Garrett Smith on
On 2010-06-19 02:33 PM, Thomas 'PointedEars' Lahn wrote:
> Garrett Smith wrote:
>
>> On 6/16/2010 8:15 AM, Thomas 'PointedEars' Lahn wrote:
>>> Garrett Smith wrote:
>>>> Thomas 'PointedEars' Lahn wrote:
>>>>> Garrett Smith wrote:
>>>>>> Thomas 'PointedEars' Lahn wrote:
>>>>>>> Garrett Smith wrote:
>>>>>>>> Thomas 'PointedEars' Lahn wrote:
>>>>>>>>> Garrett Smith wrote:
>>>>>>>>>> Meeting those goals, the result should be valuable and appreciated
>>>>>>>>>> by many.
>>>>>>>>> Which part of my suggestion did you not like?
>>>>>>>> Nothing, its fine but I did not see a regexp there that tests to see
>>>>>>>> if the string is valid JSON.
>>>>>>> There cannot be such a regular expression in ECMAScript as it does
>>>>>>> not
>>>>>>> support PCRE's recursive matches feature. An implementation of a
>>>>>>> push-down automaton, a parser, is required.
>>>>>> A parser would be too much for the FAQ.
>>>>> Probably, although I think it could be done in not too many lines for
>>>>> the purpose of validation.
>>>>
>>>> That would require more code to be downloaded and more processing to run
>>>> it. What about mobile devices?
>>>
>>> You are confused. What good is shorter code that is not a solution?
>
> I take it that, since you have not answered this, you have come to see the
> flaw in your logic here.
>

If "not valid json" can be determined without writing a parser, and if
it that is more efficient, then the overhead of using a parser is
avoided. That's good.

In order to prove that, I'll need some tests.
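One cheap rejection test, sketched here only as an assumption of what such a check could look like (the function name and the character tests are mine, not anything proposed in the thread), would reject input whose first significant character cannot begin a JSON value:

```javascript
// Sketch only: a cheap pre-check that rejects some obviously invalid
// input without a parser. Hypothetical helper, based on the JSON
// grammar at json.org (a value starts with {, [, ", t, f, n, -, or a digit).
function looksLikeJSON(text) {
  if (typeof text != "string") {
    return false;
  }
  // Trim JSON-insignificant whitespace from both ends.
  var trimmed = text.replace(/^[ \t\r\n]+|[ \t\r\n]+$/g, "");
  return /^[{\["tfn\-0-9]/.test(trimmed);
}

looksLikeJSON('{"a": 1}'); // true: may be JSON, hand it to a parser
looksLikeJSON('<html>');   // false: rejected without parsing
```

This only proves "not JSON" for some inputs; anything it accepts still needs the full parse.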

[...]

>
> Then all implementations are wrong, or, put more politely, they implement
> only something similar to JSON. The specification at json.org clearly says
> that no control character is allowed in there. Control characters in the
> Basic Latin Unicode block range from U+0000 to U+001F inclusive. If
> anything, the character class is not exclusive enough, since there are
> control characters beyond that block, from U+007F to U+009F inclusive (which
> is easily fixed, though).
>

Sorry, I should have thought more clearly before saying "all
implementations."

BESEN does not extend JSONString to allow a literal TAB.

JSON.parse(' "\t" ');

BESEN IDE: SyntaxError

Other major implementations extend the grammar of JSONString to allow a
literal TAB. They are wrong, as you claim, but not for the reason you've
supplied. That is, they are not wrong because they don't match what is
stated on json.org; they are wrong because they do not follow the JSON
Grammar as defined in ECMAScript.

No other control characters (you mention U+007F) are explicitly
excluded by the production for JSONString in ECMAScript.

A fallback implementation must not be more strict than the
specification. Filtering out other control characters (U+007F, etc.)
would be a violation of the ECMAScript specification and would not match
what implementations do.

JSON.parse(' "\u007f" ');

Parses a string containing the delete character.

JSON.parse(' "\u007f" ') === "\u007f";

true.

In BESEN, Opera, Firefox 3.6.3, and probably others. It matches the spec.

It would make sense for the fallback to allow \t, as all
implementations allow that today.
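A small feature test, assuming a native JSON object is present, could tell a fallback whether the host parser already extends JSONString this way (a sketch, not a proposal from the thread):

```javascript
// Sketch: does the host's JSON.parse accept a raw TAB inside a string?
// Per the JSON grammar a literal U+0009 in a JSONString is not allowed;
// it must be written as the escape sequence \t.
function parserAllowsRawTab() {
  try {
    // The ECMAScript literal "\t" puts an actual TAB character into
    // the JSON text, which a strict parser (e.g. BESEN) must reject.
    JSON.parse(' "\t" ');
    return true;  // lenient: the grammar was extended
  } catch (e) {
    return false; // strict: matches the JSON grammar
  }
}

// The escaped form is always valid JSON:
JSON.parse(' "\\t" ') === "\t"; // true
```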

>> I see you went from unicode escape sequences to hex escapes, but why
>> \x00 and not \x0?
>
> Because \x0 would be a syntax error, see ES3/5 7.8.4.
>

"7.8.4 String Literals".
That is the wrong section. That section is describes character escape
sequences in Strings; look down to the section on Regular Expressions.

"7.8.5 Regular Expression Literals"
The specification there states:

| HexEscapeSequence ::
| x HexDigit HexDigit

So you still manage to be correct, even though you've cited the wrong
section.

>> Or why not just use a decimal escape \0?
>
> That is a possibility I was not aware of [15.10.2.11], indeed, but then I
> had two different kinds of character escape sequences in one character
> class, with one not being recognizable as easily. No advantage, only
> disadvantages there; so no, thanks.
>

What are the disadvantages? I see one: "not recognizable easily". Is
that so? I could see how it could be mistaken for an octal escape
sequence, but octal 0 and decimal 0 are the same 0.

What are the other disadvantages?
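For what it is worth, \0, \x00, and \u0000 all denote U+0000 inside a regular expression, so character classes built from any of them behave identically; a quick check (mine, not from the spec text):

```javascript
// Sketch: the three escape spellings name the same code point, so the
// resulting control-character classes match the same characters.
var byDecimal = /[\0-\x1f]/;
var byHex     = /[\x00-\x1f]/;
var byUnicode = /[\u0000-\u001f]/;

byDecimal.test("\u0000"); // true
byHex.test("\u0000");     // true
byUnicode.test("\t");     // true: TAB (U+0009) falls in the range
byDecimal.test("A");      // false: U+0041 is not a control character
```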

>> The character range could be written more compactly;
>
> Yes, one could write /[\dA-Fa-f]/. Or even /…[\dA-F]…/i if there would be
> no other letters in the remaining expression.
>
>> instead of [0-9A-Fa-f], [0-9A-f], or even just [0-f], if that is not
>> considered too unreadable.
>
> Neither expression is equivalent to begin with.
>

Right. [0-f] is not because it includes a bunch of other characters and
[0-9A-f] is not because it includes "G-Z".

I'm concerned with using /[\dA-F]/i

So the following should be good:
/[\dA-Fa-f]/
/[\dA-F]/i
/[0-9A-Fa-f]/
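Whether those forms really accept the same characters is easy to check mechanically; the harness below is my own sketch:

```javascript
// Sketch: verify that the three candidate hex-digit classes agree on
// every character tested, and that [0-9A-f] does not belong with them.
var forms = [/^[\dA-Fa-f]$/, /^[\dA-F]$/i, /^[0-9A-Fa-f]$/];

function allAgree(ch) {
  return forms[0].test(ch) === forms[1].test(ch) &&
         forms[1].test(ch) === forms[2].test(ch);
}

allAgree("a"); // true: every form accepts a hex digit
allAgree("G"); // true: every form rejects a non-hex letter
// [0-9A-f] is different: its A-f range drags in G-Z and more.
/^[0-9A-f]$/.test("G"); // true, although "G" is not a hex digit
```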

[...]

>
> That programmer was me. I did not want to have a string containing `\"',
> I wanted a string containing `"'. Whatever you tried to prove, it did not
> relate to what I wanted to have, and so not what I wanted to have matched,
> too, by the regular expression.
>

I think we're both know what a JSONString can and cannot contain. A
JSONString can't contain the unescaped character ". This is shown in the
ES5 spec under the grammar for JSONStringCharacter and
JSONEscapeSequence, quoted in my earlier message.

>>>> JSON.parse, the backslash must be escaped. Thus, you would use:
>>>>
>>>> var quoteMarkInJSONString = '"\\""';
>>>
>>> Yes, but that is not how JSON is usually being put in. That is, the
>>> escaping backslash is _not_ escaped then, and the characters that
>>> quoteMarkInJSONString contains are
>>>
>>> \"
>>>
>>
>> Right; a JSONValue is usually going to be supplied as a value of an
>> identifier, e.g. xhr.responseText. A string value having the character
>> sequence - "\"" - is valid JSON.
>>
>>> and not
>>>
>>> "
>>
>> That would be invalid.
>
> Most certainly not, since "\"" is valid. I am talking about the literal
> value here, after the expansion of escape sequences.
>

What?! I see we're back to this. Again, an unescaped " is invalid in a
JSONString.

JSON.parse(' "\"" ');

Must result in a SyntaxError.
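The disagreement here is about escaping levels, which can be shown directly (assuming a native JSON.parse):

```javascript
// In the ECMAScript source literal ' "\"" ', the \" is consumed by the
// *JavaScript* tokenizer, so the JSON text actually parsed contains an
// unescaped quote inside a string: a JSON syntax error.
var threw = false;
try {
  JSON.parse(' "\"" ');
} catch (e) {
  threw = true; // SyntaxError, as stated above
}

// To pass the two-character JSON text "\"" (a string holding one quote
// mark), the backslash itself must be escaped at the JavaScript level:
JSON.parse(' "\\"" ') === '"'; // true: valid JSON, value is one quote
```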

[...]

>> JSONStringCharacter may not contain backslash unless that is part of a
>> JSONEscapeSequence.
>

Correction: "must not".

> I rest my case.
>

OK, so we agree on that. However, JSONStringCharacter also must not
contain a double quote mark unless that is part of a JSONEscapeSequence.
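As an aside, although full JSON needs a push-down automaton, the JSONString production by itself is a regular language, so one expression can capture both rules at once; the pattern below is my own rendering of the ES5 grammar, not something given in the thread:

```javascript
// Sketch of the ES5 JSONString production: any character except ",
// \, and the control characters U+0000 through U+001F, or else a
// JSONEscapeSequence (\" \\ \/ \b \f \n \r \t or \uXXXX).
var jsonString =
  /^"(?:[^"\\\u0000-\u001f]|\\(?:["\\\/bfnrt]|u[0-9A-Fa-f]{4}))*"$/;

jsonString.test('"\\""'); // true:  escaped quote is allowed
jsonString.test('"\\t"'); // true:  escaped TAB is allowed
jsonString.test('"a"b"'); // false: unescaped quote inside
jsonString.test('"\t"');  // false: raw control character
```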

Garrett
From: Thomas 'PointedEars' Lahn on
Garrett Smith wrote:

> On 2010-06-19 02:33 PM, Thomas 'PointedEars' Lahn wrote:
>> Garrett Smith wrote:
>>> On 6/16/2010 8:15 AM, Thomas 'PointedEars' Lahn wrote:
>>>> Garrett Smith wrote:
>>>>> Thomas 'PointedEars' Lahn wrote:
>>>>>> Garrett Smith wrote:
>>>>>>> Thomas 'PointedEars' Lahn wrote:
>>>>>>>> Garrett Smith wrote:
>>>>>>>>> Thomas 'PointedEars' Lahn wrote:
>>>>>>>>>> Garrett Smith wrote:
>>>>>>>>>>> Meeting those goals, the result should be valuable and
>>>>>>>>>>> appreciated by many.
>>>>>>>>>> Which part of my suggestion did you not like?
>>>>>>>>> Nothing, its fine but I did not see a regexp there that tests to
>>>>>>>>> see if the string is valid JSON.
>>>>>>>> There cannot be such a regular expression in ECMAScript as it does
>>>>>>>> not support PCRE's recursive matches feature. An implementation of
>>>>>>>> a push-down automaton, a parser, is required.
>>>>>>> A parser would be too much for the FAQ.
>>>>>> Probably, although I think it could be done in not too many lines for
>>>>>> the purpose of validation.
>>>>> That would require more code to be downloaded and more processing to
>>>>> run it. What about mobile devices?
>>>> You are confused. What good is shorter code that is not a solution?
>> I take it that, since you have not answered this, you have come to see
>> the flaw in your logic here.
>
> If "not valid json" can be determined without writing a parser, and if
> it that is more efficient, then the overhead of using a parser is
> avoided. That's good.
>
> In order to prove that, I'll need some tests.

You need a dosage of common sense instead. Or a course in theoretical
computer science.

> It would make sense for the fallback to allow \t, as all
> implementations allow that today.

No, it would make sense to implement what was specified, work around the
broken implementations until the workaround is no longer needed, and
report the bug so that it is fixed.

>>> I see you went from unicode escape sequences to hex escapes, but why
>>> \x00 and not \x0?
>>
>> Because \x0 would be a syntax error, see ES3/5 7.8.4.
>
> "7.8.4 String Literals".
> That is the wrong section. That section is describes character escape
> sequences in Strings; look down to the section on Regular Expressions.

Look what the section on Regular Expressions refers to.

> "7.8.5 Regular Expression Literals"
> The specification there states:
>
> | HexEscapeSequence ::
> | x HexDigit HexDigit

And now go read in which terms that production is defined.

> So you still manage to be correct, even though you've cited the wrong
> section.

No, I did not. I thought I would make it easier for you by referring you
to the definition being referenced, but I had not considered your reading
problem. Sorry.

>>> Or why not just use a decimal escape \0?
>>
>> That is a possibility I was not aware of [15.10.2.11], indeed, but then I
>> had two different kinds of character escape sequences in one character
>> class, with one not being recognizable as easily. No advantage, only
>> disadvantages there; so no, thanks.
>
> What are the disadvantages? I see one: "not recognizable easily".

I rest my case. You really can't read.

> Is that so?

Yes, it is.

> I could see how it could be mistaken for an octal escape
> sequence, but octal 0 and decimal 0 are the same 0.

Rubbish.

> What are the other disadvantages?

Learn to read.

>>> The character range could be written more compactly;
>>
>> Yes, one could write /[\dA-Fa-f]/. Or even /…[\dA-F]…/i if there would
>> be no other letters in the remaining expression.
>>
>>> instead of [0-9A-Fa-f], [0-9A-f], or even just [0-f], if that is not
>>> considered too unreadable.
>>
>> Neither expression is equivalent to begin with.
>>
>
> Right. [0-f] is not because it includes a bunch of other characters and
> [0-9A-f] is not because it includes "G-Z".

You don't say!

> I'm concerned with using /[\dA-F]/i

I am not, under the stated conditions. But these conditions may not apply
here.

> So the following should be good:
> /[\dA-Fa-f]/
> /[\dA-F]/i
> /[0-9A-Fa-f]/

You appear to have no idea what you are talking about.

>> That programmer was me. I did not want to have a string containing `\"',
>> I wanted a string containing `"'. Whatever you tried to prove, it did
>> not relate to what I wanted to have, and so not what I wanted to have
>> matched, too, by the regular expression.
>
> I think we're both know what a JSONString can and cannot contain

No, "we're" don't.

> JSONString can't contain the unescaped character ".

And nobody said it could.

> JSON.parse(' "\"" ');
>
> Must result in a SyntaxError.

Of course, but that is well beside the point.

"\""

e.g. in an HTTP response message body is with absolute certainty JSON text,
and so must pass validation. You are clinging to ECMAScript string literals
instead, where further escaping of JSON strings is necessary.

>>> JSONStringCharacter may not contain backslash unless that is part of a
>>> JSONEscapeSequence.
>
> Correction: "must not".
>
>> I rest my case.
>
> OK, so we agree on that.

Quite the contrary. Instead of proving me wrong, you have thus confirmed my
argument that /"[^\\"...]*"/ is insufficient to match a JSON string.


PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f806at$ail$1$8300dec7(a)news.demon.co.uk>
From: Garrett Smith on
On 2010-06-20 05:17 PM, Thomas 'PointedEars' Lahn wrote:
> Garrett Smith wrote:
>
>> On 2010-06-19 02:33 PM, Thomas 'PointedEars' Lahn wrote:
>>> Garrett Smith wrote:
>>>> On 6/16/2010 8:15 AM, Thomas 'PointedEars' Lahn wrote:
>>>>> Garrett Smith wrote:
>>>>>> Thomas 'PointedEars' Lahn wrote:
>>>>>>> Garrett Smith wrote:
>>>>>>>> Thomas 'PointedEars' Lahn wrote:
>>>>>>>>> Garrett Smith wrote:
>>>>>>>>>> Thomas 'PointedEars' Lahn wrote:
>>>>>>>>>>> Garrett Smith wrote:
>>>>>>>>>>>> Meeting those goals, the result should be valuable and
>>>>>>>>>>>> appreciated by many.
>>>>>>>>>>> Which part of my suggestion did you not like?
>>>>>>>>>> Nothing, its fine but I did not see a regexp there that tests to
>>>>>>>>>> see if the string is valid JSON.
>>>>>>>>> There cannot be such a regular expression in ECMAScript as it does
>>>>>>>>> not support PCRE's recursive matches feature. An implementation of
>>>>>>>>> a push-down automaton, a parser, is required.
>>>>>>>> A parser would be too much for the FAQ.
>>>>>>> Probably, although I think it could be done in not too many lines for
>>>>>>> the purpose of validation.
>>>>>> That would require more code to be downloaded and more processing to
>>>>>> run it. What about mobile devices?
>>>>> You are confused. What good is shorter code that is not a solution?
>>> I take it that, since you have not answered this, you have come to see
>>> the flaw in your logic here.
>>
>> If "not valid json" can be determined without writing a parser, and if
>> it that is more efficient, then the overhead of using a parser is
>> avoided. That's good.
>>
>> In order to prove that, I'll need some tests.
>
> You need a dosage of common sense instead. Or a course in theoretical
> computer science.
>
>> It would make sense for the fallback to allow \t, as all
>> implementations allow that today.
>
> No, it would make sense to implement what was specified, work around the
> broken implementations until the workaround is no longer needed, and
> report the bug so that it is fixed.
>

Hold your breath, Thomas. It'll surely be fixed soon.

> Quite the contrary. Instead of proving me wrong, you have thus confirmed my
> argument that /"[^\\"...]*"/ is insufficient to match a JSON string.
>
>

We seem to have different goals.

My goal is to develop a solution that can be used to evaluate a JSON
response. I'm going to continue with that goal.
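One minimal shape for such a solution, sketched under the assumption that a native JSON.parse is preferred and invalid input surfaces as a caught exception (the function name is illustrative, not from the thread):

```javascript
// Sketch only: prefer the native parser; report invalid JSON via the
// parser's own SyntaxError rather than pre-validating with a regexp.
function parseJSONResponse(text) {
  if (typeof JSON != "undefined" && typeof JSON.parse == "function") {
    return JSON.parse(text); // throws SyntaxError on invalid JSON
  }
  // Hosts without a native JSON object would need a fallback parser
  // (e.g. json2.js); omitted here.
  throw new Error("No JSON parser available");
}

parseJSONResponse('{"ok": true}').ok; // true
```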

Garrett