From: Garrett Smith on
On 6/13/2010 11:43 PM, Garrett Smith wrote:
> On 6/13/2010 1:22 PM, Garrett Smith wrote:
>> On 6/9/2010 3:37 AM, Thomas 'PointedEars' Lahn wrote:
>>> Garrett Smith wrote:
>>>
>>>> Thomas 'PointedEars' Lahn wrote:
>>>>> Garrett Smith wrote:
>
> [...]
>
>>
>> JSON.parse('[],{}');
>> SyntaxError
>>
>> eval('[],{}');
>> {}
>>
>
> One way to get around that is to wrap the whole expression not in the
> Grouping Operator but in array literal brackets, '[' + text + ']'. After
> evaluating that, the only valid result could be an array with length =
> 0; anything else should result in SyntaxError.
>
Correction: the only valid result would be an array with length = 1.
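A minimal sketch of the bracket-wrapping idea follows. The function name evalSingleJSONValue is hypothetical. It ASSUMES that `text` has already passed a token filter such as the json2.js regexp quoted later in this thread, because calling eval on unfiltered input allows arbitrary code execution.

```javascript
// Hypothetical helper. ASSUMPTION: `text` was already screened by a token
// filter; never pass unfiltered input to eval.
function evalSingleJSONValue(text) {
  // Inside the Grouping Operator, "[],{}" is a comma expression that quietly
  // evaluates to {}. Inside array brackets it becomes a two-element array.
  var wrapped = eval("[" + text + "]");
  if (wrapped.length !== 1) {
    throw new SyntaxError("Expected exactly one JSON value");
  }
  return wrapped[0];
}
```

evalSingleJSONValue('[],{}') throws, while eval('(' + '[],{}' + ')') returns {}. A trailing comma still gets through: in ES5 engines "[1,]" is an array of length 1, so evalSingleJSONValue('1,') returns 1. This closes the comma-operator hole, but it is not a validator on its own.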

>
> This is quite simple to test with a regexp:
>
> var jsonString = /"[^"\\\n\r\u0000-\u001f]*"/
>

It is actually simpler than that: \n and \r are redundant, because both
are already included in the character range that follows. I copied the
first half of that from json2.js.

var jsonString = /"[^"\\\u0000-\u001f]*"/

That can be used in the longer pattern borrowed from json2.js. In fact,
json2.js itself should use it.
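The redundancy can be checked mechanically. The sketch below compares the two patterns on a one-character string for every UTF-16 code unit. The names json2String, simplerString and patternsAgree are illustrative only.

```javascript
// json2.js's string pattern and the simplified one. \n (U+000A) and
// \r (U+000D) already lie inside \u0000-\u001f, so listing them separately
// excludes nothing extra.
var json2String = /"[^"\\\n\r\u0000-\u001f]*"/;
var simplerString = /"[^"\\\u0000-\u001f]*"/;

// True if both patterns give the same answer for '"' + c + '"' for every
// single UTF-16 code unit c.
function patternsAgree() {
  for (var code = 0; code <= 0xFFFF; code++) {
    var s = '"' + String.fromCharCode(code) + '"';
    if (json2String.test(s) !== simplerString.test(s)) {
      return false;
    }
  }
  return true;
}
```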

Garrett
From: Thomas 'PointedEars' Lahn on
Garrett Smith wrote:

> Thomas 'PointedEars' Lahn wrote:
>> Garrett Smith wrote:
>>> Thomas 'PointedEars' Lahn wrote:
>>>> Garrett Smith wrote:
>>>>> Meeting those goals, the result should be valuable and appreciated by
>>>>> many.
>>>> Which part of my suggestion did you not like?
>>> Nothing, it's fine but I did not see a regexp there that tests to see if
>>> the string is valid JSON.
>> There cannot be such a regular expression in ECMAScript as it does not
>> support PCRE's recursive matches feature. An implementation of a
>> push-down automaton, a parser, is required.
>
> A parser would be too much for the FAQ.

Probably, although I think it could be done in not too many lines for the
purpose of validation.
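To give a sense of the size involved, here is a minimal recursive-descent validator for the ES5 JSON grammar, which accepts any JSON value at the top level. The name isJSON and the token patterns are illustrative. It only answers true or false and builds no result, so treat it as a sketch rather than a hardened implementation.

```javascript
// Token patterns, each anchored to the current position via slice().
var JSON_STRING = /^"(?:[^"\\\u0000-\u001f]|\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4}))*"/;
var JSON_NUMBER = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?/;
var JSON_LITERAL = /^(?:true|false|null)/;

function isJSON(text) {
  var pos = 0;

  // Skip JSON whitespace: space, tab, LF, CR.
  function skipWhitespace() {
    while (pos < text.length && " \t\n\r".indexOf(text.charAt(pos)) !== -1) {
      pos++;
    }
  }

  // Try an anchored pattern at the current position; advance on success.
  function eat(pattern) {
    var m = pattern.exec(text.slice(pos));
    if (m) {
      pos += m[0].length;
    }
    return m !== null;
  }

  // value := object | array | string | number | true | false | null
  function value() {
    skipWhitespace();
    var c = text.charAt(pos);
    var ok;
    if (c === "{") {
      ok = object();
    } else if (c === "[") {
      ok = array();
    } else {
      ok = eat(JSON_STRING) || eat(JSON_NUMBER) || eat(JSON_LITERAL);
    }
    skipWhitespace();
    return ok;
  }

  // object := "{" [ string ":" value { "," string ":" value } ] "}"
  function object() {
    pos++; // consume "{"
    skipWhitespace();
    if (text.charAt(pos) === "}") {
      pos++;
      return true;
    }
    for (;;) {
      skipWhitespace();
      if (!eat(JSON_STRING)) return false;
      skipWhitespace();
      if (text.charAt(pos) !== ":") return false;
      pos++;
      if (!value()) return false;
      var c = text.charAt(pos++);
      if (c === "}") return true;
      if (c !== ",") return false;
    }
  }

  // array := "[" [ value { "," value } ] "]"
  function array() {
    pos++; // consume "["
    skipWhitespace();
    if (text.charAt(pos) === "]") {
      pos++;
      return true;
    }
    for (;;) {
      if (!value()) return false;
      var c = text.charAt(pos++);
      if (c === "]") return true;
      if (c !== ",") return false;
    }
  }

  // The whole text must be exactly one value.
  return value() && pos === text.length;
}
```

It rejects the cases raised in this thread, such as "[],{}", "2.", "01", "+1" and trailing commas, because each one leaves input that no grammar rule can consume.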

> The approach in json2.js would fit nicely, but is it acceptable?
>
> The first thing that jumped out at me was that its number matching
> allows a trailing decimal point, something that is disallowed in
> JSONNumber.
>
> json2.js has the regular expression used below. I've assigned the result
> `isValidJSON` to refer to it later in this message.
>
> var text = "...";
>
> var isValidJSON = /^[\],:{}\s]*$/.
> test(text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@').
> replace(
> /"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g,
> ']').
> replace(/(?:^|:|,)(?:\s*\[)+/g, ''))
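For experimenting, the json2.js expression quoted above can be wrapped in a function. The name json2LooksSafe is hypothetical, and the body is the quoted expression unchanged. It does not validate the grammar. It only confirms that nothing except JSON punctuation and whitespace remains after escapes, strings, numbers and literals have been collapsed.

```javascript
// The quoted json2.js check, unchanged, behind a hypothetical name.
function json2LooksSafe(text) {
  return /^[\],:{}\s]*$/.test(
    text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, "@")      // escapes -> @
        .replace(/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g,
                 "]")                                              // tokens -> ]
        .replace(/(?:^|:|,)(?:\s*\[)+/g, "")                       // drop opening [
  );
}
```

For example, json2LooksSafe('2.') returns true, which is the trailing-decimal-point problem described above.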

Is this from json2.js? If yes, then it is not acceptable. To begin with,
it does not regard "\"" valid JSON even though it is.

> Number is defined as:
> /-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g
>
> But this allows numbers like "2.", so it can be changed to disallow that:
> /-?\d+(?:\.\d+)?(?:[eE][+\-]?\d+)?/g

It would still be insufficient. You simply cannot parse a context-free,
non-regular language with a single application of a single non-PCRE regexp.
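The effect of the proposed change on a single token can be shown with anchored copies of the two number patterns. The anchors are added here for testing only. In json2.js the patterns are unanchored and global because they run inside String.prototype.replace.

```javascript
// Anchored copies of the original and the proposed number patterns.
var json2Number = /^-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?$/;
var stricterNumber = /^-?\d+(?:\.\d+)?(?:[eE][+\-]?\d+)?$/;

// "2." is the case that motivated the change.
var acceptedBefore = json2Number.test("2.");   // true
var acceptedAfter = stricterNumber.test("2."); // false
```

The stricter pattern still accepts "01", so this fixes only one token-level bug. It does nothing about the grammar-level objection.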

>>> [...] The suggestion to pass an object literal string as the
>>> argument to JSON.parse is no better than passing "true".
>>
>> But it is. It places further requirements on the capabilities of the
>> parser. An even better test would be a combination of all results of all
>> productions of the JSON grammar.
>
> Cases that are known to be problematic can be filtered.

Your point being?

> Every possibility of JSON Grammar cannot be checked unless either the
> string is parsed

Exactly. But that is not what I suggested.

> or the code performs a set of feature tests that tests every possibility.

Do you realize that this is not possible?


PointedEars
--
realism: HTML 4.01 Strict
evangelism: XHTML 1.0 Strict
madness: XHTML 1.1 as application/xhtml+xml
-- Bjoern Hoehrmann
From: Garrett Smith on
On 6/15/2010 12:14 PM, Thomas 'PointedEars' Lahn wrote:
> Garrett Smith wrote:
>
>> Thomas 'PointedEars' Lahn wrote:
>>> Garrett Smith wrote:
>>>> Thomas 'PointedEars' Lahn wrote:
>>>>> Garrett Smith wrote:
>>>>>> Meeting those goals, the result should be valuable and appreciated by
>>>>>> many.
>>>>> Which part of my suggestion did you not like?
>>>> Nothing, it's fine but I did not see a regexp there that tests to see if
>>>> the string is valid JSON.
>>> There cannot be such a regular expression in ECMAScript as it does not
>>> support PCRE's recursive matches feature. An implementation of a
>>> push-down automaton, a parser, is required.
>>
>> A parser would be too much for the FAQ.
>
> Probably, although I think it could be done in not too many lines for the
> purpose of validation.
>

That would require more code to be downloaded and more processing to run
it. What about mobile devices?

[...]

>> var isValidJSON = /^[\],:{}\s]*$/.
>> test(text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@').
>> replace(
>> /"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g,
>> ']').
>> replace(/(?:^|:|,)(?:\s*\[)+/g, ''))
>
> Is this from json2.js? If yes, then it is not acceptable. To begin with,
> it does not regard "\"" valid JSON even though it is.
>

The code is from json2.js:
http://www.json.org/json2.js

The character sequence "\"" is valid JSON value in ecmascript, however
in ecmascript, if enclosed in a single quote string, as - '"\""' - the
backslash would escape the double quote mark, resulting in '"""', which
is not valid JSON.

To pass a string value containing the character sequence "\"" to
JSON.parse, the backslash must be escaped. Thus, you would use:

var quoteMarkInJSONString = '"\\""';

And that works.

JSON.parse(quoteMarkInJSONString) == JSON.parse('"\\""') == "\""

Result: string value containing the single character: ".

JSON.parse(quoteMarkInJSONString) === "\""

>> Number is defined as:
>> /-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g
>>
>> But this allows numbers like "2.", so it can be changed to disallow that:
>> /-?\d+(?:\.\d+)?(?:[eE][+\-]?\d+)?/g
>
> It would still be insufficient. You simply cannot parse a context-free,
> non-regular language with a single application of a single non-PCRE regexp.
>

The goal of json2.js's JSON.parse is not to filter out values that are
valid; it is to eliminate values that are invalid. So far, it has been
found to fail at that in four ways, and I have addressed those.

A fifth way that json2.js fails is that it allows numbers with a leading 0.

JSON.parse("01");

That returns 1, but it should throw an error. Firefox does the same thing.
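As a sketch, a number check that follows the JSONNumber production more closely would permit a leading 0 only when it is the entire integer part. The pattern and the helper name isJSONNumberToken are illustrative.

```javascript
// JSONNumber as one anchored pattern: an optional minus, then either a lone
// 0 or a non-zero digit followed by any digits, then an optional fraction
// with at least one digit and an optional exponent.
var jsonNumber = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?$/;

function isJSONNumberToken(text) {
  return jsonNumber.test(text);
}
```

With this pattern, "01", "2." and "+1" are all rejected, while "0", "10" and "-0.5e10" are accepted.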

>>>> [...] The suggestion to pass an object literal string as the
>>>> argument to JSON.parse is no better than passing "true".
>>>
>>> But it is. It places further requirements on the capabilities of the
>>> parser. An even better test would be a combination of all results of all
>>> productions of the JSON grammar.
>>
>> Cases that are known to be problematic can be filtered.
>
> Your point being?
>

My point is that instead of trying to check every possible production of
the grammar, we could check for known bugs, such as accepting 1., +1 and
01 as seen in SpiderMonkey.

Checking every possible input is not possible.

JSON.parse('{"x": "42"}').x == "42" - tests a benign case. It doesn't
filter out any known error cases. Any implementation that can handle
JSON.parse('true') should be able to handle parsing the JSONObject.

Accepting a native implementation that fails to reject invalid input,
such as "2.", results in an inconsistent interface: current versions of
Firefox and IE return the value `2`, but current versions of Opera and
Chrome throw an error.

The inconsistency might seem minor, but it could result in a
hard-to-track-down bug. For example, consider an application that sends
a money value back to the client as {"dollars" : 2.11}. When the money
value includes decimal cents, it runs fine, but when a value such as
`2.` is passed, it (correctly) throws an error in browsers that were
never tested.

This brings me back to the idea of identifying known bugs in
implementations, devising feature tests for those bugs, and then allowing
only an implementation that does not exhibit those bugs to use its own
native JSON.parse.

The fallback can disallow anything not allowed by the JSON grammar.
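A minimal sketch of that strategy follows. The probe strings are the cases this thread reports as mishandled ("1.", "+1", "01"). A real list would grow as more bugs are found. The names rejects, nativeParseIsUsable and chooseParse are hypothetical, and `fallbackParse` stands in for a script-based parser that is not shown here.

```javascript
// Inputs that a correct JSON.parse must reject (from this thread).
var KNOWN_BAD = ["1.", "+1", "01"];

// True if `parse` throws for `text`.
function rejects(parse, text) {
  try {
    parse(text);
    return false;
  } catch (e) {
    return true;
  }
}

function nativeParseIsUsable() {
  if (typeof JSON !== "object" || JSON === null ||
      typeof JSON.parse !== "function") {
    return false;
  }
  for (var i = 0; i < KNOWN_BAD.length; i++) {
    if (!rejects(JSON.parse, KNOWN_BAD[i])) {
      return false; // the native parser exhibits a known bug
    }
  }
  // It must still accept an ordinary valid text.
  return !rejects(JSON.parse, '{"dollars": 2.11}');
}

// `fallbackParse` is a hypothetical script-based parser that rejects
// anything the JSON grammar does not allow.
function chooseParse(fallbackParse) {
  return nativeParseIsUsable()
    ? function (text) { return JSON.parse(text); }
    : fallbackParse;
}
```

The feature test runs once, and callers then use whichever parse function it returns. This keeps behaviour consistent across browsers without shipping a parser to engines that do not need one.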

I am now at the stage of identifying implementation bugs and defining a
validator, actually an isInvalidJSON function.

A thorough test case for valid JSON is needed.

I'm considering porting the test cases from Opera. AFAIK, Opera's suite
is the only JSON test suite, and it is not offered as a zipped download:
one must download each JS file separately. It would also probably be a
good idea not to use synchronous requests, as that test runner does; I
got a freeze/crash in IE8 with that.

>> Every possibility of JSON Grammar cannot be checked unless either the
>> string is parsed
>
> Exactly. But that is not what I suggested.
>
>> or the code performs a set of feature tests that tests every possibility.
>
> Do you realize that this is not possible?
>

That was my point.

Garrett
From: Garrett Smith on
On 6/15/2010 3:15 PM, Garrett Smith wrote:
> On 6/15/2010 12:14 PM, Thomas 'PointedEars' Lahn wrote:
>> Garrett Smith wrote:
>>
>>> Thomas 'PointedEars' Lahn wrote:
>>>> Garrett Smith wrote:
>>>>> Thomas 'PointedEars' Lahn wrote:
>>>>>> Garrett Smith wrote:

[...]

> The character sequence "\"" is valid JSON value in ecmascript, however
> in ecmascript, if enclosed in a single quote string, as - '"\""' - the
> backslash would escape the double quote mark, resulting in '"""', which
> is not valid JSON.
>
> To pass a string value containing the character sequence "\"" to
> JSON.parse, the backslash must be escaped. Thus, you would use:
>
> var quoteMarkInJSONString = '"\\""';
>
> And that works.
>
> JSON.parse(quoteMarkInJSONString) == JSON.parse('"\\""') == "\""
>

Paste error. Should have been just:

JSON.parse(quoteMarkInJSONString)

> Result: string value containing the single character: ".
>
> JSON.parse(quoteMarkInJSONString) === "\""
>
From: Lasse Reichstein Nielsen on
Thomas 'PointedEars' Lahn <PointedEars(a)web.de> writes:

> Is this from json2.js? If yes, then it is not acceptable. To begin with,
> it does not regard "\"" valid JSON even though it is.

It didn't use to be valid.
Originally, a JSON text had to be either an object or an array, but not
a simple value.
This was changed at some point (I'm guessing during ES5 development) so
that the grammar on json.org and the one in the ES5 spec allow JSON text
to be any JSON value.
JSON2 implements the original version.
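For comparison, the original rule (RFC 4627 defines JSON-text as an object or an array) can be layered on top of ES5's JSON.parse. The sketch below uses the hypothetical name parseStrictJSONText and does nothing more than check the type of the top-level result.

```javascript
// Enforces the RFC 4627 top-level rule on top of ES5 JSON.parse, which
// itself accepts any JSON value at the top level.
function parseStrictJSONText(text) {
  var result = JSON.parse(text); // throws SyntaxError for invalid JSON
  // Arrays and objects both have typeof "object"; null must be excluded.
  if (result === null || typeof result !== "object") {
    throw new SyntaxError("JSON text must be an object or an array");
  }
  return result;
}
```

Under this rule, '"\\""' is rejected even though JSON.parse accepts it.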

>> Number is defined as:
>> /-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g
>>
>> But this allows numbers like "2.", so it can be changed to disallow that:
>> /-?\d+(?:\.\d+)?(?:[eE][+\-]?\d+)?/g
>
> It would still be insufficient. You simply cannot parse a context-free,
> non-regular language with a single application of a single non-PCRE regexp.

The idea of the regexp isn't to check that the grammar is correct, but
merely that all the tokens are valid.

It's almost enough to guarantee that a successful eval of the string
means the grammar was also correct. But only almost: e.g.,
'{"x":{"y":42}[37,"y"]}' uses only valid tokens, yet it is not valid JSON.

Still, it disallows arbitrary code execution, which I guess is the main
reason, and it correctly handles all valid JSON.
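The distinction between valid tokens and a valid grammar can be made concrete with a purely token-level check. The sketch below (hasOnlyJSONTokens is a hypothetical name) accepts any sequence of JSON punctuation and whitespace, strings, numbers and literals, in any order.

```javascript
// One JSON token (or a run of punctuation/whitespace) at the start of a string.
var JSON_TOKEN = /^(?:[ \t\n\r{}\[\],:]+|"(?:[^"\\\u0000-\u001f]|\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4}))*"|-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?|true|false|null)/;

// True if `text` is made up entirely of JSON tokens; the order is not checked.
function hasOnlyJSONTokens(text) {
  var rest = text;
  while (rest.length > 0) {
    var m = JSON_TOKEN.exec(rest);
    if (m === null) return false; // something other than a JSON token
    rest = rest.slice(m[0].length);
  }
  return true;
}

var lasseExample = '{"x":{"y":42}[37,"y"]}';
```

hasOnlyJSONTokens(lasseExample) is true, and eval('(' + lasseExample + ')') succeeds (its x property is 42), yet JSON.parse(lasseExample) throws a SyntaxError. json2.js's actual regexp happens to reject this particular string, because any '[' that does not follow the start of the text, ':' or ',' survives its last replace.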

/L
--
Lasse Reichstein Holst Nielsen
'Javascript frameworks is a disruptive technology'