From: Garrett Smith on
On 2010-06-18 02:44 AM, Asen Bozhilov wrote:
> Garrett Smith wrote:
>
>> A parser would be too much for the FAQ.
>
> If you implemented full parser for JSON that definitely is not
> necessary for FAQ. You can use something like this:
>
> if (!this.JSON) {
> this.JSON = {
> parse : (function () {
> var JSON_GRAMMAR = {
> STRING : '"([^"\\\\\\x00-\\x1F]|\\\\["\\\\\/bfnrt]|\\\\u[0-9A-Fa-f]{4})*"',

If you use a regexp literal, you don't have to worry about doubling
backslashes in the pattern. As a strategy (not real code):

var STRING_BACKSLASH = /[\\]/,
NUMBER_INT = /\d+/;

var fooIntExp = new RegExp(
STRING_BACKSLASH.source
+ "|"+ NUMBER_INT.source
);

The downside is that it requires creating extra regexp objects. It seems
a little easier to read without the extra backslash escapes.
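
A quick sketch of that trade-off: the same string pattern written once
as a string (every backslash doubled) and once as a literal whose
`.source` is reused. The names `fromString`, `fromLiteral`, `samples`
and `agree` are made up for this illustration and do not come from the
posted code.

```javascript
// Pattern given as a string: each regexp backslash has to be doubled.
var fromString = new RegExp(
  '^"(?:[^"\\\\\\x00-\\x1F]|\\\\["\\\\/bfnrt]|\\\\u[0-9A-Fa-f]{4})*"$'
);

// Same pattern as a literal; .source hands back the pattern text,
// so it can be combined with other pieces without re-escaping.
var fromLiteral =
  /"(?:[^"\\\x00-\x1F]|\\["\\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"/;
var rebuilt = new RegExp("^(?:" + fromLiteral.source + ")$");

// Both forms should agree on every sample.
var samples = ['"abc"', '"a\\"b"', '"\\u00e9"', '"a\\qb"', 'abc'];
var agree = true;
for (var i = 0; i < samples.length; i++) {
  if (rebuilt.test(samples[i]) !== fromString.test(samples[i])) {
    agree = false;
  }
}
```

The literal form is the one that stays readable; the string form is
only there to show how many backslashes it takes to say the same thing.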

[...]

The reply to Thomas' msg is not done.

Garrett
From: Asen Bozhilov on
Garrett Smith wrote:
> Asen Bozhilov wrote:
>
> > Asen Bozhilov wrote:
>
> >>          output = new Function('return Array(' + jsonStr + ');')();
>
> > Correction, should be:
>
> > output = new Function('return Array(null, ' + jsonStr + ');')();
>
> Why `null` as first element?

Because the `Array' constructor is overloaded, and I would have problems
if I used the first parameter. For example, with:

JSON.parse('10');

I would get an array with a `length' property equal to 10. I use the
`Array' constructor instead of an array literal because with an array
literal I would have problems with code such as:

JSON.parse('][');

A call expression uses parentheses, which are allowed in JSON only
inside strings. So my code cannot be exploited that way: if there are
parentheses, `invalidTokens' will catch them.
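
A minimal sketch of both points, for illustration only. The helper
names `viaArrayCall` and `viaArrayLiteral` and the input '1][0' are
assumptions made up here; the `invalidTokens' check from the posted
code is not reproduced, and the inputs are fixed literals rather than
untrusted data.

```javascript
// Wraps the text in an Array(...) call, as in the corrected line.
function viaArrayCall(jsonStr) {
  return new Function('return Array(null, ' + jsonStr + ');')();
}
// Wraps the text in an array literal instead.
function viaArrayLiteral(jsonStr) {
  return new Function('return [' + jsonStr + '];')();
}

var overloaded = Array(10);        // a single number sets the length
var padded = viaArrayCall('10');   // two arguments become two elements

// Brackets in the input can close the literal early:
// the source becomes [1][0], a property access.
var escaped = viaArrayLiteral('1][0');

// The same input inside a call, Array(null, 1][0), does not parse.
var threw = false;
try {
  viaArrayCall('1][0');
} catch (e) {
  threw = e.name === 'SyntaxError';
}
```

With the call form, the only way out of the wrapper is a parenthesis,
which is what the token check is meant to reject.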

From: Garrett Smith on
On 2010-06-18 01:21 PM, Asen Bozhilov wrote:
> Garrett Smith wrote:
>> Asen Bozhilov wrote:
>>
>>> Asen Bozhilov wrote:
>>
>>>> output = new Function('return Array(' + jsonStr + ');')();
>>
>>> Correction, should be:
>>
>>> output = new Function('return Array(null, ' + jsonStr + ');')();
>>
>> Why `null` as first element?
>
> Because an `Array' constructor is overloaded and I will have problems
> if I use first parameter. For example if there is:
>
> JSON.parse('10');
>

Right - I didn't think about it when I posted.

> I will get an array with `length' property equal to 10. I use an
> `Array' constructor instead array literal, because if I use array
> literal I will have problems with the following code:
>
> JSON.parse('][');
>
> Calling expression use braces which are allowed in JSON string only in
> strings. So here cannot be exploited my code, because if there are
> braces `invalidTokens' will catch them.
>
I'm going to take a look at that and write a test for both yours and a
modified version of Doug's.


Garrett
From: Garrett Smith on
On 6/16/2010 8:15 AM, Thomas 'PointedEars' Lahn wrote:
> Garrett Smith wrote:
>
>> Thomas 'PointedEars' Lahn wrote:
>>> Garrett Smith wrote:
>>>> Thomas 'PointedEars' Lahn wrote:
>>>>> Garrett Smith wrote:
>>>>>> Thomas 'PointedEars' Lahn wrote:
>>>>>>> Garrett Smith wrote:
>>>>>>>> Meeting those goals, the result should be valuable and appreciated
>>>>>>>> by many.
>>>>>>> Which part of my suggestion did you not like?
>>>>>> Nothing, its fine but I did not see a regexp there that tests to see
>>>>>> if the string is valid JSON.
>>>>> There cannot be such a regular expression in ECMAScript as it does not
>>>>> support PCRE's recursive matches feature. An implementation of a
>>>>> push-down automaton, a parser, is required.
>>>> A parser would be too much for the FAQ.
>>> Probably, although I think it could be done in not too many lines for the
>>> purpose of validation.
>>
>> That would require more code to be downloaded and more processing to run
>> it. What about mobile devices?
>
> You are confused. What good is shorter code that is not a solution?
>
>>>> var isValidJSON = /^[\],:{}\s]*$/.
>>>> test(text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@').
>>>> replace(
>>>> /"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g,
>>>> ']').
>>>> replace(/(?:^|:|,)(?:\s*\[)+/g, ''))
>>>
>>> Is this from json2.js? If yes, then it is not acceptable. To begin
>>> with, it does not regard "\"" valid JSON even though it is.
>>
>> The code is from json2.js:
>> http://www.json.org/json2.js
>
> Then it must be either summarily dismissed, or updated at least as follows:
>
> /"([^"\\]|\\.)*"|.../
>
> because *that* is the proper way to match a double-quoted string with
> optional escape sequences. Refined for JSON, it must be at least
>
> /"([^"\\^\x00-\x1F]|\\["\\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"|.../
>

There is a problem; TAB character code is 9 and all implementations
allow it.

I see you went from unicode escape sequences to hex escapes, but why
\x00 and not \x0? Or why not just use a decimal escape \0?

The character range could be written more compactly; instead of
[0-9A-Fa-f], [0-9A-f], or even just [0-f], if that is not considered too
unreadable (though those ranges also admit characters that are not hex
digits). Though similar in appearance, it could not be `[o-f]` because
that would result in a SyntaxError, thrown by CharacterRange, step 6:
"If i > j then throw a SyntaxError exception."

I haven't tested it, yet. More on testing below...


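// Untested.
// A raw tab is allowed here because implementations accept it.
// Hex digits stay spelled out: [0-f] would also match non-hex
// characters such as "G" or ":".
var jsonStringExp =
/"(?:\t|[^"\\\0-\x1F]|\\["\\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"/,

// DecimalIntegerLiteral ::
//   0
//   NonZeroDigit DecimalDigitsopt
// JSONNumber ::
//   -opt DecimalIntegerLiteral JSONFractionopt ExponentPartopt
jsonNumberExp = /-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?/,
jsonPrimitiveExp = new RegExp(
jsonStringExp.source
+ "|" + jsonNumberExp.source
+ "|true|false|null", "g"
);

var passExp = /^[\],:{}\s]*$/;

// Returns true when the text survives the filter, i.e. looks like JSON.
function isValidJson(text) {
var filtered = text.replace(jsonPrimitiveExp, ']')
.replace(/(?:^|:|,)(?:\s*\[)+/g, '')
// An escape left outside a string becomes '@', which passExp rejects.
.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@');

return passExp.test(filtered);
}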

>> The character sequence "\"" is valid JSON value in ecmascript,
>
> That is gibberish. Either it is JSON, or it is ECMAScript.
>

Not gibberish.

ECMAScript defines JSON for ECMAScript. When I refer to JSON in
ECMAScript, I am referring to the JSON Grammar defined in ECMAScript 5.

JSON Grammar in ECMAScript 5 differs from the grammar specified in RFC
4627 in that it allows primitive values at top-level.
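
As a small illustration, assuming a native JSON object that follows the
ES5 grammar, each of these top-level primitives parses, whereas RFC 4627
only allows an object or an array as the JSON text. The variable names
are arbitrary.

```javascript
// Top-level primitives accepted by the ES5 JSON grammar.
var n = JSON.parse('10');     // a number
var s = JSON.parse('"abc"');  // a string
var t = JSON.parse('true');   // a boolean
var z = JSON.parse('null');   // null
```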

>> however in ecmascript, if enclosed in a single quote string, as - '"\""' -
>> the backslash would escape the double quote mark, resulting in '"""',
>> which is not valid JSON.
>
> You are confused.
>

I don't think so.

What you've allowed in your RegExp doesn't match JSON Grammar defined in
ECMA 5.

> "\"" is both an ES string literal and JSON for the string containing"
>

Yes.

> "\\\"" is both an ES string literal and JSON for the string containing \"
>

Yes.

> "\\"" is neither an ES string literal nor JSON.
>

Correct.
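
The three cases can be checked directly, assuming a conforming native
JSON.parse. The only tricky part is that each JSON text has to be
written as an ECMAScript string literal here, so every backslash in the
JSON text appears doubled in the source. The variable names are
arbitrary.

```javascript
// JSON text "\"" (4 characters) parses to the one-character string "
var a = JSON.parse('"\\""');

// JSON text "\\\"" (6 characters) parses to the two characters \"
var b = JSON.parse('"\\\\\\""');

// JSON text "\\"" (5 characters) is rejected: the string ends after \\
var c;
try {
  JSON.parse('"\\\\""');
  c = 'parsed';
} catch (e) {
  c = e.name;
}

// The ES literal '"\""' drops the backslash and holds """ (3 characters)
var d = '"\""'.length;
```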

>> To pass a string value containing the character sequence "\"" to
>
> But that was not the purpose of the JSON string.
>

What do you mean by the purpose of the JSON String? The purpose is
inside the programmer.

>> JSON.parse, the backslash must be escaped. Thus, you would use:
>>
>> var quoteMarkInJSONString = '"\\""';
>
> Yes, but that is not how JSON is usually being put in. That is, the
> escaping backslash is _not_ escaped then, and the characters that
> quoteMarkInJSONString contains are
>
> \"
>

Right; a JSONValue is usually going to be supplied as a value of an
identifier, e.g. xhr.responseText. A string value having the character
sequence - "\"" - is valid JSON.

> and not
>
> "
>

That would be invalid.

> whereas only the latter was intended.
>
>> And that works.
>
> A JSON string *literal* may very well contain a literal backslash character,
> and it may also contain a literal double quote. The expression fails to
> recognize that.
>

No, a JSONString may not contain a literal backslash character.

| JSONString ::
|   " JSONStringCharactersopt "
|
| JSONStringCharacters ::
|   JSONStringCharacter JSONStringCharactersopt
|
| JSONStringCharacter ::
|   SourceCharacter but not double-quote " or backslash \ or U+0000 thru U+001F
|   \ JSONEscapeSequence

JSONStringCharacter may not contain backslash unless that is part of a
JSONEscapeSequence.

| JSONEscapeSequence ::
| JSONEscapeCharacter
| UnicodeEscapeSequence

[...]

>>
>> The goal of json2.js's JSON.parse is not to filter out values that are
>> valid; it is to eliminate values that are invalid. So far, it was
>> noticed to fail at that in four ways and I addressed those.
>
> You are very confused.
>

I don't think so. My understanding of the intent of json2.js comes from
code comments in it.

| // We split the second stage into 4 regexp operations in order to
| // work around crippling inefficiencies in IE's and Safari's regexp
| // engines. First we replace the JSON backslash pairs with '@' (a
| // non-JSON character).

That replaces backslash escape pairs with "@", a character that
/^[\],:{}\s]*$/ does not accept.
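
To see the stages at work, here is the check quoted earlier in the
thread wrapped in a function. The name `passesJson2Check` and the
sample inputs are made up for this sketch; the three regular
expressions are the ones from the quoted json2.js code.

```javascript
// The json2.js check as quoted upthread, wrapped in a function.
function passesJson2Check(text) {
  return /^[\],:{}\s]*$/.test(
    text
      // Stage 1: escape pairs become '@'.
      .replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@')
      // Stage 2: simple value tokens become ']'.
      .replace(
        /"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g,
        ']')
      // Stage 3: open brackets after a colon, comma or start are dropped.
      .replace(/(?:^|:|,)(?:\s*\[)+/g, '')
  );
}

var ok = passesJson2Check('{"a": "x\\"y", "b": [1, true]}');
var escapedQuote = passesJson2Check('"\\""');  // JSON text "\""
var code = passesJson2Check('{"a": alert(1)}');
var lenient = passesJson2Check('1.');          // not a JSONNumber
```

The last case passes the filter even though "1." is not valid JSON,
which is the kind of leniency a test suite would need to pin down.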

>>>>>> [...] The suggestion to use an object literal as the string to the
>>>>>> argument to JSON.parse is not any better than using "true".
>>>>>
>>>>> But it is. It places further requirements on the capabilities of the
>>>>> parser. An even better test would be a combination of all results of
>>>>> all productions of the JSON grammar.
>>>>
>>>> Cases that are known to be problematic can be filtered.
>>>
>>> Your point being?
>>
>> My point is that instead of trying every possible valid grammar check,
>> known bugs -- such as allowing 1. and +1 and 01, as seen in Spidermonkey
>> -- could be checked.
>
> The purpose of this was to provide a viable fallback for JSON.parse().
> Both your suggestion and the one in json2.js fail to do that.
>

No, I don't think it will be that difficult, but I want to get a test
suite for it first. Either with what I have now or with JsUnit.

The type of setup I am working on uses object literal notation for the
tests. I use Java style annotations, but in the test name.

For example:

APE.test.testSimple({
'test JSON.parse("1.") @throws SyntaxError' : function() {
JSON.parse("1.");
}
});

That test function would be expected to throw and in this case, the
thrown object would have a `name` property of exactly "SyntaxError".
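
Without any framework, the same expectation could be sketched as below.
The helper `thrownName` is hypothetical and is not part of APE or
JsUnit; the three inputs are the problem cases mentioned above, and the
results assume a native JSON.parse that follows the ES5 grammar.

```javascript
// Hypothetical helper: returns the name of the thrown error, or null
// when the function returns normally.
function thrownName(fn) {
  try {
    fn();
  } catch (e) {
    return e && e.name;
  }
  return null;
}

// Inputs that a conforming parser should reject.
var inputs = ['1.', '+1', '01'];
var results = {};
for (var i = 0; i < inputs.length; i++) {
  results[inputs[i]] = thrownName(function () {
    JSON.parse(inputs[i]);
  });
}
```

An implementation with the bugs described above would leave null in
`results` for the inputs it wrongly accepts.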

Assertions use N-Unit style constraints. It's a bigger project than for
just this JSON test and the TestReporter is not done. It renders the
tree as a UL but doesn't show any results (pass, fail, ignore, etc).

JsUnit would work fine for a JSON test and I may just use that instead.
After some sleep.

Garrett
From: Thomas 'PointedEars' Lahn on
Garrett Smith wrote:

> On 6/16/2010 8:15 AM, Thomas 'PointedEars' Lahn wrote:
>> Garrett Smith wrote:
>>> Thomas 'PointedEars' Lahn wrote:
>>>> Garrett Smith wrote:
>>>>> Thomas 'PointedEars' Lahn wrote:
>>>>>> Garrett Smith wrote:
>>>>>>> Thomas 'PointedEars' Lahn wrote:
>>>>>>>> Garrett Smith wrote:
>>>>>>>>> Meeting those goals, the result should be valuable and appreciated
>>>>>>>>> by many.
>>>>>>>> Which part of my suggestion did you not like?
>>>>>>> Nothing, its fine but I did not see a regexp there that tests to see
>>>>>>> if the string is valid JSON.
>>>>>> There cannot be such a regular expression in ECMAScript as it does
>>>>>> not
>>>>>> support PCRE's recursive matches feature. An implementation of a
>>>>>> push-down automaton, a parser, is required.
>>>>> A parser would be too much for the FAQ.
>>>> Probably, although I think it could be done in not too many lines for
>>>> the purpose of validation.
>>>
>>> That would require more code to be downloaded and more processing to run
>>> it. What about mobile devices?
>>
>> You are confused. What good is shorter code that is not a solution?

I take it that, since you have not answered this, you have come to see the
flaw in your logic here.

>>>>> var isValidJSON = /^[\],:{}\s]*$/.
>>>>> test(text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@').
>>>>> replace(
>>>>> /"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g,
>>>>> ']').
>>>>> replace(/(?:^|:|,)(?:\s*\[)+/g, ''))
>>>>
>>>> Is this from json2.js? If yes, then it is not acceptable. To begin
>>>> with, it does not regard "\"" valid JSON even though it is.
>>>
>>> The code is from json2.js:
>>> http://www.json.org/json2.js
>>
>> Then it must be either summarily dismissed, or updated at least as
>> follows:
>>
>> /"([^"\\]|\\.)*"|.../
>>
>> because *that* is the proper way to match a double-quoted string with
>> optional escape sequences. Refined for JSON, it must be at least
>>
>> /"([^"\\^\x00-\x1F]|\\["\\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"|.../
>
> There is a problem; TAB character code is 9 and all implementations
> allow it.

Then all implementations are wrong, or, put more politely, they implement
only something similar to JSON. The specification at json.org clearly says
that no control character is allowed in there. Control characters in the
Basic Latin Unicode block range from U+0000 to U+001F inclusive. If
anything, the character class is not exclusive enough, since there are
control characters beyond that block, from U+007F to U+009F inclusive (which
is easily fixed, though).

> I see you went from unicode escape sequences to hex escapes, but why
> \x00 and not \x0?

Because \x0 would be a syntax error, see ES3/5 7.8.4.

> Or why not just use a decimal escape \0?

That is a possibility I was not aware of [15.10.2.11], indeed, but then I
would have two different kinds of character escape sequences in one
character class, with one not being as easily recognizable. No advantage,
only disadvantages there; so no, thanks.

> The character range could be written more compactly;

Yes, one could write /[\dA-Fa-f]/. Or even /…[\dA-F]…/i if there were
no other letters in the remaining expression.

> instead of [0-9A-Fa-f], [0-9A-f], or even just [0-f], if that is not
> considered too unreadable.

Neither expression is equivalent to begin with.
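
A short check of the three ranges, as a sketch. A character range
covers every code unit between its endpoints, so the probe characters
below (chosen for this illustration) separate the classes. The variable
names are arbitrary.

```javascript
var hex = /^[0-9A-Fa-f]$/;   // the original class
var wide = /^[0-9A-f]$/;     // "A" (0x41) through "f" (0x66)
var wider = /^[0-f]$/;       // "0" (0x30) through "f" (0x66)

// "G" and "_" lie between "A" and "f"; ":" lies between "9" and "A".
var probes = ['7', 'c', 'G', '_', ':'];
var table = [];
for (var i = 0; i < probes.length; i++) {
  var p = probes[i];
  table.push([p, hex.test(p), wide.test(p), wider.test(p)]);
}

// A range whose start is above its end is rejected outright.
var reversedRangeThrows = false;
try {
  new RegExp('[o-f]');
} catch (e) {
  reversedRangeThrows = e.name === 'SyntaxError';
}
```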

>>> The character sequence "\"" is valid JSON value in ecmascript,
>> That is gibberish. Either it is JSON, or it is ECMAScript.
>
> Not gibberish.
>
> ECMAScript defines JSON for ECMAScript. When I refer to JSON in
> ECMAScript, I am referring to the JSON Grammar defined in ECMAScript 5.
>
> JSON Grammar in ECMAScript 5 differs from the grammar specified in RFC
> 4627 in that it allows primitive values at top-level.

Fair enough.

>>> however in ecmascript, if enclosed in a single quote string, as - '"\""'
>>> - the backslash would escape the double quote mark, resulting in '"""',
>>> which is not valid JSON.
>> You are confused.
> I don't think so.
>
> What you've allowed in your RegExp doesn't match JSON Grammar defined in
> ECMA 5.

Yes, it does.

>>> To pass a string value containing the character sequence "\"" to
>> But that was not the purpose of the JSON string.
>
> What do you mean by the purpose of the JSON String? The purpose is
> inside the programmer.

That programmer was me. I did not want to have a string containing `\"',
I wanted a string containing `"'. Whatever you tried to prove, it did not
relate to what I wanted to have, and so not to what I wanted the regular
expression to match, either.

>>> JSON.parse, the backslash must be escaped. Thus, you would use:
>>>
>>> var quoteMarkInJSONString = '"\\""';
>>
>> Yes, but that is not how JSON is usually being put in. That is, the
>> escaping backslash is _not_ escaped then, and the characters that
>> quoteMarkInJSONString contains are
>>
>> \"
>>
>
> Right; a JSONValue is usually going to be supplied as a value of an
> identifier, e.g. xhr.responseText. A string value having the character
> sequence - "\"" - is valid JSON.
>
>> and not
>>
>> "
>
> That would be invalid.

Most certainly not, since "\"" is valid. I am talking about the literal
value here, after the expansion of escape sequences.

>> whereas only the latter was intended.
>>
>>> And that works.
>>
>> A JSON string *literal* may very well contain a literal backslash
>> character, and it may also contain a literal double quote. The
>> expression fails to recognize that.
>
> No, a JSONString may not contain a literal backslash character.

Yes, it may, as the "string" may contain escape sequences. So it is *wrong*
to disallow the literal backslash (`\\' in a RegExp literal) from the
content of the string, as that valid JSON escape sequence could then not be
matched, and the JSON text would be considered invalid when it is not.
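
To illustrate, here is a sketch that uses the refined pattern from
above, anchored at both ends, with a non-capturing group and without
the second `^' inside the character class (assumed here to be
unintended, since it would exclude a literal caret). The variable names
and sample inputs are made up for this illustration.

```javascript
// A backslash is accepted only as the start of an escape sequence.
var jsonString =
  /^"(?:[^"\\\x00-\x1F]|\\["\\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"$/;

var quote = jsonString.test('"\\""');               // JSON text "\""
var backslashQuote = jsonString.test('"\\\\\\""');  // JSON text "\\\""
var lone = jsonString.test('"\\"');                 // JSON text "\"
var badEscape = jsonString.test('"\\q"');           // JSON text "\q"
```

The first two match because each backslash starts a valid escape; the
last two do not, because the backslash is left without one.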

> | JSONString ::
> | "JSONStringCharactersopt "
> |
> | JSONStringCharacters ::
> | JSONStringCharacter JSONStringCharactersopt
> |
> | JSONStringCharacter ::
> | SourceCharacter but not double-quote " or backslash \ or U+0000 thru
> | U+001F \ JSONEscapeSequence
>
> JSONStringCharacter may not contain backslash unless that is part of a
> JSONEscapeSequence.

I rest my case.


PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$8300dec7(a)news.demon.co.uk> (2004)