From: Richard Cornford on
Scott Sauyet wrote:
>On Feb 15, 7:21 pm, Richard Cornford wrote:
>> Scott Sauyet wrote:
>>
>> <snip>> "bind" : function(){
>>> // connect onclick to every first child li of ever ul
>>>(suggested: "ul > li")
>> <snip> ^^^^^^^
>>
>> Surely that is not the CSS selector for "every first child li
>> of ever ul". The specs for these tests are really bad, so it
>> is not surprising that the implementations of the tests are
>> all over the place.
>
> And of course, if you do what's suggested by the text, you'll
> look wrong, because all the other libraries are taking the
> "ul > li" literally.

> I meant to mention this too in my last posting. There is
> definitely some room for improvement to the test specs as
> well as to the infrastructure.

I wouldn't argue with that.

>> > // return the length of the connected nodes
>>
>> You see, what is "the length of the connected nodes"? Is
>> that "length" in terms of something like the pixel
>> widths/height of the displayed nodes, or is "length" intended
>> to imply the length of some sort of 'collected' array of
>> nodes (i.e. some sort of 'query', and if so why are we timing
>> that in a context that is not a test of 'selector' performance),
>> or does the spec just call for *number* of nodes modified
>> by the task to be returned?
>
> This one does not bother me too much. In the context of the
> collection of tests [1], it's fairly clear that they want the
> number of nodes modified.

Is that all they want, rather than, say, in some cases the number of
added/modified nodes as retrieved from the DOM post-modification (as a
verification that the process has occurred as expected)?

The various formulations used include:-

| return the result of the selector ul.fromcode

| return the length of the query "tr td"

| return the lenght of the odd found divs

| return the length of the destroyed nodes

(without any definition of what "the selector" or "the query" mean).

If you are saying that it is just the number of nodes that needs to be
returned then in, for example, the YUI3 "bind" example you cited:-

| "bind" : function(){
| Y.one('body').delegate('click', function() {}, 'ul > li');
| return Y.all('ul > li').size();
| },

- the last line could be written:-

return 845; //or whatever number is correct for the document

- and for that function (especially in a non-QSA environment) the bulk
of the work carried out in that function has been removed.

> Presumably they assume that the tools will collect
> the nodes in some array-like structure with a "length"
> property.

Maybe they do, but that is not what the spec is actually asking for. And
if it was what is being asked for, why should that process be included
in the timing for the tasks, as it is not really part of any realistic
task.

Richard.

From: Scott Sauyet on
On Feb 16, 2:51 am, "Richard Cornford" <Rich...(a)litotes.demon.co.uk>
wrote:
> Scott Sauyet wrote:
> >On Feb 15, 7:21 pm, Richard Cornford wrote:
> >> Scott Sauyet wrote:
> >> > // return the length of the connected nodes
>
> >> You see, what is "the length of the connected nodes"? Is
> >> that "length" in terms of something like the pixel
> >> widths/height of the displayed nodes, or is "length" intended
> >> to imply the length of some sort of 'collected' array of
> >> nodes (i.e. some sort of 'query', and if so why are we timing
> >> that in a context that is not a test of 'selector' performance),
> >> or does the spec just call for *number* of nodes modified
> >> by the task to be returned?
>
> > This one does not bother me too much.  In the context of the
> > collection of tests [1], it's fairly clear that they want the
> > number of nodes modified.
>
> Is that all they want, rather than, say, in some cases the number of
> added/modified nodes as retrieved from the DOM post-modification (as a
> verification that the process has occurred as expected)?

I do think that in all of the tests, the result returned is supposed
to have to do with some count of elements after a certain
manipulation. But in some of them, such as bind, the manipulation
doesn't actually change the number of elements manipulated. As this
was based upon SlickSpeed, it inherits one of the main problems of
that system, namely that it tries to do one thing too many. It tries
to verify accuracy and compare speeds at the same time. This is
problematic, in my opinion.


> The various formulations used include:-
>
> | return the result of the selector ul.fromcode
>
> | return the length of the query "tr td"
>
> | return the lenght of the odd found divs
>
> | return the length of the destroyed nodes
>
> (without any definition of what "the selector" or "the query" mean).
>
> If you are saying that it is just the number of nodes that needs to be
> returned then in, for example, the YUI3 "bind" example you cited:-
>
> |  "bind" : function(){
> |        Y.one('body').delegate('click', function() {}, 'ul > li');
> |        return Y.all('ul > li').size();
> |  },
>
> - the last line could be written:-
>
> return 845; //or whatever number is correct for the document
>
> - and for that function (especially in a non-QSA environment) the bulk
> of the work carried out in that function has been removed.

I've actually thought of doing that, and loudly trumpeting that my
library is unbeatable at TaskSpeed! :-)

> > Presumably they assume that the tools will collect
> > the nodes in some array-like structure with a "length"
> > property.
>
> Maybe they do, but that is not what the spec is actually asking for. And
> if it was what is being asked for, why should that process be included
> in the timing for the tasks, as it is not really part of any realistic
> task.

Yes, if they want verification of counts, perhaps the test harness
itself could provide that.

-- Scott
From: Richard Cornford on
On Feb 16, 4:40 pm, Scott Sauyet wrote:
> On Feb 16, 2:51 am, Richard Cornford wrote:
>> Scott Sauyet wrote:
<snip>
>>> This one does not bother me too much. In the context of the
>>> collection of tests [1], it's fairly clear that they want the
>>> number of nodes modified.
>
>> Is that all they want, rather than, say, in some cases the number of
>> added/modified nodes as retrieved from the DOM post-modification
>> (as a verification that the process has occurred as expected)?
>
> I do think that in all of the tests, the result returned is
> supposed to have to do with some count of elements after a
> certain manipulation.

Which is not a very realistic test 'task' as the number of nodes
modified in operations on real DOMs is seldom of any actual interest.

> But in some of them, such as bind, the manipulation
> doesn't actually change the number of elements manipulated.

And for others, such as the list creation, the number of 'modified'
nodes is pre-determined by the number you have just created.

> As this was based upon SlickSpeed, it inherits one of the main
> problems of that system, namely that it tries to do one thing
> too many.

That is exactly where I attribute the cause of this flaw.

> It tries to verify accuracy and compare speeds at the same
> time. This is problematic, in my opinion.

Especially when it is timing the verification process with the task,
and applying different verification code to each 'library'. There you
have the potential for miscounting library code to combine with
misbehaving DOM modification code to give the impression that the
whole thing is working correctly, or for a correctly carried out task
to be labelled as failing because the counting process is off for
some reason.

<snip>
>> return 845; //or whatever number is correct for the document
>
>> - and for that function (especially in a non-QSA environment)
>> the bulk of the work carried out in that function has been
>> removed.
>
> I've actually thought of doing that, and loudly trumpeting
> that my library is unbeatable at TaskSpeed! :-)

It wouldn't do the pure DOM code any harm either.

>>> Presumably they assume that the tools will collect
>>> the nodes in some array-like structure with a "length"
>>> property.
>
>> Maybe they do, but that is not what the spec is actually asking
>> for. And if it was what is being asked for, why should that
>> process be included in the timing for the tasks, as it is not
>> really part of any realistic task.
>
> Yes, if they want verification of counts, perhaps the test
> harness itself could provide that.

Not just "perhaps". It should, and it should use the same verification
code for each test, and outside of any timing recording.

Richard.
From: Scott Sauyet on
On Feb 16, 12:23 pm, Richard Cornford <Rich...(a)litotes.demon.co.uk>
wrote:
> On Feb 16, 4:40 pm, Scott Sauyet  wrote:
>
>> On Feb 16, 2:51 am, Richard Cornford wrote:
>>> Scott Sauyet wrote:
>>>> Presumably they assume that the tools will collect
>>>> the nodes in some array-like structure with a "length"
>>>> property.
>
>>> Maybe they do, but that is not what the spec is actually asking
>>> for. And if it was what is being asked for, why should that
>>> process be included in the timing for the tasks, as it is not
>>> really part of any realistic task.
>
>> Yes, if they want verification of counts, perhaps the test
>> harness itself could provide that.
>
> Not just "perhaps". It should, and it should use the same verification
> code for each test, and outside of any timing recording.

I think that testing the selector engine is part of testing the
library. Although this is not the same as the SlickSpeed selectors
test, it should subsume that one. So I don't object to testing
selector speed. The verification, though, is a different story. It's
quite easy to switch testing documents, but it is presumably not so
easy to verify all the results of all the manipulations. The
compromise that TaskSpeed inherits from SlickSpeed is, I think, fairly
reasonable. Make all the libraries report their results, and note if
there is any disagreement. They could, of course, all be wrong and
yet all have the same values, but that seems relatively unlikely.

There is an approach that I doubt I'd bother trying, but which is
quite interesting: Add a url query parameter, which would serve as a
seed for a randomizing function. If the server does not get one, it
chooses a random value and redirects to a page with that random seed.
Then, based upon random numbers derived from that seed, a document is
generated with some flexible structure, and a test script is generated
that runs some random sequence of the predefined test cases against
each library. Verification might be tricky, but should be doable.
This might make it more difficult for libraries to design their tests
around the particulars of the document and/or the ordering of the
tests. While I think this would work, it sounds like more effort than
I'm willing to put in right now.
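
Roughly the sort of thing I have in mind, as a sketch only (the names
and the generated structure are invented, and the real thing would need
far more variety):

// A small seeded PRNG (Park-Miller), so the same seed always
// rebuilds the same "random" document.
function seededRandom(seed) {
    var state = seed % 2147483647;
    if (state <= 0) {
        state += 2147483646;
    }
    return function() {
        state = (state * 16807) % 2147483647;
        return (state - 1) / 2147483646;
    };
}

// Generate a flexible but reproducible structure from the seed.
function buildTestDocument(seed) {
    var rnd = seededRandom(seed);
    var ulCount = 5 + Math.floor(rnd() * 20);
    for (var i = 0; i < ulCount; i++) {
        var ul = document.createElement('ul');
        var liCount = 1 + Math.floor(rnd() * 10);
        for (var j = 0; j < liCount; j++) {
            ul.appendChild(document.createElement('li'));
        }
        document.body.appendChild(ul);
    }
}

The server-side redirect and the shuffling of the test sequence would
work from the same seed.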

-- Scott
From: Richard Cornford on
On Feb 16, 8:57 pm, Scott Sauyet wrote:
> On Feb 16, 12:23 pm, Richard Cornford wrote:
>> On Feb 16, 4:40 pm, Scott Sauyet wrote:
>>> On Feb 16, 2:51 am, Richard Cornford wrote:
>>>> Scott Sauyet wrote:
>>>>> Presumably they assume that the tools will collect
>>>>> the nodes in some array-like structure with a "length"
>>>>> property.
>
>>>> Maybe they do, but that is not what the spec is actually
>>>> asking for. And if it was what is being asked for, why
>>>> should that process be included in the timing for the
>>>> tasks, as it is not really part of any realistic task.
>
>>> Yes, if they want verification of counts, perhaps the test
>>> harness itself could provide that.
>
>> Not just "perhaps". It should, and it should use the same
>> verification code for each test, and outside of any
>> timing recording.
>
> I think that testing the selector engine is part of testing
> the library.

Obviously it is, if the 'library' has a selector engine, but that is a
separate activity from testing the library's ability to carry out
tasks as real world tasks don't necessitate any selector engine.
(Remember that common hardware and browser performance was not
sufficient for any sort of selector engine even to look like a viable
idea before about the middle of 2005, but (even quite extreme) DOM
manipulation was long established by that time.)

The 'pure DOM' tests, as a baseline for comparison, don't necessarily
need a selector engine to perform any given task (beyond the fact that
the tasks themselves have been designed around a notion of
'selectors'). So making selector engine testing part of the 'task'
tests acts to impose arbitrary restrictions on the possible code used,
biases the results, and ultimately negates the significance of the
entire exercise.
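
For example, the "bind" task needs nothing more than the following (a
rough sketch, taking "ul > li" as meaning each LI child of a UL; the
function name is mine):

function bindTask(doc) {
    // Attach a click handler to every LI that is a direct child of a
    // UL, and count them; no selector engine involved.
    var uls = doc.getElementsByTagName('ul');
    var count = 0;
    for (var i = 0; i < uls.length; i++) {
        for (var node = uls[i].firstChild; node; node = node.nextSibling) {
            if (node.nodeName.toUpperCase() === 'LI') {
                node.onclick = function() {};
                count++;
            }
        }
    }
    return count;
}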

> Although this is not the same as the SlickSpeed
> selectors test,

Comparing the selector engines in libraries that have selector engines
seems like a fairly reasonable thing to do. Suggesting that a selector
engine is an inevitable prerequisite for carrying out DOM manipulation
tasks is self-evident BS.

> it should subsume that one. So I don't object
> to testing selector speed. The verification, though, is a
> different story. It's quite easy to switch testing documents,
> but it is presumably not so easy to verify all the results of
> all the manipulations.

Why not (at least in most cases)? Code could be written to record the
changes to a DOM that resulted from running a test function. You know
what you expect the test function to do, so verifying that it did do it
shouldn't be too hard.

Granted there are cases like the use of - addEventListener - where
positive verification becomes a lot more difficult, but as it is the
existing tests aren't actually verifying that listeners were added.
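
The sort of thing I mean, as a rough sketch (the names and the form of
the 'expected' description are invented):

// Record the state of the document before and after a task runs, so
// the harness (not the library) can check what changed.
function snapshotCounts(doc) {
    return {
        ul: doc.getElementsByTagName('ul').length,
        li: doc.getElementsByTagName('li').length,
        div: doc.getElementsByTagName('div').length
    };
}

function changesMatch(before, after, expectedDeltas) {
    // e.g. expectedDeltas = {li: 40} for one of the creation tasks
    for (var tag in expectedDeltas) {
        if (after[tag] - before[tag] !== expectedDeltas[tag]) {
            return false;
        }
    }
    return true;
}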

> The compromise that TaskSpeed inherits
> from SlickSpeed is, I think, fairly reasonable.

I don't. TaskSpeed's validity is compromised in the process.

> Make all the libraries report their results, and note
> if there is any disagreement.

But reporting results is not part of any genuinely representative task,
and so it should not be timed along with any given task. The task
itself should be timed in isolation, and any verification employed
separately.

Whether some 'library' should be allowed to do its own verification is
another matter, but the verification definitely should not be timed
along with the task that it is attempting to verify.
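
Schematically, something like this (a sketch only; the names are
invented):

function runOneTest(taskFunction, verify) {
    var start = new Date().getTime();
    var returned = taskFunction();        // only the task is timed
    var elapsed = new Date().getTime() - start;
    // The same harness-supplied check for every library, untimed.
    var passed = verify(document, returned);
    return {time: elapsed, passed: passed};
}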

> They could, of course, all be wrong and
> yet all have the same values, but that seems
> relatively unlikely.

Unlikely, but not impossible, and an issue that can easily be entirely
avoided.

> There is an approach that I doubt I'd bother trying, but
> which is quite interesting: Add a url query parameter,
> which would serve as a seed for a randomizing function.
> If the server does not get one, it chooses a random value
> and redirects to a page with that random seed. Then, based
> upon random numbers derived from that seed, a document is
> generated with some flexible structure, and a test script
> is generated that runs a random some sequence of the
> predefined test cases against each library.

I can see how this might make sense in selector speed testing (though
presumably you would run up against many cases where the reported
duration of the test would be zero milliseconds, despite our knowing
that nothing happens in zero time) but for task testing randomly
generating the document acted upon would be totally the wrong
approach. If you did that you would bias against the baseline pure DOM
tests as then they would have to handle issues arising from the
general case, which are not issues inherent in DOM scripting because
websites are not randomly generated.

In any real web site/web application employment of scripting,
somewhere between something and everything is known about the
documents that are being scripted. Thus DOM scripts do not need to
deal with general issues in browser scripting, but rather only need to
deal with the issues that are known to exist in their specific
context.

In contrast, it is an inherent problem in general purpose library code
that it must address (or attempt to address) all the issues that occur
in a wide range of contexts (at minimum, all the common contexts).
There are inevitably overheads in doing this, with those overheads
increasing as the number of contexts accommodated increases.

With random documents and comparing libraries against some supposed
'pure DOM' baseline, you will be burdening the baseline with the
overheads that are only inherent in general purpose code. The result
would not be a representative comparison.

> Verification might be tricky, but should be doable.
> This might make it more difficult for libraries to
> design their tests around the particulars of the
> document and/or the ordering of the tests. While
> I think this would work, it sounds like more
> effort than I'm willing to put in right now.

Given that javascript source is available if anyone wants to look at
it, any library author attempting to optimise for a specific test
(rather than, say, optimising for a common case) is likely to be
spotted doing so, and see their reputation suffer as a result.

Richard.