From: Scott Sauyet on
On Feb 17, 11:04 am, Richard Cornford wrote:
> On Feb 16, 8:57 pm, Scott Sauyet wrote:

>> I think that testing the selector engine is part of testing
>> the library.
>
> Obviously it is, if the 'library' has a selector engine, but that is a
> separate activity from testing the library's ability to carry out
> tasks, as real world tasks don't necessitate any selector engine.

Perhaps it's only because the test framework was built around
libraries that offer both DOM manipulation and selector engines, but
the two seem a natural fit. I don't believe this was meant to be a DOM
manipulation test in particular. My understanding (and I was not
involved in any of the original design, so take this with a grain of
salt) is that this was meant to be a more general test of how the
libraries were used, which involved DOM manipulation and selector-
based querying. If it seemed at all feasible, the framework would
probably have included event handler manipulation tests, as well. If
the libraries had all offered classical OO infrastructures the way
MooTools and Prototype do, that would probably also have been tested.

Why the scare quotes around "library"? Is there a better term --
"toolkit"? -- that describes the systems being tested?

> (Remember that common hardware and browser performance was not
> sufficient for any sort of selector engine even to look like a viable
> idea before about the middle of 2005, but (even quite extreme) DOM
> manipulation was long established by that time.)

Really? Very interesting. I didn't realize that it was a system
performance issue. I just thought it was a new way of doing things
that people started trying around then.


> The 'pure DOM' tests, as a baseline for comparison, don't necessarily
> need a selector engine to perform any given task (beyond the fact that
> the tasks themselves have been designed around a notion of
> 'selectors'). So making selector engine testing part of the 'task'
> tests acts to impose arbitrary restrictions on the possible code used,

Absolutely. A pure selector engine would also not be testable, nor
would a drag-and-drop toolkit. We are restricted to systems that can
manipulate the DOM and find the size of certain collections of
elements.


> biases the results,

In what way?


> and ultimately negates the significance of the entire exercise.

I just don't see it. There is clearly much room for improvement, but
I think the tests as they stand have significant value.



>> Although this is not the same as the SlickSpeed
>> selectors test,
>
> Comparing the selector engines in libraries that have selector engines
> seems like a fairly reasonable thing to do. Suggesting that a selector
> engine is an inevitable prerequisite for carrying out DOM manipulation
> tasks is self evident BS.

Note that these results don't require that the library actually use a
CSS-style selector engine, only that it can, for instance, find the
number of elements of a certain type, the set of which is often most
easily described via a CSS selector. When the "table" function is
defined to return "the length of the query 'tr td'," we can interpret
that as counting the results of running the selector "tr td" in the
context of the document if we have a selector engine, but as "the
number of distinct TD elements in the document which descend from TR
elements" if not. Being able to find such elements has been an
important part of most of the DOM manipulation I've done.

PureDOM does all this without any particular CSS selector engine, so
it's clear that one is not required to pass the tests.
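
To make that concrete, the "tr td" count needs nothing more than
getElementsByTagName and an ancestor walk. Something roughly like
this would do it (a sketch of the approach, not PureDOM's actual
code):

  // Count the distinct TD elements that descend from TR elements,
  // without any selector engine. Illustrative only.
  function countTdInTr(doc) {
    var tds = doc.getElementsByTagName('td');
    var count = 0;
    for (var i = 0; i < tds.length; i++) {
      var node = tds[i].parentNode;
      while (node && node.nodeType == 1) {
        if (node.tagName.toLowerCase() == 'tr') {
          count++;
          break;
        }
        node = node.parentNode;
      }
    }
    return count;
  }

Each TD gets counted at most once, which is what a selector engine
would report for "tr td".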


>> it should subsume that one.  So I don't object
>> to testing selector speed.  The verification, though, is a
>> different story.  It's quite easy to switch testing documents,
>> but it is presumably not so easy to verify all the results of
>> all the manipulations.
>
> Why not (at least in most cases)? Code could be written to record the
> changes to a DOM that resulted from running a test function. You know
> what you expect the test function to do so verifying that it did do it
> shouldn't be too hard.

The document to test has been fairly static, and I suppose one could
go through it, analyze its structure, and calculate the expected
results. But the document is included as a stand-alone file, used
with this PHP:

<?php include('../template.html');?>

Another file could easily be substituted, and it might well be worth
doing. Adding this sort of analysis, though, would make it much
more time-consuming to test against a different document.
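
One fairly cheap version of what Richard suggests would be to
snapshot a few easily computed properties of the document before and
after each task and compare the differences to expected values. A
rough sketch only; the metrics and the expected deltas here are
illustrative and would have to be worked out per document:

  function snapshot(doc) {
    return {
      elements: doc.getElementsByTagName('*').length,
      uls: doc.getElementsByTagName('ul').length,
      tds: doc.getElementsByTagName('td').length
    };
  }

  // Run the task in a separate, untimed verification pass and check
  // that the document changed by the expected amounts.
  function verifyTask(doc, task, expected) {
    var before = snapshot(doc);
    task();
    var after = snapshot(doc);
    return (after.elements - before.elements) == expected.elements &&
           (after.uls - before.uls) == expected.uls &&
           (after.tds - before.tds) == expected.tds;
  }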


> Granted there are cases like the use of - addEventListener - where
> positive verification becomes a lot more difficult, but as it is the
> existing tests aren't actually verifying that listeners were added.

Are there any good techniques you know of that would make it
straightforward to actually test this from within the browser's script
engine? It would be great to be able to verify that directly.
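
One possibility that comes to mind is to fire a synthetic event at
the element and check for whatever side effect the task's handler is
specified to produce. Here I pretend the handler adds a "clicked"
class; that part is purely illustrative, and this is untested:

  // Assumes DOM 2 Events (createEvent/dispatchEvent); older IE needs
  // createEventObject/fireEvent instead.
  function listenerSeemsAttached(el) {
    var evt;
    if (document.createEvent) {
      evt = document.createEvent('MouseEvents');
      evt.initMouseEvent('click', true, true, window, 0, 0, 0, 0, 0,
                         false, false, false, false, 0, null);
      el.dispatchEvent(evt);
    } else if (el.fireEvent) {
      el.fireEvent('onclick', document.createEventObject());
    }
    return (' ' + el.className + ' ').indexOf(' clicked ') != -1;
  }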

>> The compromise that TaskSpeed inherits
>> from SlickSpeed is, I think, fairly reasonable.
>
> I don't. TaskSpeed's validity is compromised in the process.
>
>> Make all the libraries report their results, and note
>> if there is any disagreement.
>
> But reporting results is not part of any genuinely representative task,
> and so it should not be timed along with any given task. The task
> itself should be timed in isolation, and any verification employed
> separately. [ ... ]

I think this critique is valid only if you assume that the
infrastructure is designed only to test DOM Manipulation. I don't buy
that assumption.


>> They could, of course, all be wrong and
>> yet all have the same values, but that seems
>> relatively unlikely.
>
> Unlikely, but not impossible, and an issue that can easily be entirely
> avoided.

Easily for a single document, and even then only with some real work
in finding the expected results and devising a way to test them.


>> There is an approach that I doubt I'd bother trying, but
>> which is quite interesting:  Add a url query parameter,
>> which would serve as a seed for a randomizing function.
>> If the server does not get one, it chooses a random value
>> and redirects to a page with that random seed. Then, based
>> upon random numbers derived from that seed, a document is
>> generated with some flexible structure, and a test script
>> is generated that runs some random sequence of the
>> predefined test cases against each library.
>
> I can see how this might make sense in selector speed testing (though
> presumably you would run up against many cases where the reported
> duration of the test would be zero milliseconds, despite our knowing
> that nothing happens in zero time)

In another thread [1], I discuss an updated version of slickspeed,
which counts how many times each selector can be run over a 250ms
span in order to time the selectors more accurately.
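
The basic idea is just to repeat each selector until a minimum amount
of time has passed and then divide, instead of trusting a single run.
A simplified sketch of the idea, not the code from that page:

  function timeSelector(runSelector) {
    var start = new Date().getTime();
    var runs = 0;
    var elapsed;
    do {
      runSelector();
      runs++;
      elapsed = new Date().getTime() - start;
    } while (elapsed < 250);
    return elapsed / runs; // average milliseconds per run
  }

So a selector that finishes well under a millisecond still produces a
measurable average instead of reporting zero.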

> but for task testing randomly
> generating the document acted upon would be totally the wrong
> approach. If you did that you would bias against the baseline pure DOM
> tests as then they would have to handle issues arising from the
> general case, which are not issues inherent in DOM scripting because
> websites are not randomly generated.

I was not expecting entirely random documents. Instead, I would
expect to generate one in which the supplied tests generally have
meaningful results. So for this test

"attr" : function(){
// find all ul elements in the page.
// generate an array of their id's
// return the length of that array
},

I might want to randomly determine the level of nesting at which ULs
appear, randomly determine how many are included in the document, and
perhaps randomly choose whether some of them do not actually have
ids. There would probably be some small chance that there were no ULs
at all.
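
The generation itself could be driven by a simple seeded generator so
that the same seed always reproduces the same document. A rough
sketch of the idea only; the structural choices are just for
illustration:

  // A seeded linear congruential generator, so the same ?seed= value
  // always yields the same sequence.
  function makeRng(seed) {
    var state = seed >>> 0;
    return function() {
      state = (state * 1664525 + 1013904223) >>> 0;
      return state / 4294967296; // 0 <= n < 1
    };
  }

  // Build markup with a reproducible number of ULs, random nesting,
  // and ids that are sometimes missing.
  function buildTestMarkup(seed) {
    var rand = makeRng(seed);
    var ulCount = Math.floor(rand() * 10); // possibly zero ULs
    var html = [];
    for (var i = 0; i < ulCount; i++) {
      var depth = 1 + Math.floor(rand() * 3);
      var hasId = rand() > 0.2;
      var open = '', close = '';
      for (var d = 0; d < depth; d++) {
        open += '<div>';
        close += '</div>';
      }
      html.push(open + '<ul' + (hasId ? ' id="ul' + i + '"' : '') +
                '><li>item</li></ul>' + close);
    }
    return html.join('\n');
  }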

> In any real web site/web application employment of scripting,
> somewhere between something and everything is known about the
> documents that are being scripted. Thus DOM scripts do not need to
> deal with general issues in browser scripting, but rather only need to
> deal with the issues that are known to exist in their specific
> context.

Absolutely. I definitely wouldn't try to build entirely random
documents, only documents for which the results of the tests should be
meaningful. The reason I said I probably wouldn't do this is that,
while it is by no means impossible, it is also a far from trivial
exercise.


> In contrast, it is an inherent problem in general purpose library code
> that they must address (or attempt to address) all the issues that
> occur in a wide range of context (at minimum, all the common
> contexts). There are inevitably overheads in doing this, with those
> overheads increasing as the number of contexts accommodated increases.

Yes, this is true. But it is precisely these general purpose
libraries that are under comparison in these tests. Being able to
compare their performance and the code each one uses is the only
reason these tests exist.

> [ ... ]
>> Verification might be tricky, but should be doable.
>> This might make it more difficult for libraries to
>> design their tests around the particulars of the
>> document and/or the ordering of the tests.  While
>> I think this would work, it sounds like more
>> effort than I'm willing to put in right now.
>
> Given that javascript source is available if anyone want to look at
> it, any library author attempting to optimise for a specific test
> (rather than, say, optimising for a common case) is likely to be
> spotted doing so, and see their reputation suffer as a result.

I would hope so, but as I said in the post to which you initially
responded, I see a fair bit of what could reasonably be considered
optimising for the test, and I only really looked at jQuery's, YUI's,
and My Library's test code. I wouldn't be surprised to find more in
the others.

-- Scott
____________________
[1] http://groups.google.com/group/comp.lang.javascript/msg/f333d40588ae2ff0
From: David Mark on
Andrew Poulos wrote:
> On 15/02/2010 3:45 PM, David Mark wrote:
>> I ran it a few times. This is representative. The two versions flip
>> flop randomly. Usually around a third of the purer tests. :)
>
> I'm not sure whether the fact that one person can write a library
> that's as good as or better than libraries on which (I believe) teams
> of people have worked says a lot about that one person's ability or
> not much about the others'.

Thanks. But, as mentioned, I am not the only one who could do this.
The basic theory that has been put forth for years is that those who
really know cross-browser scripting refuse to work on GP libraries
because they break the first three rules of cross-browser scripting
(context, context, context). The three rules bit is mine, but the basic
theory about context-specific scripts has been put forth by many others.

>
> I'm not meaning to be offensive, I'm just wondering how one person can
> appear to achieve so much.
>

Thanks! It's because this stuff is not that complicated. If groups of
developers spent years fumbling and bumbling their way through basic
tasks (e.g. using browser sniffing for everything and still failing),
then it wouldn't be too hard to show them up. And it wasn't. Took
about a month to clean up what was originally a two-month project. I
think it is really shaping up as a world-beater now. :)

And all without browser sniffing. Who could have predicted such a
thing? Lots of people, that's who. ;)
From: David Mark on
Michael Wojcik wrote:
> Andrew Poulos wrote:
>> I'm not meaning to be offensive, I'm just wondering how one person can
>> appear to achieve so much.
>
> Surely if studies of software development have taught us anything,
> they've taught us that there is no correlation between the size of a
> team and the quality of the code.
>
> I've been a professional developer for nearly a quarter of a century,
> and I'm not surprised at all when one person delivers a better
> solution than what a large team delivers. Even if the large team's
> total effort is much greater - which may or may not be the case here.

I think I agree with all of this, but was confused by this statement.
What may or may not be the case? I'm one guy who took a two-year hiatus
from the library. Meanwhile hundreds (if not thousands) of people have
been filing tickets, arguing patches, actually applying some patches,
un-applying patches, maintaining repositories, testing browsers
(somewhat ineffectually), arguing about blog comments, etc. Make no
mistake, I did none of that. It's been me and Notepad and a few
weekends and evenings over the last month. Thanks to the handful of
people who gave feedback too. :)
From: David Mark on
Scott Sauyet wrote:
> On Feb 14, 11:45 pm, David Mark <dmark.cins...(a)gmail.com> wrote:
>> David Mark wrote:
>>> I've updated the TaskSpeed test functions to improve performance. This
>>> necessitated some minor additions (and one change) to the OO interface
>>> as well. I am pretty happy with the interface at this point, so will
>>> set about properly documenting it in the near future.
>
> So you've learned that test-driven development is not an
> oxymoron? :-)

I think you are misquoting me. The term is "test-driven" design (a la
John Resig). The quotes indicate that he isn't designing anything but
treating empirical observations like they are specifications. It's the
crystal ball approach. Search the archive for "test swarm".

>
> I actually am a fan of test-driven design, but I don't do it with
> performance tests; that scares me.

I am sure you are _not_ talking about what I am talking about. At least
I hope not. And what makes you think that these performance tests had
anything to do with the changes? FYI, they didn't. It was Richard's
solid suggestion to use cloneNode as there are no event listeners or
custom attributes to deal with in these test functions. I knew it would
be faster _before_ I re-ran the tests. That's the difference. I don't
take test results at face value. You have to understand what you are
looking at before you can react to them.

In contrast, I see the various "major" efforts resorting to all sorts of
inexplicable voodoo based solely on "proof" provided by test results
with no understanding at all going into the process. That's what is
wrong with "test-driven" design/development.

And as for the GP clone method that I added, it will be stipulated that
listeners (and custom attributes if you are into those) _must_ be added
_after_ cloning. That takes care of that. ;)
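
For those following along, the gain is roughly this: build the
prototype node once and clone it inside the loop, rather than
re-creating it every time. A simplified illustration, not the actual
test code:

  // Safe here because nothing (listeners, custom attributes) is
  // attached before cloning; anything like that must be added after
  // the clone.
  function appendRows(tbody, count) {
    var proto = document.createElement('tr');
    var td = document.createElement('td');
    td.appendChild(document.createTextNode('x'));
    proto.appendChild(td);
    for (var i = 0; i < count; i++) {
      tbody.appendChild(proto.cloneNode(true));
    }
  }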

>
>>> [http://www.cinsoft.net/taskspeed.html]
>
>> Opera 10.10, Windows XP on a very busy and older PC:-
>>
>> 2121 18624 9000 5172 22248 4846 4360 1109 1266 1189
>> 6140 1876 843* 798*
>>
>> I ran it a few times. This is representative. The two versions flip
>> flop randomly. Usually around a third of the purer tests. :)
>
> I can confirm similar rankings (with generally faster speeds, of
> course) on my modern machine in most recent browsers, with two
> exceptions: First, in Firefox and IE, PureDOM was faster than My
> Library. Second, in IE6, several tests fail in My Library ("setHTML",
> "insertBefore", "insertAfter".)

Well, that's no good at all. :) Likely a recent development. I will
look into that. If you have specifics, that would be helpful as I don't
have IE6 handy right this minute.

> Also note that the flip-flopping of
> the two versions might have to do with the fact that they are pointing
> at the same exact version of My Library (the one with QSA) and the
> same test code. You're running the same infrastructure twice! :-)

Huh? One is supposed to be pointing to the non-QSA version. I'll fix that.

>
> This is great work, David. I'm very impressed.

Thanks! I'll respond to the rest after I track down this IE6 problem.
I can't believe I broke IE6 (of all things). That's what I get for not
re-testing. :(
From: Richard Cornford on
On Feb 18, 6:14 pm, David Mark wrote:
> Scott Sauyet wrote:
<snip>
>> I actually am a fan of test-driven design, but I don't do it
>> with performance tests; that scares me.
>
> I am sure you are _not_ talking about what I am talking about.
<snip>
> ... . That's the difference. I don't
> take test results at face value. You have to understand what you
> are looking at before you can react to them.
>
> In contrast, I see the various "major" efforts resorting to all
> sorts of unexplicable voodoo based solely on "proof" provided
> by test results with no understanding at all going into the
> process. That's what is wrong with "test-driven"
> design/development.

Another significant issue (beyond understanding the results) is the
question of designing the right test(s) to apply. Get the test design
wrong and the results will be meaningless, so not understanding them
isn't making anything worse.

To illustrate; the conclusions drawn on this page:-

<URL: http://ejohn.org/blog/most-bizarre-ie-quirk/ >

- are totally incorrect because the test used (predictably) interfered
with the process that was being examined.

(So is the next step going to be "test-driven test design"? ;-)

Richard.