From: Lasse Reichstein Nielsen on
"Richard Cornford" <Richard(a)litotes.demon.co.uk> writes:

> It turned out that
> charCodeAt was not useful as the application of the bitwise operations
> soon turns all the string digits into the corresponding numbers.

From a performance standpoint, I think charCodeAt can be significantly
faster than charAt + (effectively) parseInt. Reading the character
code at a string position returns a number (easily, reading from the
internal representation of the string), whereas charAt returns a
singleton string (which at least can be cached) that requires parsing
to turn into a number.

I'm guessing the fastest approach would be
str.charCodeAt(i) & 0x0F

/L
--
Lasse Reichstein Holst Nielsen
'Javascript frameworks is a disruptive technology'

From: Richard Cornford on
Lasse Reichstein Nielsen wrote:
> Richard Cornford writes:
>
>> It turned out that charCodeAt was not useful as the application
>> of the bitwise operations soon turns all the string digits into
>> the corresponding numbers.
>
> From a performance standpoint, I think charCodeAt can be
> significantly faster than charAt + (effectively) parseInt.

Comparing those two, yes.

> Reading the character code at a string position returns a number
> (easily, reading from the internal representation of the string),
> whereas charAt returns a singleton string (which at least can
> be cached) that requires parsing to turn into a number.
>
> I'm guessing the fastest approach would be
> str.charCodeAt(i) & 0x0F

The other factor is looping over the characters in the string to build
the array of numbers. That is necessary for both of - charCodeAt - and -
charCode -, but with - split - the implied loop is handled internally by
native code.

Richard.

From: RobG on
On Jul 5, 6:12 pm, "Richard Cornford" <Rich...(a)litotes.demon.co.uk>
wrote:
> Lasse Reichstein Nielsen wrote:
> > Richard Cornford writes:
>
> >> It turned out that charCodeAt was not useful as the application
> >> of the bitwise operations soon turns all the string digits into
> >> the corresponding numbers.
>
> > From a performance standpoint, I think charCodeAt can be
> > significantly faster than charAt + (effectively) parseInt.
>
> Comparing those two, yes.
>
> > Reading the character code at a string position returns a number
> > (easily, reading from the internal representation of the string),
> > whereas charAt returns a singleton string (which at least can
> > be cached) that requires parsing to turn into a number.
>
> > I'm guessing the fastest approach would be
> > str.charCodeAt(i) & 0x0F
>
> The other factor is looping over the characters in the string to build
> the array of numbers. That is necessary for both of - charCodeAt - and -
> charCode -, but with - split - the implied loop is handled internally by
> native code.


While reading characters from a string might be faster for that
particular operation, replacing other array operations with their
string equivalents means that overall, the program runs slower than
its equivalent using an array. Below is a small test script, there may
be other optimisations to the string version. Replacing substring and
charAt operations with equivalent regular expression operations has a
negligable affect on performance.

As far as I can see, having to create a new string several times on
each loop negates any advantage of charAt or charCodeAt for getting a
digit. Browsers that have slow string manipulation are particularly
affected.

In the test code, I re-use chunks of previously generated strings to
speed things up a bit.


// toBinary using string and charCodeAt
function toBin_swlrn(numStr) {
var bigInt = '',
len = numStr.length,
result = '',
re = /^0/,
rem, digit, i;
do {
for (rem = i = 0; i < len; ++i) {
digit = (numStr.charCodeAt(i) & 0x0F) + rem * 10;
rem = digit & 1;
bigInt += (digit - rem) / 2;
}
if (bigInt.charAt(0) == '0') {
bigInt = bigInt.replace(re,'');
--len;
}
result += rem;
numStr = bigInt;
bigInt = '';
} while (len);
return result.split('').reverse().join("");
}

// toBinary using array
function toBin_sw(numStr) {
var bigInt = numStr.split(""),
len = bigInt.length,
result = [],
rem, digit, i;
do {
for (rem = i = 0; i < len; ++i) {
digit = +bigInt[i] + rem * 10;
rem = digit & 1;
bigInt[i] = (digit - rem) / 2; // or (digit / 2) | 0
}
if (!bigInt[0]) {
bigInt.shift();
--len;
}
result.push(rem);
} while (len);
return result.reverse().join("");
}

function speedTest() {
var n = 50, // Number of integers to generate
m = 50, // Length of each integer
numArray = [], // Array of integer strings to process
result = [], // Test results
r = '', // Random number string
s, f, fn,
toRun = ['toBin_swlrn', 'toBin_sw'];

// Generate an array of n integers of m digits each
do {
while (r.length < m) {
r += '' + Math.random()*1e8|0;
}
r = r.substring(0,m)
numArray.push(r);
--n;
// Reverse and reuse
numArray.push(r.split('').reverse().join(''));
--n;
// Reuse middle bit
r = r.substring((m/4|0), (3*m/4|0));
--n;
} while (n>0)

// Run tests
for (var i=0, iLen=toRun.length; i<iLen; i++) {
fn = window[toRun[i]];
s = new Date();
for (var j=0, jLen=numArray.length; j<jLen; j++) {
fn(numArray[j]);
}
f = new Date();
result.push(toRun[i] + ': ' + (f-s));
}
result.push(numArray[0])

return result;
}

document.write( speedTest().join('<br>') );


--
Rob
From: Stefan Weiss on
On 06/07/10 01:54, RobG wrote:
> On Jul 5, 6:12 pm, "Richard Cornford" <Rich...(a)litotes.demon.co.uk>
> wrote:
>> Lasse Reichstein Nielsen wrote:
>> > I'm guessing the fastest approach would be
>> > str.charCodeAt(i) & 0x0F
>>
>> The other factor is looping over the characters in the string to build
>> the array of numbers. That is necessary for both of - charCodeAt - and -
>> charCode -, but with - split - the implied loop is handled internally by
>> native code.
>
> While reading characters from a string might be faster for that
> particular operation, replacing other array operations with their
> string equivalents means that overall, the program runs slower than
> its equivalent using an array.

I've noticed the same thing. At least with the engine I'm using,
charAt() and charCodeAt() are both about 30-40 times slower than
splitting the string and iterating over the array elements (I tested
this with just the loops, no conversions or return values). That's a
much bigger performance hit than all of the previously mentioned
optimizations together.

> As far as I can see, having to create a new string several times on
> each loop negates any advantage of charAt or charCodeAt for getting a
> digit. Browsers that have slow string manipulation are particularly
> affected.

It's not just the concatenation; reading the digits itself is slower
with the string methods. Maybe it's the method calls, or the fact that
the engine has to start counting characters from 0 every time, while
property access in an array is likely optimized. I don't know.
I was hoping it would be faster, or at least not much slower. AFAIK,
strings use up less memory than arrays (in JavaScript). It probably
won't matter much for the toBin() example, but it might present a
problem for a more generalized arbitrary precision math library.

I made a few optimizations to the previous version I posted:

- I replaced the result array with a string. This turned out to be
faster with SpiderMonkey, but it might be slower in other engines (maybe
in JScript, IIRC), or if the input strings are _very_ long.

- I eliminated the unnecessary shift() operation on the bigInt array,
so that the array length remains constant now.

- Richard's suggestion about using the right shift operator instead of
dividing by 2 was a big improvement. I actually had this in an earlier
test version of the function, but it got lost along the way. Oh, and >>>
appears to be slightly faster than >>, but that could very well be an
artifact of the benchmark.
I couldn't find a way to try out the "substract 3" suggestion without
introducing at least one if-statement.


function toBin (numStr) {
var bigInt = numStr.split(""),
len = bigInt.length,
result = "",
offset = 0,
rem, divPart, i;
do {
for (rem = 0, i = offset; i < len; ++i) {
divPart = +bigInt[i] + rem * 10;
rem = divPart & 1;
bigInt[i] = divPart >>> 1;
}
if (!bigInt[offset]) {
++offset;
}
result = rem + result;
} while (offset < len);
return result;
}


With my test input strings, this one is about 70-80% faster than the
previous version.


--
stefan
From: Richard Cornford on
On Jul 6, 3:29 am, Stefan Weiss wrote:
> On 06/07/10 01:54, RobG wrote:
>> On Jul 5, 6:12 pm, Richard Cornford wrote:
>>> Lasse Reichstein Nielsen wrote:
>>>> I'm guessing the fastest approach would be
>>>> str.charCodeAt(i) & 0x0F
>
>>> The other factor is looping over the characters in the string
>>> to build the array of numbers. That is necessary for both of
>>> - charCodeAt - and - charCode -, but with - split - the implied
>>> loop is handled internally by native code.
>
>> While reading characters from a string might be faster for that
>> particular operation, replacing other array operations with their
>> string equivalents means that overall, the program runs slower
>> than its equivalent using an array.
>
> I've noticed the same thing. At least with the engine I'm using,
> charAt() and charCodeAt() are both about 30-40 times slower than
> splitting the string and iterating over the array elements (I
> tested this with just the loops, no conversions or return values).

Is that a reasonable comparison? - charAr - gets you what - split -
does; an array of string, but charCodeAt gives you an array of
numbers, possibly avoiding a later type-converting step.

> That's a much bigger performance hit than all of the previously
> mentioned optimizations together.

In the past it has been shown that attempting to avoid the implied
object creation when calling a method on a string primitive by first
turning that primitive into its corresponding object (so - x = new
String(x) -, and calling the methods on the resulting object) is not
nearly the optimisation that ECMA 262 suggests it might be. I is
proposed that calling methods on string primitives is so common that
the engine already does a great deal to optimise that type of action.
On the other hand, calling - carAt - and/or - charCodeAt - on each
character in a long string might be influenced by first turning the
string into an object (probably at least wroth a test).

<snip>
> I couldn't find a way to try out the "substract 3" suggestion
> without introducing at least one if-statement.
<snip>

What is wrong with the - if - statement?

Richard.