From: Johannes Baagoe on
Johannes Baagoe :

> No: "Où qu'il réside".charCodeAt(1) == 65533

Oops, sorry - that was with Rhino in a UTF-8 console. V8 says 249, which
makes more sense.

--
Johannes
From: Johannes Baagoe on
nick :

> Hmm... so I wonder how this passes the "Où qu'il réside" test?

I think I figured it out, after all.

> Were all of those char codes <= 256?

Yes. Any character in the Latin-1 Supplement is represented by a number
between 0x0080 and 0x00FF in UTF-16, which is what javascript uses.

So, for French (except "œ" and "Œ"), Spanish, German, Portuguese, Danish
and a few others, you should be all right. But it won't work with Greek,
Russian or Chinese, and certainly not with Egyptian hieroglyphs which
require *two* 16-bit char codes.
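
A rough way to check this for a given string, as a minimal sketch: the helper name `fitsInOneByte` and the sample strings other than the French one are made up for illustration, not taken from press.js.

```javascript
// Returns true when every UTF-16 code unit in `text` is at most 0xFF,
// i.e. the string stays within Basic Latin plus the Latin-1 Supplement.
// (Hypothetical helper, for illustration only.)
function fitsInOneByte(text) {
  for (var i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0xFF) {
      return false;
    }
  }
  return true;
}

var samples = {
  french: "O\u00F9 qu'il r\u00E9side", // "Où qu'il réside"
  ligature: "\u0153uvre",              // "œuvre", U+0153 is outside Latin-1
  greek: "\u03B1\u03B2\u03B3",         // "αβγ"
  hieroglyph: "\uD80C\uDC00"           // U+13000, stored as two code units
};

var results = {};
for (var name in samples) {
  if (Object.prototype.hasOwnProperty.call(samples, name)) {
    results[name] = fitsInOneByte(samples[name]);
  }
}
```

Only `results.french` should come out true here; the other three samples each contain at least one code unit above 0xFF.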

--
Johannes
From: Thomas 'PointedEars' Lahn on
Johannes Baagoe wrote:

> nick :
>> Were all of those char codes <= 256?
>
> Yes. Any character in the Latin-1 Supplement is represented by a number
> between 0x0080 and 0x00FF in UTF-16,

No, in Unicode.

> which is what javascript uses.

| A conforming [ECMAScript] implementation [...] shall interpret characters
| in conformance with the Unicode Standard, Version 3.0 or later and
| ISO/IEC 10646-1 with either UCS-2 or UTF-16 as the adopted encoding form,
| implementation level 3. If the adopted ISO/IEC 10646-1 subset is not
| otherwise specified, it is presumed to be the BMP subset, collection 300.
| If the adopted encoding form is not otherwise specified, it presumed to be
| the UTF-16 encoding form.

Learn the difference between character set and encoding.

> So, for French (except "œ" and "Œ"), Spanish, German, Portuguese, Danish
> and a few others, you should be all right. But it won't work with Greek,
> Russian or Chinese, and certainly not with Egyptian hieroglyphs which
> require *two* 16-bit char codes.

Modern Greek, Cyrillic as used in Russian, and Han characters as they
are used e.g. in Standard Mandarin usually require one _UTF-16 code
unit_ each, but characters from CJK Extensions B and C and from the
Compatibility Ideographs Supplement require two of them.

Egyptian hieroglyphs require two _UTF-16 code units_. This is however
unrelated to the fact that their code points require at least two 16-bit
words to be represented in binary. It is a misconception to think of UTF-8,
UTF-16 or UTF-32 as encodings that combine char(acter) codes to represent
another character.

Learn the difference between characters and code units.
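
To make the distinction concrete, here is a minimal sketch of how one code point above U+FFFF maps to two UTF-16 code units. The function name `toSurrogatePair` is an assumption for illustration; U+13000 is the first code point of the Egyptian Hieroglyphs block.

```javascript
// Splits a supplementary code point (0x10000..0x10FFFF) into the two
// UTF-16 code units (high and low surrogate) that encode it.
// (Hypothetical helper, for illustration only.)
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;       // 20 significant bits remain
  var high = 0xD800 + (offset >> 10);     // upper 10 bits
  var low = 0xDC00 + (offset & 0x3FF);    // lower 10 bits
  return [high, low];
}

var pair = toSurrogatePair(0x13000);
var hieroglyph = String.fromCharCode(pair[0], pair[1]);

// One character, two code units: `length` and `charCodeAt` count code units.
var unitCount = hieroglyph.length;
var firstUnit = hieroglyph.charCodeAt(0);
```

The two values in `pair` are not character codes of their own; they are code units that only mean something together.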

<http://unicode.org/faq/>


PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
navigator.userAgent.indexOf('MSIE 5') != -1
&& navigator.userAgent.indexOf('Mac') != -1
) // Plone, register_function.js:16
From: Andrea Giammarchi on
On May 23, 8:34 pm, David Mark <dmark.cins...(a)gmail.com> wrote:
>
> Packer is a complete waste of time.
>

you never waste an opportunity to be arrogant, don't ya?

Dean's packer was revolutionary in its time and it is still widely
adopted, improved and maintained, regardless of what *you* think.

A bit more respect for those devs who have always been there, teaching
and explaining things to us with valid software and/or experiments,
would probably be more appropriate for this group, wouldn't it?

Br,
Andrea Giammarchi
From: Andrea Giammarchi on
... and btw, for the record, this press.js is a nice experiment as
well. The "decompressor" uses a lot of unnecessary spaces and notation,
but even if it were improved, others have already explained the side
effects.

The fact that some hosts do not allow gzip means nothing to me: you can
gzip and deflate at build time, then serve the already gzipped/deflated
files with the proper headers. For the host this is no different from
serving a plain file, and it won't be overloaded by runtime
compression.
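
As a minimal sketch of that idea, assuming a Node.js build step (the project linked in this post is PHP, so this is not how it does it): compress once ahead of time, store the bytes, and send them later with matching headers. The names `precompress` and `headersFor` are hypothetical.

```javascript
var zlib = require("zlib");

// Build step: compress the script once, ahead of time, at maximum level.
// (Hypothetical helper, for illustration only.)
function precompress(source) {
  return zlib.gzipSync(Buffer.from(source, "utf8"), { level: 9 });
}

// Serve step: headers to send along with the stored gzip bytes, so the
// browser decompresses them itself and the host does no runtime work.
function headersFor(compressed) {
  return {
    "Content-Type": "application/javascript; charset=utf-8",
    "Content-Encoding": "gzip",
    "Content-Length": String(compressed.length),
    "Vary": "Accept-Encoding"
  };
}

var source = "function hello() { return 'hello'; }\n";
var compressed = precompress(source);
var headers = headersFor(compressed);
```

A real setup would still have to check the request's Accept-Encoding header and fall back to the uncompressed file for clients that do not accept gzip.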

If you want an example, here is one of my projects that does exactly
what I have described:
http://code.google.com/p/php-client-booster/

Best Regards,
Andrea Giammarchi