From: nick on
I'd like to hear this group's reaction to a javascript compression
script I've been working on. It uses the LZW algorithm and base85
encoding to squeeze large scripts down to size.
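
The LZW core is only a few lines; it looks roughly like this (a
simplified sketch, not the exact pressjs source -- names are
illustrative):

  // Simplified LZW compressor: returns an array of integer codes.
  // The real script then packs these codes into base85 text.
  function lzwCompress(input) {
    var dict = {}, nextCode = 256, w = '', codes = [];
    // Seed with all 256 single-byte strings. Keys get a '#' prefix
    // so input like "__proto__" can't collide with Object built-ins.
    for (var i = 0; i < 256; i++) dict['#' + String.fromCharCode(i)] = i;
    for (var j = 0; j < input.length; j++) {
      var c = input.charAt(j);
      if (dict['#' + w + c] !== undefined) {
        w += c;                         // extend the current match
      } else {
        codes.push(dict['#' + w]);      // emit longest known prefix
        dict['#' + w + c] = nextCode++; // learn the new sequence
        w = c;
      }
    }
    if (w) codes.push(dict['#' + w]);
    return codes;
  }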

Quick test...

used this: http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js
original size: 72173 bytes
compressed: 44782 bytes

You can test it here:
http://pressjs.googlecode.com/svn/trunk/build/test.html

Browse the source:
http://code.google.com/p/pressjs/source/browse/#svn/trunk/src

I'd love to hear what you guys think, esp. any way we could optimize
it for speed or size, or if you catch any bugs / memory leaks /
namespace pollution / stupid programming fails / etc. Thanks!
From: Sean Kinsey on
On May 22, 11:10 pm, nick <nick...(a)fastmail.fm> wrote:
> I'd like to hear this group's reaction to a javascript compression
> script I've been working on. It uses the LZW algorithm and base85
> encoding to squeeze large scripts down to size.
>
> Quick test...
>
> used this: http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js
> original size: 72173 bytes
> compressed: 44782 bytes
>
> You can test it here: http://pressjs.googlecode.com/svn/trunk/build/test.html
>
> Browse the source: http://code.google.com/p/pressjs/source/browse/#svn/trunk/src
>
> I'd love to hear what you guys think, esp. any way we could optimize
> it for speed or size, or if you catch any bugs / memory leaks /
> namespace pollution / stupid programming fails / etc. Thanks!


I'm sorry to say that your attempt to 'compress' code has failed. Did
you ever take into consideration that gzip (used to serve compressed
files) also uses LZ77-based DEFLATE compression (and more efficiently
than your LZW)?

A quick test I did with an input file of 56.3 KB:
Direct compression using 7-Zip into a .gz archive: 12 KB
Compressed with pressjs, then gzipped: 20.9 KB

And the same using a minified version of the same script:
Direct compression using 7-Zip into a .gz archive: 4.51 KB
Compressed with pressjs, then gzipped: 7.68 KB

Not to mention the added overhead of having to decompress the file
after the UA has downloaded it.

The only scenario where this method would be beneficial is one where
gzip is not used on the server, bad caching directives cause the file
to be downloaded in full each time, and the extra time spent
downloading exceeds the extra time needed to decompress. Hopefully
that isn't too common a scenario.

But hey, it was probably fun to create :)
From: Johannes Baagoe on
nick :

> http://pressjs.googlecode.com/svn/trunk/build/test.html

"Où qu'il réside, même aux îles Caïmans, tout Français inscrit au rôle
paiera son dû dès Noël" (length 92) "compresses" to 118 characters.

> http://code.google.com/p/pressjs/source/browse/#svn/trunk/src

From http://code.google.com/p/pressjs/source/browse/trunk/src/compdict.js :

  // Populate table with all possible character codes.
  for (var i = 0; i < 256; ++i) {
    var str = String.fromCharCode(i);
    this.hashtable[str] = this.nextcode++;
  }

What about character codes >= 256?
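
If the input can contain them, one workaround (a sketch, not from
your source) is to reduce the string to UTF-8 bytes before
compressing, so every unit the compressor sees really is below 256:

  // Map an arbitrary JS string to a string of UTF-8 "bytes"
  // (char codes 0..255), so a 256-entry seed table suffices.
  function toByteString(str) {
    return unescape(encodeURIComponent(str));
  }

  // The inverse, applied after decompression.
  function fromByteString(bytes) {
    return decodeURIComponent(escape(bytes));
  }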

My general impression is that you are complicating things for no reason.
Why use constructors, prototypes and fancy "//#" pseudo-cpp directives?
Just one file which defines the two functions that compress and expand
would be much easier both to write and to review.
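
Something like this shape, say (a sketch; names illustrative):

  // One file, two entry points; helpers stay inside the closure.
  var press = (function () {
    function compress(source) { /* LZW encode, then base85 */ }
    function expand(packed) { /* base85 decode, then LZW decode */ }
    return { compress: compress, expand: expand };
  }());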

(I assume that you are doing this for fun, for the challenge of writing
a compressor in javascript. If it is in order to reduce bandwidth
in real applications on the Web, enabling gzip on the server is much
more efficient.)

--
Johannes
From: nick on
On May 23, 7:43 am, Sean Kinsey <okin...(a)gmail.com> wrote:

> I'm sorry to say that your attempt to 'compress' code has failed. Did
> you ever take into consideration that gzip (used to serve compressed
> files) also uses LZ77-based DEFLATE compression (and more efficiently
> than your LZW)?

Yeah, I thought about that, but I figured the point of javascript
compressors was that they'd be used in environments where gzip
compression on the server is not an option (many shared hosts, which
plenty of people seem content to use, don't enable gzip for some
reason).

> A quick test I did with an input file of 56.3 KB:
> Direct compression using 7-Zip into a .gz archive: 12 KB
> Compressed with pressjs, then gzipped: 20.9 KB

> And the same using a minified version of the same script:
> Direct compression using 7-Zip into a .gz archive: 4.51 KB
> Compressed with pressjs, then gzipped: 7.68 KB

I wonder if encoding to base64 would yield better compression ratios
afterwards? Maybe still not as good as using gzip on the uncompressed
file, though.
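
For context, the base85 step packs each 4 input bytes into 5
printable characters, where base64 packs 3 into 4; roughly like this
(a sketch, not the exact pressjs encoder -- the real one also has to
handle the final partial group properly):

  // Minimal Ascii85-style encoder: treat each 4-byte group as a
  // 32-bit number and emit 5 digits in base 85, offset from '!'.
  function base85Encode(bytes) {
    var out = '';
    for (var i = 0; i < bytes.length; i += 4) {
      var n = 0;
      for (var j = 0; j < 4; j++) {
        // Zero-pad the last group (simplification).
        n = n * 256 + (i + j < bytes.length ? bytes.charCodeAt(i + j) : 0);
      }
      var group = '';
      for (var k = 0; k < 5; k++) {
        group = String.fromCharCode(33 + n % 85) + group;
        n = Math.floor(n / 85);
      }
      out += group;
    }
    return out;
  }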

I just did a similar test with Dean Edwards' "packer" with the
"Base62 encode" and "Shrink variables" options on, and it manages a
gzip-compressed size similar to that of gzipping the original... If I
can achieve a similar gzip-compressed size after pressing, I think
this should be at least as useful as packer (not sure what this
group's opinion of packer is, though).

> Not to mention the added overhead of having to decompress the file
> after the UA has downloaded it.

True, although the size overhead is only about 1200 bytes (and
shrinking), and the processing overhead is negligible.

> The only scenario where this method would be beneficial is one where
> gzip is not used on the server, bad caching directives cause the file
> to be downloaded in full each time, and the extra time spent
> downloading exceeds the extra time needed to decompress. Hopefully
> that isn't too common a scenario.

It's more common than you might think (shared hosting).

> But hey, it was probably fun to create :)

It was :) Thanks for the comments.
From: nick on
On May 23, 9:53 am, Johannes Baagoe <baa...(a)baagoe.com> wrote:
> nick :
>
> > http://pressjs.googlecode.com/svn/trunk/build/test.html
>
> "Où qu'il réside, même aux îles Caïmans, tout Français inscrit au rôle
> paiera son dû dès Noël" (length 92) "compresses" to 118 characters.

Well, you obviously used the wrong text.

"banana cabana banana cabana banana cabana banana cabana banana
cabana" (length 69) compresses to 44 characters! ;)

> > http://code.google.com/p/pressjs/source/browse/#svn/trunk/src
>
> From http://code.google.com/p/pressjs/source/browse/trunk/src/compdict.js :
>
>   // Populate table with all possible character codes.
>   for (var i = 0; i < 256; ++i) {
>     var str = String.fromCharCode(i);
>     this.hashtable[str] = this.nextcode++;
>   }
>
> What about character codes >= 256?

I'm pretty sure those characters aren't allowed in a javascript
document? I'm not really sure what's going on there, though; I was
puzzled by that bit as well. See my next paragraph.

> My general impression is that you are complicating things for no reason.
> Why use constructors, prototypes and fancy "//#" pseudo-cpp directives?
> Just one file which defines the two functions that compress and expand
> would be much easier both to write and to review.

Yeah, that stuff is all part of another GPL program I ripped off to
make this compressor, which in turn is pretty much a direct port of
some C++ code, so it has a very class-like design. I've been going
through making it more object-based and trying to learn the algorithm
at the same time. Eventually I'd like to replace all of that code,
but for now I just wanted to see if this whole idea was viable.

Well, the cpp directives were my idea. I like to be able to separate
the files into logical units, and ifdef comes in handy when building
multiple similar-but-different targets (like stand-alone vs embedded
decompressor).
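
For example, a block like this gets kept or stripped depending on the
target being built (illustrative -- not necessarily the exact
directive syntax the build script uses):

  //#ifdef STANDALONE
  // Only the stand-alone decompressor build exports a global;
  // the embedded build strips this whole block out.
  window.press = { expand: expand };
  //#endif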

I'm definitely considering merging the instream and outstream
functionality into the compressor / decompressor, but I think I like
having the dictionaries in separate files for now.

> (I assume that you are doing this for fun, for the challenge of writing
> a compressor in javascript. If it is in order to reduce bandwidth
> in real applications on the Web, enabling gzip on the server is much
> more efficient.)

Yeah, I'm mostly doing it to see if it can be done. Next I want to
experiment with a different compression algorithm or one of the
variations on LZW. Server-side gzip is obviously the better
alternative when it's available; however, that's not always the case
(see my response to Sean), and so we have things like "packer" and
maybe this thing.