From: Paul E. Schoen on

"Dr J R Stockton" <reply1020(a)merlyn.demon.co.uk> wrote in message
news:OwFC6XIYZY8LFwUJ(a)invalid.uk.co.demon.merlyn.invalid...
> In comp.lang.javascript message <M6GHn.11586$Gx2.2053(a)newsfe20.iad>,
> Sat, 15 May 2010 19:28:30, Paul E. Schoen <paul(a)pstech-inc.com> posted:
>
>>
>>I had always thought Base64 was used just to be able to send binary
>>data in text form, without control characters.
>
> Yes.
>
>> Compression was a separate process. There seem to be 94 ASCII
>>characters (excluding Space) that could be used for encoding. If all 8
>>bits of an unsigned character could be used, a base 128 would be
>>possible, and it should be twice as efficient as base64.
>
> No; a 7-bit character only contains about 15% more information than a
> 6-bit one.

I thought about that after I wrote and sent the post. The character contains
twice as many bits (128/64), but only an increase of 7/6 or 1.1667.

Paul

From: Spamless on
On 2010-05-17, Paul E. Schoen <paul(a)pstech-inc.com> wrote:

> I thought about that after I wrote and sent the post. The character contains
> twice as many bits (128/64), but only an increase of 7/6 or 1.1667.

Can represent twice as many values but only has one more bit.
From: Spamless on
On 2010-05-15, Paul E. Schoen <paul(a)pstech-inc.com> wrote:
>
> "nick" <nick___(a)fastmail.fm> wrote in message
> news:75b38d32-b2f9-4445-97cb-3e63e2da3f1f(a)r34g2000yqj.googlegroups.com...
>>I wanted to try making a javascript compressor (more specifically,
>> implement a compression engine in javascript, not necessarily for
>> compressing js) ... but then I found this:
>>
>> http://marklomas.net/ch-egg/articles/lzwjs.htm
>>
>> So what do you guys think? Is this any better than base 64? It sure as
>> hell is ugly :)
>
> I had always thought Base64 was used just to be able to send binary data in
> text form, without control characters. Compression was a separate process.
> There seem to be 94 ASCII characters (excluding Space) that could be used
> for encoding.

It is easier (no math) to encode if the base is a power of two since we
are working with binary. Take the least common multiple of 8 (8 bits, a
byte) and the number of bits in the base (6 for base 64) and take
6 bytes to encode in 8 base64 digits just by taking the right bits:

BITS FOR EIGHT BASE 64 DIGITS:
123456123456123456123456123456123456123456123456

48 BITS
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

BITS FOR SIX BASE 256 CHARACTERS:
123456781234567812345678123456781234567812345678

Which characters? 94? Be careful of characters which may not pass through
mail systems and ones which can be munged. UUENCODING was a base64 method
for unix-unix transfers of data over plain text lines but ... using some
IBM systems as relays which used EBCDIC character encoding resulted in
corruption. XXENCODING (I believe) was another base64 encoding changing
the characters used as the base64 "digits" set. They had line starters,
terminators, checksums(?).

MIME base64 uses 64 digits guaranteed(?) to work (well, maybe not on
APPLE I's which only had upper case characters?).

YYENCODING is now often used in newsgroups. It is pretty much raw 8 bit
since "modern" news servers can handle it. There is some encoding, but
it expands the size very little. Of course, if you save such a file and
try to email it ... it may well fail (email systems may be much more
sensitive) and trying to use one of those back on the old GEnie system
(7 bit, even parity) would fail. FTPing such a file using Windows (is
the default still TEXT instead of IMAGE/BINARY) may strip off the high
bits. Base64 encodings expand the file size by 33% (assuming you have to
use 8 bits to send one of the 64 base64 "digits") (compress the file first!).
You don't want the overhead of changing bases if it involves mathematical
operations (divide, etc.) or manipulation to avoid certain sensitive characters.
You just want a simple (fast, easy) selection (of bits).
From: Spamless on
On 2010-05-18, Spamless <Spamless(a)Nil.nil> wrote:

> Which characters? 94?

While using a base which is a power of two makes life very easy,
consider using a base where b^n is a little larger than a power
of two (rather than equal to a power of two). For example,
10^3 is about 2^10 so one can almost encode all 10 bit sequences
as three base 10 digits, but not quite as 10^3 is a bit smaller
than 2^10.

90^2=8100, 2^13=8192

That's a bit small. Better would be base 91.

91^2=8281, 2^13=8192

You can encode any sequence of thirteen bits as a pair of
base 91 digits. If you choose 91 "safe" characters to use
to represent those digits you can encode any sequence of
13 bits as a pair of those digits. If it takes 8 bits to
send each such character/digit, you can send any sequence
of 13 bits as those two characters (16 bits) (though you
will not use up all the possible pairs which can handle
numbers up through 8281 - you can use those other pairs
for control). This would be an increase of
(16-13)/13=23% (instead of the 33% of base 64 encoding).

But how do you convert a number (from 0 to 2^13-1) to
base 91 (it will take at most two "digits")? First extract
out the 13 bits (take 13 original bytes and split them into
8 sequences of 13 bits each). Convert each sequence of 13
bits into to base 91 digits and send those (16 bits to send
them). Convert to base 91? Divide by 91 (integer division)
to find the high digit and take it mod 91 to get the low
base 91 digit.

Which 91 characters are guaranteed to be safe on various
systems (including legacy systems using EBCDIC encoding
of which there are variants)?

Of course for any integer which is not a power of two,
ln(b)/ln(2) is irrational (even transcendental!) and so
there are some integers k,m so that k*ln(b) is as close
as you want but larger than m*ln(2) and you can use k
digits in base b to represent m digits in base 2.

How about base 83? A continued fraction expansion of
ln(83)/ln(2) shows that a good choice would be 8
base 83 digits to encode 51 bits:

83^8 = 2252292232139041
2^51 = 2251799813685248

and so one can encode 51 bits as 8 base 83 digits and
assuming that sending each digit takes 8 bit, you have
to send 64 bits (instead of 51) for an increase of
25.5% (if there aren't 91 "safe" characters to use as
digits but there are 83, one encode with an overhead
of 25.5% instead of base 64's 33.3%). Of course, in
this case one has to convert a number from
0 through 2251799813685248-1
to base 83.