From: Einstein on
Thanks to dont dont attttt gmail (Changed so he don't get spammed)
giving me a sort of different mind set to look at things (His black
box example and follow up questions did it for me) I have now created
a recursive compression method, lossless, dwindling returns, which can
be applied to near statistically even random data, and a fair portion
of statistically uneven data which is still for binary compression,
considered random.

Full White Paper to come on SUNDAY. I have previously given notice to
the "most work hours" employer, and have 2 other flexible employers.
This Friday is my last night (Graveyards no less), Saturday I will
spend typing up the white paper, and Sunday I will submit it to
locations across the Web and World.

In short this represents the highest level compression obtainable by
mankind, without 'violating' Entropy as some recursive methods claim
to do.


Previously I posted that I had obtained this as output on a
statistical basis

00
01
0
0
1
1
1
1

It is entirely tracked. It came to me then that since everything is
tracked, I could do a simple substitution... 0 for 1

00
01
1
1
0
0
0
0

Statistically then I have a 70% ratio of 0's versus 30% 1's. In a
large binary string it will be statistically correct, and therefore a
simple Huffman at this stage can work.


I plan a full proofs method, and will be seeking an adequate
programmer up to the task of creating software for the code as well.
Dont Dont gets first dibs if he feels he is up to the task, this for
asking the right questions in the right manner.

I will then run software against all known benchmarks. I am quite
surprised at the fewer lines of code needed once I got in the proper
frame of reference that this will need, and am optimistic that this
may show a very fast compression rate, though sadly a slow
decompression rate.
From: Einstein on
Michael Harrington's Recursive Binary Compression Method

This is a Whitepaper, recognized in international law, as a source of
original idea for science releases. I have 1 year from the release
date of this Whitepaper to Patent it according to International Law.
This is also US law.

Recursive Compression of a Random Binary Sequence.

I, Michael Harrington, have found that you can in fact compress random
binary data. The utility uses a series of quick and easy to understand
series of methods to achieve statistical compression.

First lets describe the "Seperation Effect". Choose a string length.
This string can be anything 2 bits or longer, though for demonstration
purposes I will choose 3 bits. A given file typically has thousands,
millions, even billions of these 3 bit strings. However for our
purposes we will demonstrate with a 24 bit spreadsheet, outlining
every possible outcome of 3 bits. We will henceforth refer to these as
"Substrings"

Now within these strings, the rule is "When you find a 1, the data
after it (So long as it is inside the same Substring) is placed in a
Seperate 'file' (Though it can be seperated inside the same binary
line, the visual aspect of seeing it in a new file is best for a walk
through). This data is placed in proper order of arrival. This can
lead to [{Substring Size}-1] number of files.

In practice the LAST file is unable to be compressed unless there is a
pre-existing flaw in the code making for a significant skew.
Therefore, for our demonstration, we will not utilize it, though if
there is a significant imbalance in the ratio of 1's to 0's it can be
compressed using a standard Huffman, or any other binary compression
system that is applicable.

However our original file, holding information not sent to other
files, and our other files, can be compressed.

Let me demonstrate first the 3 bit, all outcomes look

000
001
010
100
011
101
110
111

This would become 3 files, with ( ) showing seperation into different
files

000
001
01 (0)
1 (00)
01 (1)
1 (01)
1 (1) (0)
1 (1) (1)

Therefore file one has:
000
001
01
1
01
1
1
1

File Two has:
0
00
1
01
1
1

File Three, the last and therefore the usually uncompressible, has:
0
1

Now it does not matter what actual 'sequence' these bits have. For
instance file 2 can look like any of the following

0 - 00 - 0 - 1 - 01 - 1 - 1 or
0 - 00 - 0 - 1 - 1 - 1 - 01 or
0 - 00 - 0 - 1 - 1 - 01 - 1 or
0 - 00 - 0 - 01 - 1 - 1 - 1 or
0 - 00 - 0 - 1 - 1 - 01 - 1 or
0 - 01 - 0 - 1 - 1 - 1 - 00 or

Or any other plausible outcome using the Six different outcomes shown
statistically.

This concludes the "Seperation Effect".

Now we move to the "Alteration Stage".

In the Alteration Stage we choose a high 0 or 1 type outcome, and
switch it for a low 0 or 1 outcome. This could be 01 for 11, 1 for 0,
000 for 011, or any other length required. A command file records what
switch we made via a standardizzed dictionary result for each possible
binary replacement.

For instance of a Dictionary we could have the following 4 bit
'substitution tables':
0000 = NO CHANGE
0001 = 1 and 0 swap
0010 = 00 and 11 swap
0100 = 01 and 11 swap
1000 = 10 and 11 swap
0011 = 00 and 01 swap
0101 = 10 and 01 swap
1001 = 00 and 10 swap
0110 = 111 and 000 swap
1010 = 110 and 000 swap
1100 = 101 and 000 swap
0111 = 011 and 000 swap
1011 = 001 and 000 swap
1101 = 010 and 000 swap
1110 = 100 and 000 swap
1111 = 111 and 001 swap

A library can be hard-coded, or softcoded into the system. Softcoding
requires the dictionary to be the total sum of the bits of the
dictionary, while hardcoding requires the dictionary only have the
bits required to point it in the correct direction. For demonstration
sake we will assume a 16 bit dictionary but not develop it, assuming
that it covers all possible outcomes of our demonstration 3 bit
substrings.

For our demonstration we need to swap, logically, 0 and 1. This
provides us with the greatest "Imbalance" then, in total inside the
file, of 1's verus 0's. We will then have a 7 to 1 ratio imbalance. IE
12.5% of the outcomes are a 1, and 87.5% of the outcomes are a 0. This
leads to an immeadiate compression rate with a standard huffman. After
application of 00 = 0, 01 = 10, 10 = 110, 11 = 111 (A Huffman
replacement table), we obtain a statistical size for file two of
67.96875% compared to before.

File One after swapping 0 for 1 has eleven 0's and three 1's. This is
a ratio of 0.7857 to 0.2143 (rounded). Using a standard Huffman we
then obtain a compression rate of 79.84694 (Rounded) on it.

We used 16 bits for each file as a "Dictionary". Now normally we would
not see compression here. But if we increased the file to 24,000 bits,
assuming an equal ratio of 3 bit substring outcomes, we have 32 bits +
[((14/24*79.84694%) + (8/24*67.96875%)+ (2/24))*24,000] bits, or a
statistical total of 18,648.07 (rounded) bits.

The only issue is now "Combining the Files". This is called
"Accounting". The command section, plus 3 seperate "files" needs to be
tracked in size. Since the entire of each data section, when combined,
should be no more than 1.5 times at most, we can assume individual
tracking on them all to be Log(original file size *1.5)/Log(2) bits
(rounded up). For our case this is 16 bits each. These are added to
the command section.

So now we have three sixteen bit sections tracking our seperate
"files" for their factual sizes after compression on them. We also
have two sixteeen bit sections for the dictionaries of file one and
file two. We need some bits to identify how large our substrings are,
lets use 4 bits for now. In fact we can go hog wild and even just
assume every one of our six parts in our command section are 40 bits
long. But we only do this for quickly showing what can occur. This
means we have 240 bits in a fixed length command string. Add this to
18,648.07 and we obtain 18,888.07 (lets call it 18,889 bits now, for
rounding purposes).

18,889 / 24,000 = 78.7041667% compression. This is on a statistical
evenly balanced ratio of substring allotments, randomly sorted. It
should easily stand the test on most random binary sequences as well.



Thesis on why Entropy is still in effect, but seems also broken: By
addressing a multi line format, where each line has an individual
plausible length, we can achieve compression results which will follow
a bell curve, until such a point where the file in theory cannot be
compressed anymore. Each individual file will result in a different
size, but the best way to describe what is happening is that each file
we try to compress can have results of anything between 1 bit, and the
maximum bits. Therefore if working on a 4 bit substring that was
somehow was compressible with this method, we would see a total of
45(?) possible outcomes, of which just under two thirds would lead to
a size decrease or remain the maximum possible size. And yet this does
not suffice. Therefore the only other plausible explanation is the
Hutter theory, where data compression is related to Artificial
Intelligence when used in my manner.

It should be impossible for two files to become the same result with
my method. Yet the pigeon hole method seems to contradict this. Only a
true software, running with the method will prove out if the theory
actually works.

I reserve all rights to this copyrighted paper. All works inside here
are my sole property, with no one else holding any rights. Only
permission to display the contents of this paper in full, are given,
so that the knowledge herein may be shown to others who may be
interested in asking for any rights to any of these works. Only with
specific written permission is any rights inferred, given, or
authorized.


Michael Harrington's contact information:

Michael Harrington
8116 SE Mill St #22
Portland Oregon, 97212
(971)-285-6279
From: NickNitro on
Michael,

>>Now it does not matter what actual 'sequence' these bits have. For
instance file 2 can look like any of the following.

You lose me here a bit, but if I understand correctly you are saying
that the possible contents of file 2:

0
00
1
01
1
1

Can occur in any order? But if you have something as you suggested:

0 - 00 - 0 - 1 - 01 - 1 - 1 or
0 - 00 - 0 - 1 - 1 - 1 - 01 or
0 - 00 - 0 - 1 - 1 - 01 - 1 or
0 - 00 - 0 - 01 - 1 - 1 - 1 or
0 - 00 - 0 - 1 - 1 - 01 - 1 or
0 - 01 - 0 - 1 - 1 - 1 - 00 or

Without the seperators, which I'm sure you're already well aware, you
can't store as that requires a 3rd symbol, but binary only provides 2:

000010111 or
000011101 or
000011011 or
000001111 or
000011011 or
001011100 or

This brings in uncertainty straight away as there is no way of knowing
if a 0 is actually a 0 or 00. So take the first one for example:

000010111

Could actually represent (using the possible symbols you list for file
2):

00 00 1 01 1 1 or
00 00 1 0 1 1 1 or
00 0 01 01 1 1 or
00 0 01 0 1 1 1

And quite a number more which I haven't listed. How do you handle this?
From: rossum on
On Fri, 30 Nov 2007 08:28:34 -0800 (PST), Einstein
<michaelhh(a)gmail.com> wrote:

>Michael Harrington's Recursive Binary Compression Method
>
Two questions:

1 Is your compression lossy? That is can you unambiguously restore
the original file - byte for byte - from the compressed version.

2 Can you compress any file at all or are there some files that are
incompressible using your method.

rossum

From: Einstein on
rossum <rossum48(a)coldmail.com> wrote:
> On Fri, 30 Nov 2007 08:28:34 -0800 (PST), Einstein
> <michaelhh(a)gmail.com> wrote:
>
> >Michael Harrington's Recursive Binary Compression Method
> >
> Two questions:
>
> 1 Is your compression lossy? That is can you unambiguously restore
> the original file - byte for byte - from the compressed version.
>
> 2 Can you compress any file at all or are there some files that are
> incompressible using your method.
>
> rossum

Sorry bout this, exceeded my posting rate (think it was actually the
spammed news releases to 845 different media emails) from Google.

Any how real quickly on the answers:

1) It is lossless.

2) There will be a statistically low number of files that cannot be
compressed from my system, however it would appear, at this time, that
any
of those should be compressible via alternative existing binary
compression
routines.


3) It seperately tracks each bit. The 1 indicates a switch, so
therefore we
know we have in the "next file" a portion of code. We refer to the
command
line to decompress the file then we use the dictionary to undo swaps,
then
if the next value is a 1, we look forward to the next file, or until
we get
a 1, or the known length of the file is found.

Therefore it is entirely possible to have this in file 2 and up:

00000 000 0 000 0000 0 00000 0000 00000

And it all is understood by the system

File 1 will remain an anomoly, as will the last file.

File one will have a pretty predictable pattern statistically, if 4
bits
for instance:

0000 x1
0001 x1
001 x 2
01 x4
1 x8

Only different files will find different valuations based upon
substring
length. Sorry for exceeding my post rate, really irks me I did so

- Einstein