|
From: Einstein on 30 Nov 2007 10:11 Thanks to dont dont attttt gmail (Changed so he don't get spammed) giving me a sort of different mind set to look at things (His black box example and follow up questions did it for me) I have now created a recursive compression method, lossless, dwindling returns, which can be applied to near statistically even random data, and a fair portion of statistically uneven data which is still for binary compression, considered random. Full White Paper to come on SUNDAY. I have previously given notice to the "most work hours" employer, and have 2 other flexible employers. This Friday is my last night (Graveyards no less), Saturday I will spend typing up the white paper, and Sunday I will submit it to locations across the Web and World. In short this represents the highest level compression obtainable by mankind, without 'violating' Entropy as some recursive methods claim to do. Previously I posted that I had obtained this as output on a statistical basis 00 01 0 0 1 1 1 1 It is entirely tracked. It came to me then that since everything is tracked, I could do a simple substitution... 0 for 1 00 01 1 1 0 0 0 0 Statistically then I have a 70% ratio of 0's versus 30% 1's. In a large binary string it will be statistically correct, and therefore a simple Huffman at this stage can work. I plan a full proofs method, and will be seeking an adequate programmer up to the task of creating software for the code as well. Dont Dont gets first dibs if he feels he is up to the task, this for asking the right questions in the right manner. I will then run software against all known benchmarks. I am quite surprised at the fewer lines of code needed once I got in the proper frame of reference that this will need, and am optimistic that this may show a very fast compression rate, though sadly a slow decompression rate.
From: Einstein on 30 Nov 2007 11:28 Michael Harrington's Recursive Binary Compression Method This is a Whitepaper, recognized in international law, as a source of original idea for science releases. I have 1 year from the release date of this Whitepaper to Patent it according to International Law. This is also US law. Recursive Compression of a Random Binary Sequence. I, Michael Harrington, have found that you can in fact compress random binary data. The utility uses a series of quick and easy to understand series of methods to achieve statistical compression. First lets describe the "Seperation Effect". Choose a string length. This string can be anything 2 bits or longer, though for demonstration purposes I will choose 3 bits. A given file typically has thousands, millions, even billions of these 3 bit strings. However for our purposes we will demonstrate with a 24 bit spreadsheet, outlining every possible outcome of 3 bits. We will henceforth refer to these as "Substrings" Now within these strings, the rule is "When you find a 1, the data after it (So long as it is inside the same Substring) is placed in a Seperate 'file' (Though it can be seperated inside the same binary line, the visual aspect of seeing it in a new file is best for a walk through). This data is placed in proper order of arrival. This can lead to [{Substring Size}-1] number of files. In practice the LAST file is unable to be compressed unless there is a pre-existing flaw in the code making for a significant skew. Therefore, for our demonstration, we will not utilize it, though if there is a significant imbalance in the ratio of 1's to 0's it can be compressed using a standard Huffman, or any other binary compression system that is applicable. However our original file, holding information not sent to other files, and our other files, can be compressed. Let me demonstrate first the 3 bit, all outcomes look 000 001 010 100 011 101 110 111 This would become 3 files, with ( ) showing seperation into different files 000 001 01 (0) 1 (00) 01 (1) 1 (01) 1 (1) (0) 1 (1) (1) Therefore file one has: 000 001 01 1 01 1 1 1 File Two has: 0 00 1 01 1 1 File Three, the last and therefore the usually uncompressible, has: 0 1 Now it does not matter what actual 'sequence' these bits have. For instance file 2 can look like any of the following 0 - 00 - 0 - 1 - 01 - 1 - 1 or 0 - 00 - 0 - 1 - 1 - 1 - 01 or 0 - 00 - 0 - 1 - 1 - 01 - 1 or 0 - 00 - 0 - 01 - 1 - 1 - 1 or 0 - 00 - 0 - 1 - 1 - 01 - 1 or 0 - 01 - 0 - 1 - 1 - 1 - 00 or Or any other plausible outcome using the Six different outcomes shown statistically. This concludes the "Seperation Effect". Now we move to the "Alteration Stage". In the Alteration Stage we choose a high 0 or 1 type outcome, and switch it for a low 0 or 1 outcome. This could be 01 for 11, 1 for 0, 000 for 011, or any other length required. A command file records what switch we made via a standardizzed dictionary result for each possible binary replacement. For instance of a Dictionary we could have the following 4 bit 'substitution tables': 0000 = NO CHANGE 0001 = 1 and 0 swap 0010 = 00 and 11 swap 0100 = 01 and 11 swap 1000 = 10 and 11 swap 0011 = 00 and 01 swap 0101 = 10 and 01 swap 1001 = 00 and 10 swap 0110 = 111 and 000 swap 1010 = 110 and 000 swap 1100 = 101 and 000 swap 0111 = 011 and 000 swap 1011 = 001 and 000 swap 1101 = 010 and 000 swap 1110 = 100 and 000 swap 1111 = 111 and 001 swap A library can be hard-coded, or softcoded into the system. Softcoding requires the dictionary to be the total sum of the bits of the dictionary, while hardcoding requires the dictionary only have the bits required to point it in the correct direction. For demonstration sake we will assume a 16 bit dictionary but not develop it, assuming that it covers all possible outcomes of our demonstration 3 bit substrings. For our demonstration we need to swap, logically, 0 and 1. This provides us with the greatest "Imbalance" then, in total inside the file, of 1's verus 0's. We will then have a 7 to 1 ratio imbalance. IE 12.5% of the outcomes are a 1, and 87.5% of the outcomes are a 0. This leads to an immeadiate compression rate with a standard huffman. After application of 00 = 0, 01 = 10, 10 = 110, 11 = 111 (A Huffman replacement table), we obtain a statistical size for file two of 67.96875% compared to before. File One after swapping 0 for 1 has eleven 0's and three 1's. This is a ratio of 0.7857 to 0.2143 (rounded). Using a standard Huffman we then obtain a compression rate of 79.84694 (Rounded) on it. We used 16 bits for each file as a "Dictionary". Now normally we would not see compression here. But if we increased the file to 24,000 bits, assuming an equal ratio of 3 bit substring outcomes, we have 32 bits + [((14/24*79.84694%) + (8/24*67.96875%)+ (2/24))*24,000] bits, or a statistical total of 18,648.07 (rounded) bits. The only issue is now "Combining the Files". This is called "Accounting". The command section, plus 3 seperate "files" needs to be tracked in size. Since the entire of each data section, when combined, should be no more than 1.5 times at most, we can assume individual tracking on them all to be Log(original file size *1.5)/Log(2) bits (rounded up). For our case this is 16 bits each. These are added to the command section. So now we have three sixteen bit sections tracking our seperate "files" for their factual sizes after compression on them. We also have two sixteeen bit sections for the dictionaries of file one and file two. We need some bits to identify how large our substrings are, lets use 4 bits for now. In fact we can go hog wild and even just assume every one of our six parts in our command section are 40 bits long. But we only do this for quickly showing what can occur. This means we have 240 bits in a fixed length command string. Add this to 18,648.07 and we obtain 18,888.07 (lets call it 18,889 bits now, for rounding purposes). 18,889 / 24,000 = 78.7041667% compression. This is on a statistical evenly balanced ratio of substring allotments, randomly sorted. It should easily stand the test on most random binary sequences as well. Thesis on why Entropy is still in effect, but seems also broken: By addressing a multi line format, where each line has an individual plausible length, we can achieve compression results which will follow a bell curve, until such a point where the file in theory cannot be compressed anymore. Each individual file will result in a different size, but the best way to describe what is happening is that each file we try to compress can have results of anything between 1 bit, and the maximum bits. Therefore if working on a 4 bit substring that was somehow was compressible with this method, we would see a total of 45(?) possible outcomes, of which just under two thirds would lead to a size decrease or remain the maximum possible size. And yet this does not suffice. Therefore the only other plausible explanation is the Hutter theory, where data compression is related to Artificial Intelligence when used in my manner. It should be impossible for two files to become the same result with my method. Yet the pigeon hole method seems to contradict this. Only a true software, running with the method will prove out if the theory actually works. I reserve all rights to this copyrighted paper. All works inside here are my sole property, with no one else holding any rights. Only permission to display the contents of this paper in full, are given, so that the knowledge herein may be shown to others who may be interested in asking for any rights to any of these works. Only with specific written permission is any rights inferred, given, or authorized. Michael Harrington's contact information: Michael Harrington 8116 SE Mill St #22 Portland Oregon, 97212 (971)-285-6279
From: NickNitro on 30 Nov 2007 12:35 Michael, >>Now it does not matter what actual 'sequence' these bits have. For instance file 2 can look like any of the following. You lose me here a bit, but if I understand correctly you are saying that the possible contents of file 2: 0 00 1 01 1 1 Can occur in any order? But if you have something as you suggested: 0 - 00 - 0 - 1 - 01 - 1 - 1 or 0 - 00 - 0 - 1 - 1 - 1 - 01 or 0 - 00 - 0 - 1 - 1 - 01 - 1 or 0 - 00 - 0 - 01 - 1 - 1 - 1 or 0 - 00 - 0 - 1 - 1 - 01 - 1 or 0 - 01 - 0 - 1 - 1 - 1 - 00 or Without the seperators, which I'm sure you're already well aware, you can't store as that requires a 3rd symbol, but binary only provides 2: 000010111 or 000011101 or 000011011 or 000001111 or 000011011 or 001011100 or This brings in uncertainty straight away as there is no way of knowing if a 0 is actually a 0 or 00. So take the first one for example: 000010111 Could actually represent (using the possible symbols you list for file 2): 00 00 1 01 1 1 or 00 00 1 0 1 1 1 or 00 0 01 01 1 1 or 00 0 01 0 1 1 1 And quite a number more which I haven't listed. How do you handle this?
From: rossum on 30 Nov 2007 13:02 On Fri, 30 Nov 2007 08:28:34 -0800 (PST), Einstein <michaelhh(a)gmail.com> wrote: >Michael Harrington's Recursive Binary Compression Method > Two questions: 1 Is your compression lossy? That is can you unambiguously restore the original file - byte for byte - from the compressed version. 2 Can you compress any file at all or are there some files that are incompressible using your method. rossum
From: Einstein on 30 Nov 2007 13:14
rossum <rossum48(a)coldmail.com> wrote: > On Fri, 30 Nov 2007 08:28:34 -0800 (PST), Einstein > <michaelhh(a)gmail.com> wrote: > > >Michael Harrington's Recursive Binary Compression Method > > > Two questions: > > 1 Is your compression lossy? That is can you unambiguously restore > the original file - byte for byte - from the compressed version. > > 2 Can you compress any file at all or are there some files that are > incompressible using your method. > > rossum Sorry bout this, exceeded my posting rate (think it was actually the spammed news releases to 845 different media emails) from Google. Any how real quickly on the answers: 1) It is lossless. 2) There will be a statistically low number of files that cannot be compressed from my system, however it would appear, at this time, that any of those should be compressible via alternative existing binary compression routines. 3) It seperately tracks each bit. The 1 indicates a switch, so therefore we know we have in the "next file" a portion of code. We refer to the command line to decompress the file then we use the dictionary to undo swaps, then if the next value is a 1, we look forward to the next file, or until we get a 1, or the known length of the file is found. Therefore it is entirely possible to have this in file 2 and up: 00000 000 0 000 0000 0 00000 0000 00000 And it all is understood by the system File 1 will remain an anomoly, as will the last file. File one will have a pretty predictable pattern statistically, if 4 bits for instance: 0000 x1 0001 x1 001 x 2 01 x4 1 x8 Only different files will find different valuations based upon substring length. Sorry for exceeding my post rate, really irks me I did so - Einstein |