From: Francois Grieu on
On 05/06/2010 03:05, Keith F. Lynch wrote:
> Francois Grieu<fgrieu(a)gmail.com> wrote:
>> I'll assume that in the enumeration above, "Hash" is used where the
>> standard terminology is Exclusive OR.
>
> Yes, sorry.
>
>> I remark that if the plaintext is all-zeroes
>
> I never encrypt any files of all zeroes. Why would you think I would?

It is not a far-fetched scenario that you receive/have a few files, some interesting enough that you keep all of them without examining each in detail, and they become part of what you encrypt. One of these files could be all-zeroes, perhaps by accident, or for the purpose attacking your cryptosystem (if you communicate with your adversaries, or if your adversaries manage to modify a message sent by a friend). Just a possibility, but one that must be considered in a general-purpose encryption tool.

> I'll agree that you could get S, up to the length of the file, if I did.

The restriction "up to the length of the file" does not apply if the file is a bit longer than the key (say 16 bytes); in that case, after recovering S up to the length of the file, the key itself can easily be recovered using a classic technique that Paul Rubin was the first to propose in this thread [solving a system of 100 linear equations of 100 unknown, e.g. by plain Gaussian elimination].

>> This hypothesis can be easily tested and the key recovered using the
>> linearity-based attack proposed by Paul Rubin.
>
> Are you saying this attack would work even if the file isn't all
> zeroes?

No, I'm not saying that. I'm saying that the key (for that all-zeroes file) can be recovered from the ciphertext; Then the security of other files will depend on how the keys for theses files are generated. More on that later.

>> From an academic standpoint, this is an attack (for an ideal cipher,
>> the ciphertext should never leak any information on the plaintext,
>> beside its length perhaps, and even known plaintext should not
>> reveal the key).
>
> I don't see how known plaintexts would reveal the key, except for
> unusual special cases such as all-zeroes.

I do not "see" either, as in having a plan for an attack that may work with known plaintext from a single file.

> Or any plaintext that was
> written an alphabet of just one or two characters, since "stirring"
> doesn't alter such texts.

In the description of the complete cryptosystem that you gave, you are never stirring the plaintext, so using that particular plaintext does not simply lead to a break. The three stirrings are performed on random-like data, even to an adversary knowing plaintext and ciphertext. The way you combined S and stirring does creates a serious obstacle to the attacker.
My attack with the all-zeroes file (a so-called chosen plaintext attack) happens to work around that obstacle, but (at least for now) I fail to find anything working with any other chosen plaintext, much less known plaintext.

>> It is not clear if the same key (thus S) is reused for different
>> plaintexts; ...
>
> Yes, otherwise I wouldn't have bothered with stirring.

Since the key is the same for all files, you indeed are vulnerable to one of the files encrypted consisting only of zeroes; and again it needs NOT be big (16 bytes is ample to recover the key).

Other attacks requiring only known plaintext might be conceivable; in particular with knowlege of several files, especially if these files differ only slightly (a small change in the input of stirring changes a relatively small block of the output, and the analysis of that change is telling something). I'm not saying that would be possible, much less trivial.

> It did occur
> to me to just use the stream cipher. I would have a different key for
> every file, and all the keys would be stored in an index which would
> be encrypted with a master key. I seem to recall that the main reason
> I didn't take that approach is that I couldn't find a way with the
> hardware I had to generate enough random keys. With the approach I
> took, since I only needed one (100-bit) key, it was practical to
> generate it by coin flipping.

That other cryptosystem is extremely weak: Paul Rubin's attack recovers the key for one file if a few of the plaintext for that file file can be guessed. And that's the case of gzipped files [the default gzip header with file name and date is likely enough]. Then, that key is just the right amount of known plaintext to recover the key for the master file. IMHO your system with three stirrings is much harder to break, if breakable.

> It's possible to flip coins fairly rapidly by placing a large number
> of identical fair coins in a can, shaking it vigorously for several
> seconds, then removing coins from it one at a time without looking at
> them until you've set them all down in a row. But 100 coins times
> the number of files I want to encrypt, that's not practical.

I bought a 16-faced dice with the intention of using it for Realy Important keys entered in hexadecimal. That did not survive practice.

>> If the key is reused, it is enough for an adversary to manage to
>> inject one short all-zeroes file in the plaintext files to recover
>> the key and decipher all the other files. That can occur in real
>> life (you receive a number of interesting test cases and incorporate
>> them in your plaintext files, inclusing the rogue one).
>
> Not the way I do things. The only files I ever bother to encrypt, or
> even save, are ones I look at. If someone emails me a bunch of NULs,
> or any other kind of garbage, I won't be saving that email, much less
> encrypting it.

Nowadays, the overwhelming majority of computer users have files on their systems that they did not look at. The last time I was NOT in this situation was in the early eighties. And many users would expand an archive they download and keep its whole content if a sample of it is turns out to be useful.
My scenario only ask that you encrypt such a file that you somewhat received/have, but did not look at. I can't tell if you would include such files in what you encrypt; but I, and many others, do it routinely.

>>> One feature it has is that it's the same forwards and backwards:
>>> I don't have to specify whether I'm encrypting or decrypting.
>
>> In some contexts that property is a weakness. In particular, if the
>> adversary can inject a ciphertext as plaintext, the cipher is broken.
>
> Once again, that never happens. If I can't make sense of an email,
> newsgroup posting, or other text, I don't save it. (I have a text-
> only Internet account.) Anyhow, even if they did, it would be broken
> *only for that one file*.
>
> The bad guy finds a backup thumb drive of mine buried in the woods.
> He's intrigued by the encrypted file named bad_guy_don't_read_this.
> So he makes a copy of the encrypted version, and emails it to me?
> Even if I saved that gibberish email instead of discarding it, and
> even if I then encrypted it, the email headers on it would prevent
> the encryption from reversing the previous encryption.

Granted, the attack where the encryption system is used to decipher does not match your setup well. I mentioned it because it is standard and sometime critical (e.g. for a cipher used in an authentication protocol).

>>> The plaintext is typically a compressed file of English ASCII text,
>>> compressed using either gzip or bzip2.
>
>> If that's taken into consideration, we are getting far from a
>> simple, analysable cipher. Yes, systematically compressing the
>> plaintext before encryption tends to increase security.
>
> I figured it would. (Even though, as Phil pointed out, the first few
> characters of the compressed file are known.) But that's not why I
> do it. I do it to save space. And to save the time it takes to do a
> backup. (I've been told it shouldn't take two hours to overwrite a
> four gig USB thumb drive. But for me it does.)

<OT> The order of magnitude sounds like a USB link stuck at "full speed" (12Mbit/s) rather than "high speed" (480 Mbit/s). </OT>

>>> Why did I design and write my own system? Mostly just for fun.
>>> But also because that way I know for an absolute fact that neither
>>> the algorithm nor the program that implements it has any secret
>>> back doors.
>
>> That "no backdoor" argument also applies if you implement a simple
>> standard cipher yourself (e.g. AES-CTR) which is probably easier to
>> implement than yours.
>
> That would guarantee there's no backdoor in the program, but not that
> there's no backdoor in the cipher itself. How do I know its author
> didn't design it with a backdoor?

My use of "simple" was meant to try to counter that otherwise valid objection.

> For that matter, how do I know
> someone didn't spend an enormous amount of time, years ago, figuring
> out a way to crack it? The more people use it, the more motivation
> there is to break it. The longer it's been in use, the more time
> people have had to try and crack it.

Yes; but time also works in the other direction, making it more likely that the crack becomes known.

>>> Nor has anyone ever worked on breaking this cipher.
>
>> This does nothing to demonstrate that the cipher is secure; on
>> the contrary.
>
> Is it really possible to demonstrate that a cipher (other than a
> one-time pad) is secure? Or only that it's insecure? Or that nobody
> has yet broken it, at least nobody who wanted to reveal that they had
> done so?

I wish that I used "indicate" rather than "demonstrate". Other than that, I maintain that surviving intense scrutiny is a fair indication of security, and the best available for practical systems (small key). By contrast, "homegrown" has times and times (though far from always) been associated with weak. Notice that one scheme that you considered would have been totally unsafe.

>>> Nor is anyone ever likely to
>
>> Now that the cipher is on sci.crypt, that no longer stands :-)
>
> Perhaps. Is anyone here interested in trying it? If so, I'd be
> glad to provide some sample ciphertext. Just tell me how long the
> ciphertexts should be, how many of them there should be, and how many
> bits (if not a full 100) the key should have. And whether you'd
> prefer ASCII text, gzipped text, or bzip2ed text as the plaintext.
>
> No, I'm not going to provide a file of all NULs as a plaintext. :-)

I am tempted, but time is scarce.

>>> "Stirring" has some interesting mathematical properties.
>
>> Yes, but it is slow (256 operations per byte of plaintext for each
>> of the three stirring).
>
> It's enormously faster than actually making the backup, which just
> just a straightforward copy. (Well, okay, I'm cheating. I do a full
> backup of all files, old and new, every time, but don't re-encrypt all
> the encrypted files every time. But it's fast enough that there would
> be little benefit to making it faster.)

Regards,

Francois Grieu
From: Francois Grieu on
In a post of 03/06/2010 05:09 down this thread, Keith F. Lynch
wrote something similar to (freely edited)
> "stirring" consists of
> repeatedly reversing substrings within the plaintext. For each of the
> 256 possible characters, one at a time, in order, I use that character
> as a delimiter, breaking up the text into some number of substrings.
> Each of those substrings I reverse in place. I do so circularly, so
> that the first and last characters also get moved. For instance:
>
> Criticism of a proposed stream cipher requested.
>
> would get turned into:
>
> equed a opriseream phestipof red.Cr strosm cicit
>
> My complete system consists of:
>
> * Generate S, the XORed together key-defined subset of the square
> roots of the first hundred primes, as many binary digits of each
> as there are bits in the plaintext file.
> * XOR the plaintext with S.
> * "Stir" the result.
> * XOR this new result with a "stirred" copy of S.
> * Reverse-stir the latest result (i.e. start at the end of the
> 256-character "alphabet" and work backwards).
> * XOR the latest result with S.
>
> One feature it has is that it's the same forwards and backwards:
> I don't have to specify whether I'm encrypting or decrypting.
>
> The plaintext is typically a compressed file of English ASCII text,
> compressed using either gzip or bzip2.

Keith F. Lynch later made clear that the same 100-bit key (thus the
keystream S) is reused for different plaintexts/files.

In all the following I ignore the compression of plaintext.

It is now established that recovering slightly more of 100 bits of S
would be enough to recover the key by Gaussian elimination (and confirm
that fact), then compute S in full; and that it would be easy if one of
the file encrypted *entirely* consists of like 10 (or more) bytes with
value 00h. This does *not* break the cryptosystem in the setup Keith F.
Lynch (claims he) uses.


I notice a property of "stirring": when it applies to random-like input
(which is the case in each of the three uses in this cryptosystem,
because S is random-like), an individual byte is not moved very far away
from its original position. I'm confident that odds of a byte being
moved farther than 50kByte are very low.

It follows that if two files are the same in a 100kByte segment, their
ciphertext is very likely identical in the center of that zone.
Therefore, with the ciphertexts corresponding to a file of all-a,
all-b,..., overs 1MByte, the question "does the plaintext for a given
other 1MB ciphertext contain a letter repeated at least 100k times, and
approximately where" can be answered with near certainty. Again, from an
academic perspective, this is a break of the cryptosystem.

The saying "Attacks only get better" (sometime attributed to the NSA)
comes to mind, and give credence to my suspicion that the cryptosystem
could be broken in a much more practical sense, perhaps including with
the compression step and a few known plaintext. But I have no proof of
that.

Francois Grieu
From: Mok-Kong Shen on
Maaartin wrote:

> For all good ciphers there are some proofs. Proving them secure is in
> fact impossible, but there are proofs showing them immune against
> known attacks. Even in case the NSA could crack AES, it wouldn't be
> cheap. I assume that the electricity bill for cracking my files would
> cost much more than they're worth, even assuming specialized hardware.

To this general issue, I like to repeat what I said previously
Many kinds of known attacks: known-plaintext, chosen-plaintext,
differenential analysis, etc. etc. all have the major goal of
recovering the "key" (or at least gain some knowledge of it), so as to
extend the volume of informations that the analyst already has (the
plaintext he already knows of a given (genuine) message, "if" any).
Already in classical crypto, one tried to attack a given encrypted
message by exploiting what one happened to know of what might be in it.
But excepting this type of weakness (cribs etc. being exploited), the
appropriate defense of which should be separately discussed in my
humble view, the employment of a message-unique key avoids, as far as
I can see, to supply stuffs, with which the analyst could use to obtain
the key of a (genuine user) message, to the analyst in a "fundamental"
sense, for both block and stream encryptions and at reasonalble
tolerable costs. (See also the thread "On the general benefits of
introducing dynamics into encryption processing" of 01.05.2010.) Thus
I don't quite understand e.g. a recent effort of organizing a
conference on linear analysis. (I wrote to the organizer, but got no
reply, understandably.)

M. K. Shen
From: Francois Grieu on
I wrote:
> In a post of 03/06/2010 05:09 down this thread, Keith F. Lynch
> wrote something similar to (freely edited)
>> > "stirring" consists of
>> > repeatedly reversing substrings within the plaintext. For each of the
>> > 256 possible characters, one at a time, in order, I use that character
>> > as a delimiter, breaking up the text into some number of substrings.
>> > Each of those substrings I reverse in place. I do so circularly, so
>> > that the first and last characters also get moved. For instance:
>> >
>> > Criticism of a proposed stream cipher requested.
>> >
>> > would get turned into:
>> >
>> > equed a opriseream phestipof red.Cr strosm cicit
>> >
>> > My complete system consists of:
>> >
>> > * Generate S, the XORed together key-defined subset of the square
>> > roots of the first hundred primes, as many binary digits of each
>> > as there are bits in the plaintext file.
>> > * XOR the plaintext with S.
>> > * "Stir" the result.
>> > * XOR this new result with a "stirred" copy of S.
>> > * Reverse-stir the latest result (i.e. start at the end of the
>> > 256-character "alphabet" and work backwards).
>> > * XOR the latest result with S.
>> >
>> > One feature it has is that it's the same forwards and backwards:
>> > I don't have to specify whether I'm encrypting or decrypting.
>> >
>> > The plaintext is typically a compressed file of English ASCII text,
>> > compressed using either gzip or bzip2.
> Keith F. Lynch later made clear that the same 100-bit key (thus the
> keystream S) is reused for different plaintexts/files.

Stirring does not change the XOR of all bytes. Thus, unless I err, given
two fully known plaintext of size u and v bytes, the xor of all bytes at
indexes u to v-1 in the keystream can be computed as the XOR of the two
plaintext and two ciphertexts.

With that noticed, given approximately 14 fully known plaintext of
different length encrypted with the same key, the key can be recovered
by Gaussian elimination and without guesswork (slightly fewer files are
necessary if one is willing to perform some amount of key search).

Very practical indeed. Attacks only get better.

Francois Grieu
From: Francois Grieu on
[reposted with enhancements, sorry about that happening all too often]

I wrote :
> In a post of 03/06/2010 05:09 down this thread, Keith F. Lynch
> wrote something similar to (freely edited)
>> > "stirring" consists of
>> > repeatedly reversing substrings within the plaintext. For each of the
>> > 256 possible characters, one at a time, in order, I use that character
>> > as a delimiter, breaking up the text into some number of substrings.
>> > Each of those substrings I reverse in place. I do so circularly, so
>> > that the first and last characters also get moved. For instance:
>> >
>> > Criticism of a proposed stream cipher requested.
>> >
>> > would get turned into:
>> >
>> > equed a opriseream phestipof red.Cr strosm cicit
>> >
>> > My complete system consists of:
>> >
>> > * Generate S, the XORed together key-defined subset of the square
>> > roots of the first hundred primes, as many binary digits of each
>> > as there are bits in the plaintext file.
>> > * XOR the plaintext with S.
>> > * "Stir" the result.
>> > * XOR this new result with a "stirred" copy of S.
>> > * Reverse-stir the latest result (i.e. start at the end of the
>> > 256-character "alphabet" and work backwards).
>> > * XOR the latest result with S.
>> >
>> > One feature it has is that it's the same forwards and backwards:
>> > I don't have to specify whether I'm encrypting or decrypting.
>> >
>> > The plaintext is typically a compressed file of English ASCII text,
>> > compressed using either gzip or bzip2.
> Keith F. Lynch later made clear that the same 100-bit key (thus the
> keystream S) is reused for different plaintexts/files.

The stirring operation leaves the the XOR of all bytes invariant. It
follows that the XOR of all bytes of the ciphertext, plaintext, and
keystream S, is sero.

Therefore, given two fully known plaintext of size u and v bytes, the
xor of all bytes at indexes u to v-1 in the keystream S can be computed
as the XOR of all the bytes in the two plaintext and two ciphertexts.

With that noticed, given approximately 14 fully known plaintext of
different length encrypted with the same key, the key can be recovered
by Gaussian elimination, without guesswork (slightly fewer files are
necessary if one is willing to perform some amount of key search).

That's a practical break of the cryptosystem under a known-plaintext
assumption. Attacks only get better.

Francois Grieu