Call for review: Hashing by hand algorithm [Cryptography]

Prev: Dynamic Hill cipher
Next: PE Scrambler

From: bmearns on 23 Apr 2010 14:49

I'm in the process of developing a hashing algorithm which I would
very much like to get some feedback on. The primary design criteria is
that it be reasonable to perform by hand. This was the sole purpose
for developing the algorithm, since there are already a number of good
hashing algorithms available for use on computers.

The motivation is for use with passwords (in a scheme I've mentioned
before), where a single strong "core" password is hashed with a
service-specific string in order to produce different strong passwords
for different services, without having one compromise result in the
compromise of the core password. Ideally, the hashing is done by a
computer. However, in the case where you are, for instance, logging on
to a computer with none others available, it must be able to be
performed by hand as well.

That being the motivation, irreversibility is a very high priority.
Collision resistance is also important, but slightly less so.

I designed the algorithm to be performed with a deck of cards (a la
Solitaire), which makes it easy to move elements around. The algorithm
is as follows:

1) A deck of 52 cards is divided into its four suits (heart, clubs,
diamonds, and spades).
2) Each card has a numerical value in [0, 25] (each number is
represented by two different cards in the deck). The red cards are all
in [0, 12], where the ace is 0, the two is 1, queen is 11, king is 12,
etc. The black cards likewise make up the values in [13, 25].
3) The deck is laid out into two rows of 26 cards each. Hearts and
clubs make up the top row (T), diamonds and spades make up the bottom
row (B). Each row is laid out in numerical order, viz.: Ah, 2h,
3h, ..., Qh, Kh, Ac, 2c, 3c, ..., Kc.
4) Data is accepted as a base-26 digit. I.e., every value that is fed
in should be in [0, 25].
5) To feed a value, D, to the hash, start by finding the card with the
same value in the bottom row. Call this position I.
6) Pick up the card at position I in the /top row/. This is the
"selected card", S.
7) Loop over all 26 columns with a counter J, starting at the far
right (J=25) and moving left (J=0), skipping the column where J=I. For
each column:
7.a) If the selected card (S) and the bottom card (B[J]) are
different colors, swap the bottom card with the first card to the
right that has the opposite color (also in the bottom row). If you get
to the end of the row, just wrap back around to the left side.
7.b) Looking at the selected card (S), the top card (T[J]), and
the card that was in the bottom row before the last step (B[J], but
before it was swapped if it was), if there are an even number of red
cards (zero or two red cards), then swap the selected card with the
top card (T[J] will become the new selected card, and the old selected
card will take it's place in the top row).
8) After processing all the columns this way, you will have one card
left in your hand, and there will be an empty slot at T[I]. Place this
card in that slot.
9) Repeat step 7, but reversing the top and bottom row.

A python implementation is included below for clarity.

The deck itself would be the output "digest". Really, this is just two
permutations of 26 elements each (each row is a permutation). There's
a number of ways you might use this, for instance, adding the two
cards in each column to get a sequence of 26 upper-and-lower-case
characters. For a full deck of 52 cards, each row can have 26!
different arrangements, which is about 88 bits, so the whole deck
would be 176 bits.

I've done some preliminary tests with smaller decks (8, 10, and 12
card decks). For each test, I create a random 10 character string
(about the length of a reasonably password), and hash it. As a test
sequence, I perform a number of tests equal to the total number of
possible outputs. My understanding is that a perfect hash would
produce every possible output exactly once as a result.

I think the algorithm avalanches well. Looking at the number of bits
that change with each value fed into the hash, the histogram makes a
pretty decent bell with an average between 48 and 50 bits per hundred,
and a stddev of 13 to 14 per hundred (so about 68% of the time,
between 34% and 62% of the bits change).

I'm not sure about collision resistance, though. In the test sequence
described above (in which I would expect a perfect hash to generate
each output exactly once) only about 44% of possible outputs are being
produced at all. For a 10-card deck (14400 possible outputs) the most
common output occurred 16 times out of 14400 tests. Is this a
significant bias? The average number of times each output occurs
(including those that occurred 0 times) is consistently 1, with a
stddev of 1.527 (but I'm not sure how relevant the stddev is since
it's not remotely a normal distribution).

Visually inspecting the histogram (how many times each output occurs),
it looks pretty random to me. There are some spikes and holes, but
they are pretty well distributed within the histogram and for
different test sequences (where the PRNG producing the input strings
is seeded differently) they seem to occur in different places. The
exception is a few strange wells of about 50 consecutive output values
that never occurred. There are about 12 of these wells that occur in
two groups of 6 each, always in about the same location. Altogether,
they make up about 5% of the possible outcomes.

I'm not sure if these are appropriate tests to be running, or if I'm
interpreting the results the right way, or if there are other tests I
should run. Any feedback on that would be helpful.

Most importantly, I'm hoping to get some feedback on whether or not
anyone sees any theoretical problems in the algorithm. Specifically,
I'm very interested in whether or not it is reversible. I believe that
if you don't know the previous state of the deck, then you would not
be able to tell which cards were swapped and therefore would not know
what data was fed in. Can anyone offer arguments for or against this?
Is this sufficient for irreversibility?

Thanks for taking a look. I greatly appreciate feedback.
-Brian

############# PYTHON IMPLEMENTATION ############
def feed_char(c, b, t):
"""" c is the character, b is the bottom row, t is the top row.
"""

handhash_oneside(c, b, t)
handhash_oneside(c, t, b)

def color(c, count):
half_count = int(count / 2)
return int(c / half_count

def handhash_oneside(c, b, t):
count = len(b)

#Make sure it's in range
c = c % count

for i in range(count):
if b[i] == c:
break

s = t[i]
for j in range(count-1, -1, -1):
if i == j:
continue
col_s = color(s, count)
col_t = color(t[j], count)
col_b = color(b[j], count)

#Colors should all be 50/50. So this XOR should be 50/50.
if col_b ^ col_s:
#Swap b[j] with the next of the opposite color
for offset in range(count):
idx = (offset+j) % count
if col_b != color(b[idx], count):
temp = b[j]
b[j] = b[idx]
b[idx] = temp
break

#black is 1, so the XOR_3 will be false iff there are an even
number of blacks, so the
# test will pass if there are an odd number of blacks. That
corresponds to an even number
# of reds.
if not (col_b ^ col_s ^ col_t):
#Trade selected with top
temp = t[j]
t[j] = s
s = temp

debug = False
if debug:
str = ""
for x in range(j):
str += " "
str += " *"
print ""
print str
printDeck(b, t)

#Put (current) selected back in the open spot
t[i] = s

From: WTShaw on 25 Apr 2010 14:05

On Apr 23, 1:49 pm, bmearns <mearn...(a)gmail.com> wrote:
> I'm in the process of developing a hashing algorithm which I would
> very much like to get some feedback on. The primary design criteria is
> that it be reasonable to perform by hand. This was the sole purpose
> for developing the algorithm, since there are already a number of good
> hashing algorithms available for use on computers.
>
> The motivation is for use with passwords (in a scheme I've mentioned
> before), where a single strong "core" password is hashed with a
> service-specific string in order to produce different strong passwords
> for different services, without having one compromise result in the
> compromise of the core password. Ideally, the hashing is done by a
> computer. However, in the case where you are, for instance, logging on
> to a computer with none others available, it must be able to be
> performed by hand as well.
>
> That being the motivation, irreversibility is a very high priority.
> Collision resistance is also important, but slightly less so.
>
> I designed the algorithm to be performed with a deck of cards (a la
> Solitaire), which makes it easy to move elements around. The algorithm
> is as follows:
>
> 1) A deck of 52 cards is divided into its four suits (heart, clubs,
> diamonds, and spades).
> 2) Each card has a numerical value in [0, 25] (each number is
> represented by two different cards in the deck). The red cards are all
> in [0, 12], where the ace is 0, the two is 1, queen is 11, king is 12,
> etc. The black cards likewise make up the values in [13, 25].
> 3) The deck is laid out into two rows of 26 cards each. Hearts and
> clubs make up the top row (T), diamonds and spades make up the bottom
> row (B). Each row is laid out in numerical order, viz.: Ah, 2h,
> 3h, ..., Qh, Kh, Ac, 2c, 3c, ..., Kc.
> 4) Data is accepted as a base-26 digit. I.e., every value that is fed
> in should be in [0, 25].
> 5) To feed a value, D, to the hash, start by finding the card with the
> same value in the bottom row. Call this position I.
> 6) Pick up the card at position I in the /top row/. This is the
> "selected card", S.
> 7) Loop over all 26 columns with a counter J, starting at the far
> right (J=25) and moving left (J=0), skipping the column where J=I. For
> each column:
> 7.a) If the selected card (S) and the bottom card (B[J]) are
> different colors, swap the bottom card with the first card to the
> right that has the opposite color (also in the bottom row). If you get
> to the end of the row, just wrap back around to the left side.
> 7.b) Looking at the selected card (S), the top card (T[J]), and
> the card that was in the bottom row before the last step (B[J], but
> before it was swapped if it was), if there are an even number of red
> cards (zero or two red cards), then swap the selected card with the
> top card (T[J] will become the new selected card, and the old selected
> card will take it's place in the top row).
> 8) After processing all the columns this way, you will have one card
> left in your hand, and there will be an empty slot at T[I]. Place this
> card in that slot.
> 9) Repeat step 7, but reversing the top and bottom row.
>
> A python implementation is included below for clarity.
>
> The deck itself would be the output "digest". Really, this is just two
> permutations of 26 elements each (each row is a permutation). There's
> a number of ways you might use this, for instance, adding the two
> cards in each column to get a sequence of 26 upper-and-lower-case
> characters. For a full deck of 52 cards, each row can have 26!
> different arrangements, which is about 88 bits, so the whole deck
> would be 176 bits.
>
> I've done some preliminary tests with smaller decks (8, 10, and 12
> card decks). For each test, I create a random 10 character string
> (about the length of a reasonably password), and hash it. As a test
> sequence, I perform a number of tests equal to the total number of
> possible outputs. My understanding is that a perfect hash would
> produce every possible output exactly once as a result.
>
> I think the algorithm avalanches well. Looking at the number of bits
> that change with each value fed into the hash, the histogram makes a
> pretty decent bell with an average between 48 and 50 bits per hundred,
> and a stddev of 13 to 14 per hundred (so about 68% of the time,
> between 34% and 62% of the bits change).
>
> I'm not sure about collision resistance, though. In the test sequence
> described above (in which I would expect a perfect hash to generate
> each output exactly once) only about 44% of possible outputs are being
> produced at all. For a 10-card deck (14400 possible outputs) the most
> common output occurred 16 times out of 14400 tests. Is this a
> significant bias? The average number of times each output occurs
> (including those that occurred 0 times) is consistently 1, with a
> stddev of 1.527 (but I'm not sure how relevant the stddev is since
> it's not remotely a normal distribution).
>
> Visually inspecting the histogram (how many times each output occurs),
> it looks pretty random to me. There are some spikes and holes, but
> they are pretty well distributed within the histogram and for
> different test sequences (where the PRNG producing the input strings
> is seeded differently) they seem to occur in different places. The
> exception is a few strange wells of about 50 consecutive output values
> that never occurred. There are about 12 of these wells that occur in
> two groups of 6 each, always in about the same location. Altogether,
> they make up about 5% of the possible outcomes.
>
> I'm not sure if these are appropriate tests to be running, or if I'm
> interpreting the results the right way, or if there are other tests I
> should run. Any feedback on that would be helpful.
>
> Most importantly, I'm hoping to get some feedback on whether or not
> anyone sees any theoretical problems in the algorithm. Specifically,
> I'm very interested in whether or not it is reversible. I believe that
> if you don't know the previous state of the deck, then you would not
> be able to tell which cards were swapped and therefore would not know
> what data was fed in. Can anyone offer arguments for or against this?
> Is this sufficient for irreversibility?
>
> Thanks for taking a look. I greatly appreciate feedback.
> -Brian
>
> ############# PYTHON IMPLEMENTATION ############
> def feed_char(c, b, t):
> """" c is the character, b is the bottom row, t is the top row.
> """
>
> handhash_oneside(c, b, t)
> handhash_oneside(c, t, b)
>
> def color(c, count):
> half_count = int(count / 2)
> return int(c / half_count
>
> def handhash_oneside(c, b, t):
> count = len(b)
>
> #Make sure it's in range
> c = c % count
>
> for i in range(count):
> if b[i] == c:
> break
>
> s = t[i]
> for j in range(count-1, -1, -1):
> if i == j:
> continue
> col_s = color(s, count)
> col_t = color(t[j], count)
> col_b = color(b[j], count)
>
> #Colors should all be 50/50. So this XOR should be 50/50.
> if col_b ^ col_s:
> #Swap b[j] with the next of the opposite color
> for offset in range(count):
> idx = (offset+j) % count
> if col_b != color(b[idx], count):
> temp = b[j]
> b[j] = b[idx]
> b[idx] = temp
> break
>
> #black is 1, so the XOR_3 will be false iff there are an even
> number of blacks, so the
> # test will pass if there are an odd number of blacks. That
> corresponds to an even number
> # of reds.
> if not (col_b ^ col_s ^ col_t):
> #Trade selected with top
> temp = t[j]
> t[j] = s
> s = temp
>
> debug = False
> if debug:
> str = ""
> for x in range(j):
> str += " "
> str += " *"
> print ""
> print str
> printDeck(b, t)
>
> #Put (current) selected back in the open spot
> t[i] = s

Perhaps it is not so simple, but you probably don't want my opinion.
We I saw this post, I was amused because during a thunderstorm the
other night the computers were safely down and I did not want to waste
batteries. To confirm a counted hash, I had done it hours before by
hand, not so easy as I have trouble writing.

The hash is doubly processed and the result two steps away from the
normal alphabetic sequence is
xjzlacokyungphfsbreiwmvdtq.

I doubt that you can do much with it but the lessons are clear, error
prone if you don't have a way to prove the results. worse to remember
than words, and anything written dow is evidence that can be taken one
way or the other. My words went topside that keys as memories could
not be forced, at least before and after Bush's rubber hose, etc.
guys.

More words that 26 characters? Use a bigger set like Base 64...ever
heard of that one. You are correct that as a substituted cipher it
would not be difficult to solve as the profile of text is the same as
the coded:

Dh
DhIcqS
DhIcqS
DhIcqS
DhIcqSRu
DhIcqSRu
DhIcqSRuZJ
DhIcqSRuZJ
DhIcqSRuZJ71
DhIcqSRuZJ713
DhIcqSRuZJ713zG2=
DhIcqSRuZJ713zG2=m
DhIcqSRuZJ713zG2=mMOrHT
DhIcqSRuZJ713zG2=mMOrHTbgsP
DhIcqSRuZJ713zG2=mMOrHTbgsPUX0Fdv
DhIcqSRuZJ713zG2=mMOrHTbgsPUX0Fdv8BKp
DhIcqSRuZJ713zG2=mMOrHTbgsPUX0Fdv8BKpCj5Yx4E
DhIcqSRuZJ713zG2=mMOrHTbgsPUX0Fdv8BKpCj5Yx4Eo6i
DhIcqSRuZJ713zG2=mMOrHTbgsPUX0Fdv8BKpCj5Yx4Eo6iNw9AV
DhIcqSRuZJ713zG2=mMOrHTbgsPUX0Fdv8BKpCj5Yx4Eo6iNw9AV/aLey

and, many characters were not used. But as a set to hash words, fine.

Using the same source text as the above hash, it would be courtesy of
Large Pizza

9Y3kTZwQKl6LAmsIaUEBxFbcyt+d4nVezRMo7NCpuJfWGD0Hgh1v/i5qXj2SOr8P

Even as one pass was used here, you don't want to do it by hand.

From: Maaartin on 25 Apr 2010 17:09

On Apr 23, 8:49 pm, bmearns <mearn...(a)gmail.com> wrote:
> The deck itself would be the output "digest". Really, this is just two
> permutations of 26 elements each (each row is a permutation). There's
> a number of ways you might use this, for instance, adding the two
> cards in each column to get a sequence of 26 upper-and-lower-case
> characters. For a full deck of 52 cards, each row can have 26!
> different arrangements, which is about 88 bits, so the whole deck
> would be 176 bits.

That's true for the information contained there, but you by the column-
wise addition loose something, don't you?

> I've done some preliminary tests with smaller decks (8, 10, and 12
> card decks). For each test, I create a random 10 character string
> (about the length of a reasonably password), and hash it. As a test
> sequence, I perform a number of tests equal to the total number of
> possible outputs. My understanding is that a perfect hash would
> produce every possible output exactly once as a result.

IMHO, that's quite unprobable, unless you do something completelly
different from what I imagine.

> I think the algorithm avalanches well. Looking at the number of bits
> that change with each value fed into the hash, the histogram makes a
> pretty decent bell with an average between 48 and 50 bits per hundred,
> and a stddev of 13 to 14 per hundred (so about 68% of the time,
> between 34% and 62% of the bits change).

How do you extract the bits from the card sequence? I see no obvious
way.

In case you do it by a sort of factoradic number system, than counting
changed bits makes IMHO no sense, do you?

> I'm not sure about collision resistance, though. In the test sequence
> described above (in which I would expect a perfect hash to generate
> each output exactly once) only about 44% of possible outputs are being
> produced at all. For a 10-card deck (14400 possible outputs) the most
> common output occurred 16 times out of 14400 tests. Is this a
> significant bias? The average number of times each output occurs
> (including those that occurred 0 times) is consistently 1,

By definition.

> with a
> stddev of 1.527 (but I'm not sure how relevant the stddev is since
> it's not remotely a normal distribution).

Standard deviation is important for any distibution. Yours may be
something like
http://en.wikipedia.org/wiki/Poisson_distribution#Properties
You could do some "Hypothesis testing". Or you could first compare it
to randomly generated result, so you get some feeling. Just generate
14400 random numbers from the set 0..14400-1 and compute the stats.
Use a good PRNG and repeat it many times.

> Visually inspecting the histogram (how many times each output occurs),
> it looks pretty random to me. There are some spikes and holes, but
> they are pretty well distributed within the histogram and for
> different test sequences (where the PRNG producing the input strings
> is seeded differently) they seem to occur in different places. The
> exception is a few strange wells of about 50 consecutive output values
> that never occurred. There are about 12 of these wells that occur in
> two groups of 6 each, always in about the same location. Altogether,
> they make up about 5% of the possible outcomes.
>
> I'm not sure if these are appropriate tests to be running, or if I'm
> interpreting the results the right way, or if there are other tests I
> should run. Any feedback on that would be helpful.

I know only test for random sequences: google for "diehard battery",
"ecuyer Test", and "SP800-22rev1.pdf".
There're many possibilities for generating them from hashes, e.g,
h(0), h(1), h(2), ...
h(0), h(h(0)), h(h(h(0)))

> Most importantly, I'm hoping to get some feedback on whether or not
> anyone sees any theoretical problems in the algorithm. Specifically,
> I'm very interested in whether or not it is reversible. I believe that
> if you don't know the previous state of the deck, then you would not
> be able to tell which cards were swapped and therefore would not know
> what data was fed in. Can anyone offer arguments for or against this?
> Is this sufficient for irreversibility?

I can't tell.

From: WTShaw on 26 Apr 2010 00:57

On Apr 25, 4:09 pm, Maaartin <grajc...(a)seznam.cz> wrote:
> On Apr 23, 8:49 pm, bmearns <mearn...(a)gmail.com> wrote:
>
> > The deck itself would be the output "digest". Really, this is just two
> > permutations of 26 elements each (each row is a permutation). There's
> > a number of ways you might use this, for instance, adding the two
> > cards in each column to get a sequence of 26 upper-and-lower-case
> > characters. For a full deck of 52 cards, each row can have 26!
> > different arrangements, which is about 88 bits, so the whole deck
> > would be 176 bits.
>
> That's true for the information contained there, but you by the column-
> wise addition loose something, don't you?
>
> > I've done some preliminary tests with smaller decks (8, 10, and 12
> > card decks). For each test, I create a random 10 character string
> > (about the length of a reasonably password), and hash it. As a test
> > sequence, I perform a number of tests equal to the total number of
> > possible outputs. My understanding is that a perfect hash would
> > produce every possible output exactly once as a result.
>
> IMHO, that's quite unprobable, unless you do something completelly
> different from what I imagine.
>
> > I think the algorithm avalanches well. Looking at the number of bits
> > that change with each value fed into the hash, the histogram makes a
> > pretty decent bell with an average between 48 and 50 bits per hundred,
> > and a stddev of 13 to 14 per hundred (so about 68% of the time,
> > between 34% and 62% of the bits change).
>
> How do you extract the bits from the card sequence? I see no obvious
> way.
>
> In case you do it by a sort of factoradic number system, than counting
> changed bits makes IMHO no sense, do you?
>
> > I'm not sure about collision resistance, though. In the test sequence
> > described above (in which I would expect a perfect hash to generate
> > each output exactly once) only about 44% of possible outputs are being
> > produced at all. For a 10-card deck (14400 possible outputs) the most
> > common output occurred 16 times out of 14400 tests. Is this a
> > significant bias? The average number of times each output occurs
> > (including those that occurred 0 times) is consistently 1,
>
> By definition.
>
> > with a
> > stddev of 1.527 (but I'm not sure how relevant the stddev is since
> > it's not remotely a normal distribution).
>
> Standard deviation is important for any distibution. Yours may be
> something likehttp://en.wikipedia.org/wiki/Poisson_distribution#Properties
> You could do some "Hypothesis testing". Or you could first compare it
> to randomly generated result, so you get some feeling. Just generate
> 14400 random numbers from the set 0..14400-1 and compute the stats.
> Use a good PRNG and repeat it many times.
>
> > Visually inspecting the histogram (how many times each output occurs),
> > it looks pretty random to me. There are some spikes and holes, but
> > they are pretty well distributed within the histogram and for
> > different test sequences (where the PRNG producing the input strings
> > is seeded differently) they seem to occur in different places. The
> > exception is a few strange wells of about 50 consecutive output values
> > that never occurred. There are about 12 of these wells that occur in
> > two groups of 6 each, always in about the same location. Altogether,
> > they make up about 5% of the possible outcomes.
>
> > I'm not sure if these are appropriate tests to be running, or if I'm
> > interpreting the results the right way, or if there are other tests I
> > should run. Any feedback on that would be helpful.
>
> I know only test for random sequences: google for "diehard battery",
> "ecuyer Test", and "SP800-22rev1.pdf".
> There're many possibilities for generating them from hashes, e.g,
> h(0), h(1), h(2), ...
> h(0), h(h(0)), h(h(h(0)))
>
> > Most importantly, I'm hoping to get some feedback on whether or not
> > anyone sees any theoretical problems in the algorithm. Specifically,
> > I'm very interested in whether or not it is reversible. I believe that
> > if you don't know the previous state of the deck, then you would not
> > be able to tell which cards were swapped and therefore would not know
> > what data was fed in. Can anyone offer arguments for or against this?
> > Is this sufficient for irreversibility?
>
> I can't tell.

From: WTShaw on 26 Apr 2010 01:09

On Apr 25, 1:05 pm, WTShaw <lure...(a)gmail.com> wrote:

> xjzlacokyungphfsbreiwmvdtq.

>
> 9Y3kTZwQKl6LAmsIaUEBxFbcyt+d4nVezRMo7NCpuJfWGD0Hgh1v/i5qXj2SOr8P
>
> Even as one pass was used here, you don't want to do it by hand.

I offered you a little hashing challenge where as two strings are
based on the same source.

As the first shorter sequence is likely secure alone, the longer
sequence is not, really rather a poor situation, but as the two have a
common root, solve one, solve the other. In practice, you never get
this kind of information so your are are pretty much blind to some use
of this process but knowing what to do with what you have been given,
solving it, is important to understanding what is good or bad about
such permutations as hashes, and why.

| Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11
Prev: Dynamic Hill cipher
Next: PE Scrambler