From: kanze on
Thomas J. Gritzan wrote:
> gkoreman(a)gmail.com schrieb:

> > I am trying to implement SHA1 based on the pseudo-code on
> > Wikipedia. The pseudo-code is on:
> > http://en.wikipedia.org/wiki/SHA-1

> > My code is working, but is not giving me the correct hashes.
> > This is being tested (initially) on a big-endian machine,
> > and only after I have it working on big-endian was I
> > planning on making it work on both big and little endian.

> [...]
> > // Step 1
> > word32 padding = 0x8000000; // 1000 0000 0000 0000
> > message.push_back(padding);
> [...]

> You forget a zero. Should be:

> word32 padding = 0x80000000; // or: 1 << 31;

> Works for me on a little-endian machine (only for zero-length messages).

Only for zero-length messages. Of course, his interface only
allows for message lengths which are a multiple of 32 bits, so
this might be OK.

And how does the endian-ness make a difference. I implemented
MD5 on a Sparc (big-endian) -- I've since compiled the code on a
Linux based PC (little-endian), and it works perfectly well. Is
there something fundamentally different between MD5 and SHA-1
which means that SHA-1 is endian dependant? They look very
similar in priciples to me.

--
James Kanze GABI Software
Conseils en informatique orient?e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S?mard, 78210 St.-Cyr-l'?cole, France, +33 (0)1 30 23 00 34


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: kanze on
Le Chaud Lapin wrote:
> gkoreman(a)gmail.com wrote:
> > I am trying to implement SHA1 based on the pseudo-code on Wikipedia.
> > // Pre-processing:
> > // 1. append a single "1" bit to message
> > // 2. append "0" bits until message length = 490 = -32
> > (mod 512)
> > // 3. append length of message (before pre-processing),
> > in bits as 32-bit big-endian integer to message

> You are not allowed to change the size of the 64-bit appendage
> or any other part of the alogrithm. The specification calls
> for padding until the length is congruent to 448 mod 512. It
> looks as if this is the source of your incorrect digest.

> >From Specification:

> "append a single "1" bit to message append "0" bits until
> message length = 448 = -64 (mod 512) append length of message
> (before pre-processing), in bits as 64-bit big-endian integer
> to message"

> > word32 messageLength = message.size() * 32;
> >
> > // Step 1
> > word32 padding = 0x8000000; // 1000 0000 0000 0000
> > message.push_back(padding);

> Are you sure you are on a big-endian machine? I tested your
> code and it seems that youre machine is little endian.

That shouldn't make a difference, if you've implemnented the
code correctly.

> In any case, again, you have to be fastidious in watching the
> spec in this section. In general, you cannot slap on an
> entire 64-bit chunk onto the message. For example, if the
> message is 440 bits long, you're only allowed to add 1 byte
> for padding: the bit-pattern of a 1 followed by 7 zeros. This
> will bring you to congruency mod 448, at which point you can
> tack on the 64-bit length to get to a full 512 bits.

His interface forced the user to provide the 32 bit chunks.

Technically, of course, you might have to pad as little as a
single bit. In practice, you can pad using the same multiple of
bits as the user is required to provide (supposing that is a
power of 2). Of course, for most uses, you'll want to support
at least as small as bytes.

> > // Step 3
> > message.push_back(messageLength);

> You punter. :)

> Technically, you have to be prepared for crunching a digest
> whose size is greater than (2^32). Here you're only taking
> care of lengths up to 2^32.

Depends on the application. My own implementation of MD5 is
limited to the SIZE_T_MAX * 8 bits. Perhaps internally, there
are other limits which means that the messages will be small.

The problem here isn't what he wants to support. The problem is
that the algorithm says that you append 64 bits of size, and he
is only appending 32. So something will be wrong somewhere.

> NOTES:

> 1. You want to be able to minimize required changes to code
> if you suddendly decide to switch to a different hash
> algorithm, say SHA-256.

You want to get the algorithm working first:-).

> 2. You want the output of the hashing to be an object itself
> so that you can let it construct, assign it, compare it, store
> it in containers, toss it around, etc., let it destruct.

Maybe. In my case, my MD5 calculator is an accumulator. I
suppose that there is some argument for defining a result type,
but in practice, I've almost always read the results using the
<< operator, e.g.:

file << myText
<< std::accumulate( myText.begin(), myText.end(), MD5() ) ;

Internally, the << operator uses a member function:
template< typename OutIter >
void resultsTo( OutIter dest ) const ;
which I made public, more because there wasn't any reason not
too, and I thought it might be useful to someone.

> 3. Loading a sequence of byes into a vector<> just to hash
> might be too slow.

That's why I used the accumulate interface. The first two
parameters to std::accumulate can be any input iterators or
better.

(Because std::accumulate is designed to make things as slow as
possible, I also offer a member function append, which takes two
input iterators -- although the code currently says FwdIter --
and so avoids all of the copying of the object.)

> 4. You want the hash algorithm itself to be fast.

You want it to be correct first. It isn't, so it's not yet time
to worry about speed.

> SUGGESTIONS:

> I would define the algorithm itself as a namespace.

Why?

After a quick glance at SHA-1, I suspect that you could use some
sort of common machine for the entire family, including SHA-1
and MD-5, with a policy class to customize, much as is standard
practice for CRC's. I'd certainly evaluate this route before
implementing my second digest in the family. (As mentionned
above, I've already implemented MD5.)

> Call it SHA_1:

> namespace SHA_1;

> Then you can define other hashing algorithms in their own namespaces:
> namespace MD5;
> namespace Adler32;

> Within your SHA_1 namespace, define a Digest and a Hasher. A
> Digest will contain your 160-bit output. A Digest should also
> be able to yield itself as a std::string (for printing). The
> Hasher should maintain a 64-byte state so that it can be used
> incrementally to hash disjoint data buffers, maintaining
> remnants mod 64 bytes.

> Overload two hashing functions in Hasher - one function for
> incremental hashing, another that does the same thing as the
> first but also takes by reference a Digest to collect the
> final digest and signal to the Hasher that it should reset
> itself to prepare for the next stream to be hashed:

This is IMHO the big question with regards to the interface. In
my current implementation, extracting the value would modify the
internal state, so that the accumulator itself could not be
reused. However, all of the functions which one would normally
use to do this work on a copy, and not the original object, so
the orginal object remains unmodified. This copy isn't free, of
course, but the object was designed to be as cheap to copy as
possible -- all data members are arrays of integral types, so
basically, the copy constructor just does a memcpy. And given
the cost of calculating an MD5 digest anyway, I suspect that it
doesn't matter.

For those interested, the code is actually already on the web,
although the cover pages with the links to it aren't complete
yet. If you don't mind browsing raw code, however, you can look
at it at http://gabisoft.free.fr/code/Util/IO/MD5. It won't be
officially there until I've finished the cover pages (with the
disclaimers, etc.), so this is not an official announcement, and
I'm not guaranteeing anything.

--
James Kanze GABI Software
Conseils en informatique orient?e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S?mard, 78210 St.-Cyr-l'?cole, France, +33 (0)1 30 23 00 34


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: gkoreman on
Is it working?
No, not working. But it is running.

The link at the top looks very useful. Thank you.

The 64bit size issue: An early implementation decision to limit
complexity. This limits the message size, but this was not a bug.


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: gkoreman on
First of all, yes. I am sure this is a big-endian machine.

Also, I am aware that I am severely limiting my message size by only
using 1 word32. This is not a bug, but an early implementation
limitation :P
I didn't want to introduce any errors when making my own word64, so I
just didn't (yet). I wanted to get the main chunk of the SHA-1 algo
working on small hashes before moving up to large ones.

As for the rest of your suggestions, thanks! I will have to look them
over in more detail, and I will probably modify my design.


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: gkoreman on
That fixed it! Thank you! I checked the initialization numbers over and
over and didn't find any typos, but I totally missed that one.

And yes, I have only been testing on zero length messages to start. I
figured that I should get zero length messages working before I try to
go any further.

Thank you very much.


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

First  |  Prev  |  Next  |  Last
Pages: 1 2 3 4 5
Prev: C++ Concepts
Next: Portable way to write binary data