From: kanze on
Le Chaud Lapin wrote:
> kanze wrote:
> > And how does the endian-ness make a difference? I
> > implemented MD5 on a Sparc (big-endian) -- I've since
> > compiled the code on a Linux based PC (little-endian), and
> > it works perfectly well. Is there something fundamentally
> > different between MD5 and SHA-1 which means that SHA-1 is
> > endian dependent? They look very similar in principles to
> > me.

> It definitely makes a difference, but first, let me rant:

You're wasting your time. The arguments are ancient -- when I
started programming, the most widespread machines were the
PDP-11 (little endian) and the IBM 360 (big endian). When 16
bit microprocessors started appearing, the leaders were Intel
(little endian) and Motorola (big endian).

Today, the only consensus that I know is for transmission
protocols, and it is contradictory:

-- Bit order in a byte is *always* little endian. I have never
heard of an exception.

-- Byte order in a larger integral value is normally big
endian, at least for fixed length integers. (BER uses
little endian for variable length integers -- 7 bits per
byte, with the top bit 0 if there are still bytes to come,
and 1 if this is the last byte.)

(Like you, I prefer little endian. But my preferences aren't
going to change anything, and there are a lot more important
issues to be concerned with.)

[...]
> Getting off my rant box, if the input data is 0x01234567,
> that's 4 bytes.

It depends. Is it four bytes (characters or whatever), or is it
a 32 bit value? If it's four bytes, it's four separate bytes
(and should probably be written 01 23 45 67). If it's a 32 bit
value, it's a 32 bit value.

> In a buffer, beginning at offset 0, either the 0x01 will be
> there or the 0x67 will be there.

In a buffer, it's in the format I decide that it's in. If it's
off the line, it's in the format of the protocol, automatically.
If it's in memory, to be written, it's in the format I wrote it
in.

> When the programmer goes to form a word from this buffer to be
> used in the plethora of Boolean operations of the hash
> algorithm, s/he will have to choose how to extract the bytes
> from this stream to form the word.

If it is a character buffer, and I need words for the SHA or MD5
algorithm, I will, naturally, extract the necessary number of
bytes from the buffer, and construct a word from them according
to the specifications of the protocol. Something like (from my
implementation of MD5) :

GB_uint32_t
getWord( GB_uint8_t const* source )
{
    //  Assemble a 32 bit word from four successive bytes, least
    //  significant byte first, as the MD5 specification requires.
    return source[ 0 ]
         | (source[ 1 ] << 8)
         | (source[ 2 ] << 16)
         | (source[ 3 ] << 24) ;
}

I have to do it this way anyway, of course. There's no
guarantee that my client's character buffer is correctly aligned
for me to view it as a word.
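
In context, it gets used something like this (a rough sketch --
decodeBlock is an invented name, not from my actual code):

void
decodeBlock(
    GB_uint8_t const* block,          //  64 bytes of raw input
    GB_uint32_t       words[ 16 ] )   //  the decoded words
{
    //  Each word is assembled explicitly from four bytes, so
    //  neither the host byte order nor the alignment of block
    //  matters.
    for ( int i = 0 ; i != 16 ; ++ i ) {
        words[ i ] = getWord( block + 4 * i ) ;
    }
}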

> Note that implicit in all of this is that when the stream of
> bytes moves from one machine to another, the bit order is
> preserved for individual bytes, and the byte order is
> preserved for a stream. So if 0x01 is the byte at offset 0
> (big-endian), it will be the byte at offset 0 when it is sent
> as a sequence of bytes. But if a little-endian receiving
> machine gets these 4 bytes, and puts them in increasing
> memory locations starting at 0, and then extracts the 4 bytes
> as a word, the word will become 0x67452301.

If any machine accesses a stream of bytes as if it were in fact
a stream of words, it's looking for trouble. Data has a format.
I never suppose that the format from an external source just
happens to coincide with the internal format of the machine.
I've encountered at least three different byte orders for 32 bit
values, two of them with different compilers on the same
machine. And on the machines I use today (Sun Sparc), trying to
access a word at a misaligned address will cause a core dump.

> So there are two subtleties here. The first is that bit-order
> is guaranteed no matter what from machine to machine.

There is no such thing as bit order within the machine, unless
your machine supports some form of bit addressing. (The PDP-11,
and I think the VAX, did.) There is a consensus that bits will
be sent little endian on the transmission line. The hardware
takes care of assembling them into bytes.

There is also a very real sense that there is no such thing as
byte order in a machine. You can't see it without doing all
sorts of dirty casts. (I know that there are cases where you
cannot avoid such things -- I've implemented a standard C
library, and I did do some dirty casting in functions like modf,
in order to piddle with the bits of the exponent and the
mantissa directly.)

If you want the low order byte of a 32 bit word, you know how to
get it: word & 0xFF (or just assigning to a uint8_t). If you
want the high order byte, it's word >> 24. (Note that
this works even on a machine with 9 bit chars. Providing, of
course, that the hardware sends the 8 low order bits when
serializing, which is what I would expect.)
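
Written out, serialization in the other direction looks
something like this (a rough sketch, little endian to match
getWord above; putWord is an invented name):

void
putWord(
    GB_uint32_t word,
    GB_uint8_t* dest )
{
    //  Write the value least significant byte first.  Only
    //  shifts and masks on the value are involved, so the host
    //  byte order never enters into it.
    dest[ 0 ] = GB_uint8_t( word         & 0xFF ) ;
    dest[ 1 ] = GB_uint8_t( (word >>  8) & 0xFF ) ;
    dest[ 2 ] = GB_uint8_t( (word >> 16) & 0xFF ) ;
    dest[ 3 ] = GB_uint8_t( (word >> 24) & 0xFF ) ;
}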

> We take that for granted, because it is. The second is that
> no one ever said that we were allowed to treat the input
> stream as a sequence of words. The most that we are allowed
> is bytes. So during both extraction and implantation of
> words, we have to be careful about byte-sequences.

We have to deal with byte sequences, and we have to convert them
to and from words. Both have a specific format, and can be
accessed within C++ without worrying about that format. There's
never any reason here to write platform dependent code.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34



From: kanze on
gkoreman(a)gmail.com wrote:
> Is it working?
> No, not working. But it is running.

I understood what you meant. That was just a little attempt at
humor. Still, it's important to keep in mind that the goal
isn't just to get the code through the compiler, or that it runs
without core dumping. The goal is a correct answer for all
legal input, and an error indication for all illegal input. (In
the case of SHA, I think all input can be considered legal.)

> The link at the top looks very useful. Thank you.

> The 64bit size issue: An early implementation decision to
> limit complexity. This limits the message size, but this was
> not a bug.

That wasn't the point. The protocol says that the size is sent
on 64 bits. That means that you *must* put 64 bits of length in
the buffer. The protocol is big endian, so your last 32 bits of
padding are being interpreted as the upper part of the length.
If they are all zero (which will usually be the case), the error
won't be apparent. If the length of the user message is such
that the 0x80 trailer falls in those bits, the digest will be
wrong.

It's perfectly acceptable to restrict the input, and say that
you cannot handle more than 2^32 bytes, or even 2^27 (so that
the multiplication by 32 cannot overflow). But in that case, you
still must insert 64 bits into the stream, even if the top 32
are guaranteed zero.

Generally, it's fairly trivial to accept 2^32 bytes. Just assign
the length to a uint64_t (part of the C standard, not of C++, but
usually available anyway -- #include <stdint.h>) before doing
anything with it. And insert 'length >> 32' (high word) and
'length & 0xFFFFFFFF' (low word) into your buffer. I personally
wouldn't bother with more, although in my implementation of MD5,
doing so would be trivial, since I stream the input, processing
one block at a time, rather than requiring the user to pass me
the entire message in one go.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34



From: Greg Herlihy on
kanze wrote:
> Le Chaud Lapin wrote:
> > kanze wrote:
> > > And how does the endian-ness make a difference? I
> > > implemented MD5 on a Sparc (big-endian) -- I've since
> > > compiled the code on a Linux based PC (little-endian), and
> > > it works perfectly well. Is there something fundamentally
> > > different between MD5 and SHA-1 which means that SHA-1 is
> > > endian dependent? They look very similar in principles to
> > > me.
>
> > It definitely makes a difference, but first, let me rant:
>
> You're wasting your time. The arguments are ancient -- when I
> started programming, the most widespread machines were the
> PDP-11 (little endian) and the IBM 360 (big endian). When 16
> bit microprocessors started appearing, the leaders were Intel
> (little endian) and Motorola (big endian).
>
> Today, the only consensus that I know is for transmission
> protocols, and it is contradictory:
>
> -- Bit order in a byte is *always* little endian. I have never
> heard of an exception.

If big endian has the most significant byte of a multiple byte value
ordered first, wouldn't a big endian byte have its most significant bit
ordered first? If so, wouldn't bit order in a byte always be big
endian, and never little endian?

> [...]
> > Getting off my rant box, if the input data is 0x01234567,
> > that's 4 bytes.
>
> It depends. Is it four bytes (characters or whatever), or is it
> a 32 bit value? If it's four bytes, it's four separate bytes
> (and should probably be written 01 23 45 67). If it's a 32 bit
> value, it's a 32 bit value.

MD5 treats the input as 32-bit values in 4-byte little-endian format.
SHA1 treats the input as 32-bit values in 4-byte big-endian format (the
actual order of the bytes from the point of view of its author is not
relevant for the purpose of creating a digest for them). If a machine
calculating a digest does not interpret the four input bytes with the
endianness that the algorithm requires, then it must rearrange those
bytes in order that it does. Otherwise, the subsequent additions and
rotations applied to the input value on a big endian machine would
produce a completely different result from the one calculated on a
little-endian machine - because the starting value used in those
calculations was not the same.
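
To make the difference concrete, here is a rough sketch of the
two conventions side by side (illustrative names, not taken from
any particular implementation; each form gives the same result
whatever the host's byte order):

#include <stdint.h>

//  MD5: the first byte of the stream is the least significant
//  byte of the 32-bit word.
uint32_t
wordLittleEndian( uint8_t const* p )
{
    return  uint32_t( p[ 0 ] )
         | (uint32_t( p[ 1 ] ) <<  8)
         | (uint32_t( p[ 2 ] ) << 16)
         | (uint32_t( p[ 3 ] ) << 24) ;
}

//  SHA-1: the first byte of the stream is the most significant
//  byte of the 32-bit word.
uint32_t
wordBigEndian( uint8_t const* p )
{
    return (uint32_t( p[ 0 ] ) << 24)
         | (uint32_t( p[ 1 ] ) << 16)
         | (uint32_t( p[ 2 ] ) <<  8)
         |  uint32_t( p[ 3 ] ) ;
}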

> > In a buffer, beginning at offset 0, either the 0x01 will be
> > there or the 0x67 will be there.
>
> In a buffer, it's in the format I decide that it's in. If it's
> off the line, it's in the format of the protocol, automatically.
> If it's in memory, to be written, it's in the format I wrote it
> in.
>
> > When the programmer goes to form a word from this buffer to be
> > used in the plethora of Boolean operations of the hash
> > algorithm, s/he will have to choose how to extract the bytes
> > from this stream to form the word.
>
> If it is a character buffer, and I need words for the SHA or MD5
> algorithm, I will, naturally, extract the necessary number of
> bytes from the buffer, and construct a word from them according
> to the specifications of the protocol. Something like (from my
> implementation of MD5) :
>
> GB_uint32_t
> getWord( GB_uint8_t const* source )
> {
>     return source[ 0 ]
>          | (source[ 1 ] << 8)
>          | (source[ 2 ] << 16)
>          | (source[ 3 ] << 24) ;
> }

The above routine is in fact the byte-swapping routine necessary for
big-endian machines calculating an MD5 digest to interpret the four
bytes of input as a 32-bit little-endian word. Since this function is
"endian-neutral" it can also be used on little-endian machines without
ill effect, but to do so is inefficient. A little-endian machine
calculating an MD5 digest would not waste cycles on these operations -
it would simply load the entire four bytes into a register at once.
Considering that big-endian machines can spend as much as 1/3 of their
time computing an MD5 digest in swapping bytes, the effect of this
optimization for little-endian machines is likely to be significant. By
the same token, the performance advantage that a big-endian machine
would have over a little-endian machine when calculating SHA-1 (due to
its big-endian biased input), would likely be equally significant.

Greg



From: kanze on
Greg Herlihy wrote:
> kanze wrote:
> > Le Chaud Lapin wrote:
> > > kanze wrote:
> > > > And how does the endian-ness make a difference? I
> > > > implemented MD5 on a Sparc (big-endian) -- I've since
> > > > compiled the code on a Linux based PC (little-endian),
> > > > and it works perfectly well. Is there something
> > > > fundamentally different between MD5 and SHA-1 which
> > > > means that SHA-1 is endian dependent? They look very
> > > > similar in principles to me.

> > > It definitely makes a difference, but first, let me rant:

> > You're wasting your time. The arguments are ancient -- when
> > I started programming, the most widespread machines were the
> > PDP-11 (little endian) and the IBM 360 (big endian). When
> > 16 bit microprocessors started appearing, the leaders were
> > Intel (little endian) and Motorola (big endian).

> > Today, the only consensus that I know is for transmission
> > protocols, and it is contradictory:

> > -- Bit order in a byte is *always* little endian. I have
> > never heard of an exception.

> If big endian has the most significant byte of a multiple byte
> value ordered first, wouldn't a big endian byte have its most
> significant bit ordered first?

That would be logical.

> If so, wouldn't bit order in a byte always be big endian, and
> never little endian?

No.

Within a machine, bytes are accessed all eight bits in parallel;
there isn't any endianness. (The reason endianness doesn't
make a difference in my own code is that I always access
words as words, i.e. all bits in parallel, where there isn't any
endianness.)

On a serial medium, like a transmission line or a track on a
disk, of course, it's one bit after the other, and you *can*
send them in any order you want. In practice, there is a
consensus with regards to transmission line protocols, which is
to send the bits in order, with the LSB first, and with any
parity trailing, if byte parity is being used. (I once ran into
hardware, some 25 years ago, which inserted a parity bit in the
middle of the byte, between the bits 3 and 4. And PCM requires
block parity, and different protocols insert the bit at various
places within the block, although none that I know of put it in
the middle of a byte, at least not when the entire byte is in a
single block -- the D channel of the lowest level ISDN connection
is sent 2 bits per block, so obviously, there are block parity
bits, as well as bytes for other channels, in the middle of it.)

In practice, this is almost always fully handled by the
hardware. You don't see the bit order; the link between the
serial controller and your CPU is parallel.

> > [...]
> > > Getting off my rant box, if the input data is 0x01234567,
> > > that's 4 bytes.

> > It depends. Is it four bytes (characters or whatever), or
> > is it a 32 bit value? If it's four bytes, it's four
> > separate bytes (and should probably be written 01 23 45 67).
> > If it's a 32 bit value, it's a 32 bit value.

> MD5 treats the input as 32-bit values in 4-byte little-endian
> format.

Formally, MD5 treats the input as a bit stream. That bit
stream is then processed in 512 bit blocks; within each block,
the bits are processed as sixteen 32 bit words. In practice, of
course, the input will be a byte stream, with four bytes per 32
bit word.

The only place where bit order in the bytes would be relevant is
if you had to assemble a byte from several different blocks of
bits. This might occur, for example, if your interface allowed
the user to specify an arbitrary number of bits, and you used
only the lower n bits of the last byte. I don't support this,
and I don't know, off hand, what the RFC says about it; my
impression is that it should be big endian (based on the fact
that to insert a single 1 bit, followed by zeros, I use 0x80).
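
For the usual whole-byte case, the padding step amounts to
something like this rough sketch (not lifted from my
implementation; the container is just for illustration):

#include <stdint.h>
#include <vector>

//  Append the MD5/SHA-1 padding to a whole-byte message: one
//  0x80 byte (a single 1 bit followed by seven 0 bits), then as
//  many 0x00 bytes as needed so that the 8 bytes of length will
//  exactly complete the final 64 byte block.
void
appendPadding( std::vector< uint8_t >& message )
{
    message.push_back( 0x80 ) ;
    while ( message.size() % 64 != 56 ) {
        message.push_back( 0x00 ) ;
    }
}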

The only place byte order is relevant is when you assemble bytes
into words. Here, the reference implementation (which is part
of the RFC) does use little endian format. Which, to be frank,
seems a little strange in context -- the obvious intent in
processing the bits in the byte is to consider that the high
order bit is inserted into the bit stream first. But it doesn't
really matter; in all cases I know of, the input stream has in
fact been a stream of bytes, and the choice between little
endian and big endian is purely arbitrary here.

> SHA1 treats the input as 32-bit values in 4-byte big-endian
> format (the actual order of the bytes from the point of view
> of its author is not relevant for the purpose of creating a
> digest for them).

It's important that both sides agree.

Again, SHA1 is formally defined over a stream of bits. For the
rest, my comments concerning MD5 above apply, with the exception
that bytes are streamed into 32 bit words in big endian order:
the first byte to arrive is the high order byte of the 32 bit
word, whereas in MD5, the first byte to arrive is the low order
byte.

> If a machine calculating a digest does not interpret the four
> input bytes with the endianness that the algorithm requires,
> then it must rearrange those bytes in order that it does.
> Otherwise, the subsequent additions and rotations applied to
> the input value on a big endian machine would produce a
> completely different result from the one calculated on a
> little-endian machine - because the starting value used in
> those calculations was not the same.

Formally, as I said, the algorithm processes a stream of bits,
which are broken up into blocks, and within each block, into 32
bit words. By convention, when given a byte, the algorithm
inserts the high order bit first, with the rest of the bits in
order. If I read SHA1 correctly, when it breaks the stream up
into words, it does so by simply taking 32 successive bits out
of the stream, with the first bit as the high order bit. MD5
inserts the first bit in bit 7 of the word, the second in bit 6,
then goes on to put the ninth in bit 15... In a practical
implementation, of course, both will receive a stream of bytes,
with MD5 using the first byte as the low order byte of its
words, and SHA1 using it as the high order byte.

> > > In a buffer, beginning at offset 0, either the 0x01 will be
> > > there or the 0x67 will be there.

> > In a buffer, it's in the format I decide that it's in. If
> > it's off the line, it's in the format of the protocol,
> > automatically. If it's in memory, to be written, it's in the
> > format I wrote it in.

> > > When the programmer goes to form a word from this buffer
> > > to be used in the plethora of Boolean operations of the
> > > hash algorithm, s/he will have to choose how to extract
> > > the bytes from this stream to form the word.

> > If it is a character buffer, and I need words for the SHA or
> > MD5 algorithm, I will, naturally, extract the necessary
> > number of bytes from the buffer, and construct a word from
> > them according to the specifications of the protocol.
> > Something like (from my implementation of MD5) :

> > GB_uint32_t
> > getWord( GB_uint8_t const* source )
> > {
> >     return source[ 0 ]
> >          | (source[ 1 ] << 8)
> >          | (source[ 2 ] << 16)
> >          | (source[ 3 ] << 24) ;
> > }

> The above routine is in fact the byte-swapping routine
> necessary for big-endian machines calculating an MD5 digest to
> interpret the four bytes of input as a 32-bit little-endian
> word.

The above routine works on the value of a 32 bit word. It
assembles the bytes from the stream into a 32 bit value, which
it writes as a single word, in parallel. There is NO byte order
in the word, because the word is parallel. It works with big
endian byte addressed machines. It works with little endian
byte addressed machines. It works with mixed endian (3412 --
really seen) byte addressed machines. And it works with word
addressed machines. Because it ignores byte order, and works
directly on the 32 bit value.

The fact that the results may look the same as byte swapping on
certain machines is a coincidence. The code doesn't depend on
it in any way.

> Since this function is "endian-neutral" it can also be used on
> little-endian machines without ill effect, but to do so is
> inefficient.

Is it? Depending on the hardware, it may be more efficient than
writing four bytes. At any rate, any differences in efficiency
will hardly be measurable compared to the time actually needed
to calculate the SHA1 or the MD5 digest.

> A little-endian machine calculating an MD5 digest would not
> waste cycles on these operations - it would simply load the
> entire four bytes into a register at once.

And get a segment violation, because they might be misaligned.
Not in any code I'd write. When processing a byte stream into
32 bit words, you have to do something like the above at some
point anyway. You cannot simply convert a char* into a
uint32_t*, and expect to be able to dereference it. (Doing so
actually works on an Intel architecture. But at a considerable
run-time cost.)
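
If you really want the hardware to assemble the bytes, the
alignment problem can be avoided by copying into a properly
aligned object first -- something like this rough sketch (not
from my code); note that the result is then in host byte order,
so it may still need a byte order correction afterwards:

#include <cstring>
#include <stdint.h>

uint32_t
loadWordHostOrder( uint8_t const* source )
{
    //  memcpy into an aligned uint32_t avoids the misaligned
    //  dereference; the bytes land in whatever order the host
    //  uses, which is only the MD5 order on a little endian
    //  machine.
    uint32_t word ;
    std::memcpy( &word, source, sizeof( word ) ) ;
    return word ;
}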

> Considering that big-endian machines can spend as much as 1/3 of
> their time computing an MD5 digest in swapping bytes, the
> effect of this optimization for little-endian machines is
> likely to be significant.

Have you actually profiled this? My MD5 implementation was
originally developed on a Sparc, a big endian machine. The
application was time critical, and was extensively prototyped.
And the time spent in the getWord routine, above, was never
measurable.

> By the same token, the performance advantage that a big-endian
> machine would have over a little-endian machine when
> calculating SHA-1 (due to its big-endian biased input), would
> likely be equally significant.

Probably. The time spent in the getWord function, above, is
certainly insignificant for MD5 on at least one big-endian
machine (a Sun Sparc -- I'm no longer sure which exact model).
I expect that it would also be insignificant for SHA1 on a
little-endian machine.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


