From: Gregory Ewing on
Given some known data/crc pairs, how feasible is it to
figure out the polynomial being used to generate the crc?

In the case I'm looking at, it appears that the crc
size may be at least 24 bits, so just trying all possible
polynomials probably isn't doable.

An article I found hints at the possibility of using
GCDs to make the search more efficient, but doesn't go
into any details. Anyone know of any literature about
this?

If it helps, I have the ability to generate test cases
with known message contents to some extent, although
I don't have complete control over the contents. Also
it's a manual process, so generating large numbers of
them automatically isn't an option.

--
Greg
From: Steven D'Aprano on
On Mon, 08 Mar 2010 16:09:12 +1300, Gregory Ewing wrote:

> Given some known data/crc pairs, how feasible is it to figure out the
> polynomial being used to generate the crc?

Can you just ask the application developer what CRC is being used? Or
look at the source code? Disassemble the binary?


> In the case I'm looking at, it appears that the crc size may be at least
> 24 bits, so just trying all possible polynomials probably isn't doable.

"At least"? Can't you tell by looking at them?

A good place to start is here:

http://en.wikipedia.org/wiki/Cyclic_redundancy_check
http://en.wikipedia.org/wiki/List_of_checksum_algorithms

You can at least eliminate known CRCs. There doesn't appear to be any 24-
bit CRCs in common use, and very few other 24-bit checksums either, so
you're probably looking at a custom CRC.


> An article I found hints at the possibility of using GCDs to make the
> search more efficient, but doesn't go into any details. Anyone know of
> any literature about this?

If you're reduced to trying to determine the polynomial from a sample of
checksums, that's a curve fitting problem. There are various ways to fit
polynomials through a set of known points, but as far as I know none of
them are designed for reverse-engineering CRCs.

You may be able to find something about curve-fitting using discrete
maths (i.e. where all values are limited to integers) or with constraints.

You should probably take this to somewhere like sci.crypt. Here's a
thread from a few years back which might give you some hints:

http://www.derkeiler.com/Newsgroups/sci.crypt/2008-07/msg00035.html

Or here:

http://stackoverflow.com/questions/401231/determining-crc-algorithm-from-
data-crc-embedded-application


--
Steven
From: Steven D'Aprano on
On Mon, 08 Mar 2010 16:09:12 +1300, Gregory Ewing wrote:

> Given some known data/crc pairs, how feasible is it to figure out the
> polynomial being used to generate the crc?

Google is your friend:

http://www.woodmann.com/fravia/crctut1.htm



--
Steven
From: Dave Angel on
Steven D'Aprano wrote:
> On Mon, 08 Mar 2010 16:09:12 +1300, Gregory Ewing wrote:
>
>
>> Given some known data/crc pairs, how feasible is it to figure out the
>> polynomial being used to generate the crc?
>>
>
> Google is your friend:
>
> http://www.woodmann.com/fravia/crctut1.htm
>
>
That page was interesting to read, especially since I've implemented the
three algorithms - CRC16, CRC32, and the reversed version of CRC16, all
in the long-distant past.

However, if there's anything in there about how to derive the polynomial
algorithm from (a few) samples I missed it entirely. Instead, what it
calls reverse engineering is figuring out how to modify a message to
force it to have a desired CRC value (when the CRC polynomial is already
known).

DaveA

From: Gregory Ewing on
Steven D'Aprano wrote:

> Can you just ask the application developer what CRC is being used? Or
> look at the source code? Disassemble the binary?

There's no source, and the binary is enormous. I could ask,
but I wouldn't hold out much hope of them being willing to
tell me.

>>it appears that the crc size may be at least
>>24 bits, so just trying all possible polynomials probably isn't doable.
>
> "At least"? Can't you tell by looking at them?

It's not entirely clear exactly which bytes are part of the
CRC. There are 3 adjacent bytes in the header of the file
that change when I modify the contents, which led me to
think it was a 24-bit CRC. But I now believe that one of
them is not part of the CRC, and it's actually 16 bits.

Using pycrc, I've now tried all possible 16-bit polynomials,
with various combinations of bit and byte reversal, but I
haven't found one that works consistently, so I'm wondering
whether it's using some non-standard algorithm.

--
Greg