Search through a (large) binary file. [CSharp]

From: Michelle on 15 Sep 2009 05:26

Peter,

>[...]
> As far as reading bytes from a specific position in the file, you have to
> set the Position property to the position where you want to read, and
> then you simply read. It's just that simple.

Clear.

> Note that the Decimal numeric type takes up 16 bytes in a file. So if you
> are reading only 4 bytes, then obviously either you are reading the data
> as the wrong format, or not reading enough bytes. Either way, you won't
> get valid results.

Bytes read: 00-C0-22-4F
Decimal value: 00C0224F -> 12591695
It always starts (as far as we know now) with 0x00, so actually I only need 3
bytes.

It's the way it's used in my file.
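
A minimal sketch of the read described above, assuming the four bytes are stored most-significant byte first (which is what the 00-C0-22-4F -> 12591695 example implies). The class and method names are made up for illustration; only Stream.Position and Stream.Read come from the framework.

```csharp
using System;
using System.IO;

static class BigEndianReader
{
    // Hypothetical helper: seeks to 'offset', reads exactly 4 bytes and
    // combines them most-significant byte first (big-endian).
    public static uint ReadUInt32BigEndian(Stream stream, long offset)
    {
        stream.Position = offset;

        byte[] buffer = new byte[4];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop.
        while (read < buffer.Length)
        {
            int n = stream.Read(buffer, read, buffer.Length - read);
            if (n == 0)
                throw new EndOfStreamException("Fewer than 4 bytes available.");
            read += n;
        }

        return ((uint)buffer[0] << 24)
             | ((uint)buffer[1] << 16)
             | ((uint)buffer[2] << 8)
             | (uint)buffer[3];
    }
}
```

With a real file this would be called on a FileStream, e.g. one returned by File.OpenRead. If the leading 0x00 really is always present, the result is the same whether 3 or 4 bytes are combined.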

> Do you get the values you expect when you do that? If so, then your
> number must not be Decimal in the first place.

Yes, as expected.

> Basically, you've provided no information here that would allow anyone to
> know for sure what the format of the numbers in your file are.
[...]

We're reverse engineering a proprietary file.
As mentioned above, I'd need to read 3 or 4 bytes and convert them to a Decimal
value.

In my earlier postings, I assumed that I only wanted to search for one
pattern.
But during the reverse engineering we discovered a second one.
I'm now looking for some 'Rabin-Karp algorithm' C# examples.
I think that's a better solution than running the brute-force search twice.
As you can see, the truth is closer today than yesterday ;-))
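
For comparison, a rough sketch of checking several patterns in a single pass over the data. This is NOT Rabin-Karp (there is no rolling hash); it is a plain scan that tests every pattern at each offset, which already avoids reading the data twice. All names here are hypothetical, and the data is assumed to be in a byte array already.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type: where a match starts and which pattern matched.
struct Hit
{
    public readonly int Offset;
    public readonly int PatternIndex;

    public Hit(int offset, int patternIndex)
    {
        Offset = offset;
        PatternIndex = patternIndex;
    }
}

static class MultiPatternSearch
{
    // Walks the data once and tests each pattern at every offset.
    public static List<Hit> FindAll(byte[] data, byte[][] patterns)
    {
        var hits = new List<Hit>();
        for (int i = 0; i < data.Length; i++)
        {
            for (int p = 0; p < patterns.Length; p++)
            {
                if (MatchesAt(data, i, patterns[p]))
                    hits.Add(new Hit(i, p));
            }
        }
        return hits;
    }

    // True if 'pattern' occurs in 'data' starting exactly at 'offset'.
    static bool MatchesAt(byte[] data, int offset, byte[] pattern)
    {
        if (pattern.Length == 0 || offset + pattern.Length > data.Length)
            return false;
        for (int j = 0; j < pattern.Length; j++)
        {
            if (data[offset + j] != pattern[j])
                return false;
        }
        return true;
    }
}
```

For a large file the data would have to be processed in chunks, with an overlap of (longest pattern length - 1) bytes between chunks so that matches spanning a chunk boundary are not missed.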

Best regards,

Michelle

From: Michelle on 15 Sep 2009 10:37

UPDATE

[ . . . ]
> But during the reverse engineering we discovered a second one.
> I'm now looking for some 'Rabin-Karp algorithm' C# examples.
> I think that's a better solution than running the brute-force search twice.

The challenge is even greater. The record header contains two variables.
So the search must take place at two locations with wildcards.

0x07 0x00 0x?? 0x00 0x00 0x00 0x07 0x00 0x?? 0x00 0x00 0x00 0x08 0x00

(We know the possible byte values)

So my only solution is to use Regex ?

Regards,

Michelle

From: Tom Spink on 15 Sep 2009 14:10

Michelle wrote:

> UPDATE
>
> [ . . . ]
>> But during the reverse engineering we discovered a second one.
>> I'm now looking for some 'Rabin-Karp algorithm' C# examples.
>> I think that's a better solution than running the brute-force search twice.
>
> The challenge is even greater. The record header contains two variables.
> So the search must take place at two locations with wildcards.
>
> 0x07 0x00 0x?? 0x00 0x00 0x00 0x07 0x00 0x?? 0x00 0x00 0x00 0x08 0x00
>
> (We know the possible byte values)
>
> So my only solution is to use Regex ?
>
> Regards,
>
> Michelle
>

This sounds like a highly structured file - *surely* there is some sort
of descriptor at the start of it that contains a pointer to these
records.

Presumably there is some software out there to read and write these
files - I doubt very much they do any binary searching. There must be
some kind of allocation table, or header structure that defines the rest
of the file, even a pointer to the first record, which contains a
pointer to the next (in a linked-list style).

When programming becomes this complex, it's usually best to step back
and take another look at the problem.

--
Tom

From: Peter Duniho on 15 Sep 2009 14:10

On Tue, 15 Sep 2009 07:37:15 -0700, Michelle <michelle(a)notvalid.nomail>
wrote:

> UPDATE
>
> [ . . . ]
>> But during the reverse engineering we discovered a second one.
>> I'm now looking for some 'Rabin-Karp algorithm' C# examples.
>> I think that's a better solution than running the brute-force search twice.
>
> The challenge is even greater. The record header contains two variables.
> So the search must take place at two locations with wildcards.
>
> 0x07 0x00 0x?? 0x00 0x00 0x00 0x07 0x00 0x?? 0x00 0x00 0x00 0x08 0x00
>
> (We know the possible byte values)
>
> So my only solution is to use Regex ?

No, not necessarily. You could search for the sub-components
individually. Look for one, then look for the other in the specific place
it should be if you find the first. Though, if Regex makes the code
simpler, it might well be worth it anyway, even if it doesn't perform as
well.
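
One non-Regex way to handle the wildcards, sketched under the assumption that the data is available as a byte array: store the header as a byte?[] where null means "any byte", and compare position by position. This is a masked brute-force compare rather than the two-step sub-component search described above, and the class and member names are invented for the example.

```csharp
using System;

static class WildcardSearch
{
    // The 14-byte header pattern from the post; null marks a 0x?? position.
    public static readonly byte?[] HeaderPattern =
    {
        0x07, 0x00, null, 0x00, 0x00, 0x00,
        0x07, 0x00, null, 0x00, 0x00, 0x00,
        0x08, 0x00
    };

    // Returns the first offset >= start where the pattern matches, or -1.
    // The pattern is assumed to be non-empty.
    public static int IndexOf(byte[] data, byte?[] pattern, int start)
    {
        if (start < 0)
            start = 0;

        for (int i = start; i + pattern.Length <= data.Length; i++)
        {
            bool match = true;
            for (int j = 0; j < pattern.Length; j++)
            {
                byte? expected = pattern[j];
                // A null entry is a wildcard and accepts any byte.
                if (expected.HasValue && data[i + j] != expected.Value)
                {
                    match = false;
                    break;
                }
            }
            if (match)
                return i;
        }
        return -1;
    }
}
```

Since the possible values of the two variable bytes are known, each hit could additionally be checked against that list to cut down on false positives.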

That said, you have a broader problem in that the more variability in the
data that's allowed, the greater the chance that you'll find the pattern
you're looking for, but not in the context you intend (i.e. a false
positive search result). You have that chance even with a regular search
pattern, but as the pattern gets shorter with more variation allowed, the
odds increase.

And I note that the above string of bytes is quite a bit different from,
and quite a bit simpler than, the search pattern you showed in earlier
posts. I would guess there's a much higher chance of seeing that pattern
in the wrong context than the other.

Frankly, the more you explain about the basic problem, the less I feel
that a simple search-and-replace is really the right way to go about
things. Files have structures; I can guarantee you whatever this kind of
file is, the intended user code doesn't need to search for things. It
simply parses the data and knows the precise location of particular kinds
of data within the file.

IMHO, your efforts would be better spent trying to reverse engineer the
file to the point where you can accomplish the same, rather than investing
effort on speeding up string searches on the data. Even better, just get
the documentation for the file format and code from that, rather than all
this investment in reverse-engineering.

I obviously don't have all the details with respect to the "why"s,
"what"s, etc. related to your problem. But it seems like you've taken a
time-consuming, difficult path that is practically guaranteed to be the
one that provides the least reliable results. I know that wouldn't be
_my_ first choice approaching a problem like this. :)

Pete

From: Tom Spink on 15 Sep 2009 15:18

Peter Duniho wrote:
> I know that wouldn't be
> _my_ first choice approaching a problem like this. :)

My first choice is the Microsoft "documentation" for the PE format. ;)

> Pete

--
Tom
