Search through a (large) binary file. [CSharp]


From: Peter Duniho on 16 Sep 2009 13:31

On Wed, 16 Sep 2009 08:10:17 -0700, Michelle <michelle(a)notvalid.nomail>
wrote:

> Peter,
>
> [. . .]
> Int64 Offset1 = (ibBaseOffset + ibOffset); offset = 41152
> Int64 Offset2 = stream.Position; offset = 49152
>
> Why is Offset1 not equal to Offset2 ?

Because the stream's Position is how far you've read into the file, not
necessarily where you're processing in the file.

> Offset1 is the right offset. Changing the block size has an effect ( byte[]
> rgbBlockCur = new byte[4096]; )
>
> I tried several options with stream.Seek(Offset, SeekOrigin) to set the
> current position and restore the previously saved position.
> But when I read the previous 4 bytes and restore the position to the
> previously saved one, the returned offset is not right anymore. The search
> continues, but it's not the right offset anymore.

It sounds to me as though you are somehow also resetting your processing
position. Without you showing how you modified my original code, I can't
offer any specific advice. But you _ought_ to be only modifying the
stream's Position; the ibBaseOffset and ibOffset variables should keep
their old values, and of course you also need to retain the byte[] buffers
used in the searching. Obviously, the easiest way to manage this is to
not return from the "FindByteString()" method at all when you find a
match; just do the necessary processing with the file then continue.
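
As a rough illustration of "process each match and keep going", here is a minimal sketch. It is not the FindByteString() code from earlier in the thread: the class name, the method name ForEachMatch, the onMatch callback, and the 4096-byte block size are all assumptions made for the example, and it assumes a seekable stream.

```csharp
using System;
using System.IO;

static class MatchCallbackSketch
{
    // Hypothetical helper: scans the stream once, from its current Position to
    // the end, and calls onMatch with the absolute offset of every occurrence
    // of pattern instead of returning after the first one.
    // If onMatch moves stream.Position, it must restore it before returning.
    public static void ForEachMatch(Stream stream, byte[] pattern, Action<long> onMatch)
    {
        if (pattern.Length == 0)
            throw new ArgumentException("Pattern must not be empty.", "pattern");

        byte[] buffer = new byte[4096 + pattern.Length - 1];
        long bufferBase = stream.Position; // stream offset of buffer[0]
        int filled = 0;                    // number of valid bytes in buffer

        while (true)
        {
            int read = stream.Read(buffer, filled, buffer.Length - filled);
            if (read == 0) return;         // end of stream
            filled += read;

            // Test every position where the whole pattern fits in the buffer.
            for (int i = 0; i + pattern.Length <= filled; i++)
            {
                if (RangesEqual(buffer, i, pattern)) onMatch(bufferBase + i);
            }

            // Carry over the untested tail so a match that straddles two
            // reads is still found, exactly once.
            int keep = Math.Min(pattern.Length - 1, filled);
            Array.Copy(buffer, filled - keep, buffer, 0, keep);
            bufferBase += filled - keep;
            filled = keep;
        }
    }

    private static bool RangesEqual(byte[] data, int start, byte[] pattern)
    {
        for (int i = 0; i < pattern.Length; i++)
        {
            if (data[start + i] != pattern[i]) return false;
        }
        return true;
    }
}
```

The stream is opened once by the caller and read once from start to end; the search state never has to be saved or restored because the method does not return between matches.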

If for some reason that's not possible, and you don't want to literally
save and restore the necessary values and buffers, an alternative would be
to modify the "FindByteString()" method so that it takes an argument
specifying where to start searching. Then the caller, having done
whatever appropriate processing with the search result, can call the
method again, specifying the start offset as the previously returned
offset plus one. In the method, you'd simply set the stream Position to
this offset immediately after opening the stream.
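
For illustration only, the start-offset variant might look roughly like this. FindFrom is a hypothetical stand-in for FindByteString(), not the code posted earlier in the thread; the class name, the -1 "not found" return value, and the 4096-byte block size are assumptions of the sketch.

```csharp
using System;
using System.IO;

static class StartOffsetSearchSketch
{
    // Hypothetical stand-in for FindByteString(): returns the offset of the
    // first occurrence of pattern at or after startOffset, or -1 if none.
    public static long FindFrom(string path, byte[] pattern, long startOffset)
    {
        if (pattern.Length == 0)
            throw new ArgumentException("Pattern must not be empty.", "pattern");

        using (FileStream stream = File.OpenRead(path))
        {
            // Start where the caller asked, immediately after opening.
            stream.Position = startOffset;

            byte[] buffer = new byte[4096 + pattern.Length - 1];
            long bufferBase = startOffset; // file offset of buffer[0]
            int filled = 0;                // number of valid bytes in buffer

            while (true)
            {
                int read = stream.Read(buffer, filled, buffer.Length - filled);
                if (read == 0) return -1;  // end of file, no match
                filled += read;

                for (int i = 0; i + pattern.Length <= filled; i++)
                {
                    if (RangesEqual(buffer, i, pattern)) return bufferBase + i;
                }

                // Keep the untested tail so a match spanning two reads is found.
                int keep = Math.Min(pattern.Length - 1, filled);
                Array.Copy(buffer, filled - keep, buffer, 0, keep);
                bufferBase += filled - keep;
                filled = keep;
            }
        }
    }

    private static bool RangesEqual(byte[] data, int start, byte[] pattern)
    {
        for (int i = 0; i < pattern.Length; i++)
        {
            if (data[start + i] != pattern[i]) return false;
        }
        return true;
    }
}
```

The caller would loop, passing the previous result plus one as the next start offset, until the method reports no further match.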

Note that this latter approach will be less efficient, because the code
will be closing and reopening the stream, as well as re-reading bytes it'd
already read the previous time. I'd recommend simply not returning from
the method until you reach the end of the stream, rather than for each
time you find the string being searched for.

Pete

From: Michelle on 17 Sep 2009 01:41

Peter,

[. . . ]
> Without you showing how you modified my original code, I can't offer any
> specific advice.
[. . . ]

while (true)
{
    if (ibOffset + rgbPattern.Length <= cbBlockCur)
    {
        if (FRangesEqual(rgbBlockCur, ibOffset,
                         rgbPattern.Length, rgbPattern))
        {
            Int64 currOffset = (ibBaseOffset + ibOffset);
            // correct offset match
            byte[] prevBytes = new Byte[4];

            stream.Seek(currOffset - 4, SeekOrigin.Begin);
            // is 'stream.Seek(-4, SeekOrigin.Current);' possible?
            stream.Read(prevBytes, 0, 4);
            // or 'stream.Read(prevBytes, 0, prevBytes.Length);'
            // do something using the byte array prevBytes
            stream.Seek(currOffset, SeekOrigin.Begin);
            // or maybe do nothing, because after reading 4 bytes the
            // position is back again

Michelle

From: Peter Duniho on 17 Sep 2009 01:56

On Wed, 16 Sep 2009 22:41:53 -0700, Michelle <michelle(a)notvalid.nomail>
wrote:

> [...]
> Int64 currOffset = (ibBaseOffset + ibOffset);
> //correct offset match
> byte[] prevBytes = new Byte[4];
>
> stream.Seek(currOffset-4, SeekOrigin.Begin);
> //is 'stream.Seek(-4, SeekOrigin.Current);' possible ?)
> stream.Read(prevBytes, 0, 4);
> //or 'stream.Read(prevBytes, 0, prevBytes.Length);'
> // do something using the byte array prevBytes
> stream.Seek(currOffset, SeekOrigin.Begin);
> // or maybe do nothing, because after reading 4 bytes is back again

You need to restore the _actual_ stream Position, not the calculated
search offset. The latter is simply what bytes you're inspecting at the
moment, while the former is the next byte the code will need to read, if
and when it gets around to reading more data.

Your code should look more like this:

Int64 currOffset = stream.Position;
byte[] prevBytes = new Byte[4];

stream.Position = ibBaseOffset + ibOffset - 4;
stream.Read(prevBytes, 0, 4);
stream.Position = currOffset;
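
The same save/seek/read/restore pattern could be wrapped in a small helper so the read position is restored even if the read fails. This is only a sketch: the class and the method name ReadBytesBefore are hypothetical, and it assumes a seekable stream.

```csharp
using System;
using System.IO;

static class PeekBackSketch
{
    // Hypothetical helper: returns the count bytes that immediately precede
    // matchOffset, leaving stream.Position exactly where it was on entry.
    public static byte[] ReadBytesBefore(Stream stream, long matchOffset, int count)
    {
        if (matchOffset < count)
            throw new ArgumentOutOfRangeException("matchOffset");

        long savedPosition = stream.Position; // the actual read position
        try
        {
            stream.Position = matchOffset - count;

            byte[] result = new byte[count];
            int total = 0;
            while (total < count)
            {
                // Read may return fewer bytes than requested, so loop.
                int n = stream.Read(result, total, count - total);
                if (n == 0) throw new EndOfStreamException();
                total += n;
            }
            return result;
        }
        finally
        {
            // Restore the read position, not the calculated search offset.
            stream.Position = savedPosition;
        }
    }
}
```

In the search loop this would be called as ReadBytesBefore(stream, ibBaseOffset + ibOffset, 4).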

Pete

From: Michelle on 17 Sep 2009 02:30

Peter,

It works properly. I understand it completely now :-))
Indeed, it's the difference between how far I've read into the file and
where I'm processing in the file.
For me, an example says more than just a description.

It's perhaps a stupid question, but 'stream.Read' needs to read the bytes
from the file again (disk access).
Is it possible to read directly from the byte array, since that's in memory?
It may not be a big difference, but I'm curious.

Michelle

From: Peter Duniho on 17 Sep 2009 03:01

On Wed, 16 Sep 2009 23:30:10 -0700, Michelle <michelle(a)notvalid.nomail>
wrote:

> [...]
> It's perhaps a stupid question, but 'stream.Read' needs to read the bytes
> from the file again (disk access).
> Is it possible to read directly from the byte array, since that's in
> memory?
> It may not be a big difference, but I'm curious.

Sure, it's possible. At its simplest, you can just check to make sure
that "ibOffset" is >= 4, and if it's not, have some fallback code that goes
ahead and reads from the file as you're doing now (since the byte[]
buffers for the scanning won't contain the 4 bytes you need).
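
A sketch of that check-then-fall-back idea, under some assumptions: the class, the method name GetPrecedingBytes, and its parameter names are hypothetical (block, blockBaseOffset and offsetInBlock play the roles of rgbBlockCur, ibBaseOffset and ibOffset), and the stream is assumed to be seekable.

```csharp
using System;
using System.IO;

static class BufferFirstSketch
{
    // Hypothetical helper: returns the count bytes preceding offsetInBlock,
    // taking them from the in-memory block when they are all there and only
    // falling back to the stream when they are not.
    public static byte[] GetPrecedingBytes(Stream stream, byte[] block,
                                           long blockBaseOffset, int offsetInBlock,
                                           int count)
    {
        byte[] result = new byte[count];

        if (offsetInBlock >= count)
        {
            // Fast path: the bytes are already in the current block.
            Array.Copy(block, offsetInBlock - count, result, 0, count);
            return result;
        }

        // Fallback: some of the bytes lie before the block, so read them
        // from the stream and then restore the read position.
        // (Setting a negative Position throws, e.g. for a match at offset 0.)
        long savedPosition = stream.Position;
        try
        {
            stream.Position = blockBaseOffset + offsetInBlock - count;
            int total = 0;
            while (total < count)
            {
                int n = stream.Read(result, total, count - total);
                if (n == 0) throw new EndOfStreamException();
                total += n;
            }
            return result;
        }
        finally
        {
            stream.Position = savedPosition;
        }
    }
}
```

With a 4096-byte block and a 4-byte look-back, only matches in the first four bytes of a block take the fallback path.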

If you really want to avoid extra stream accesses under all circumstances,
you could keep a 4 byte (or 8 byte, since it appears from your previous
example you could need as many as 8 bytes) buffer that's updated each time
you update the byte[] buffers. That is, rather than discarding the
previous buffer entirely, save the last 4 (or 8) bytes in the buffer to a
new "scan-back" buffer you add to the code.
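
One possible shape for such a "scan-back" buffer is sketched below. The class ScanBackBuffer and its members are hypothetical, not part of the code discussed above; returning null when there is not enough history is a choice made for the example.

```csharp
using System;

sealed class ScanBackBuffer
{
    private readonly byte[] tail; // last bytes of the data already discarded
    private int tailLength;       // number of valid bytes in tail

    public ScanBackBuffer(int capacity)
    {
        tail = new byte[capacity];
    }

    // Call just before a block is discarded or overwritten: remembers its
    // last bytes (up to the capacity given to the constructor).
    public void Remember(byte[] block, int blockLength)
    {
        if (blockLength >= tail.Length)
        {
            Array.Copy(block, blockLength - tail.Length, tail, 0, tail.Length);
            tailLength = tail.Length;
        }
        else
        {
            // Short block: keep as much of the older tail as still fits,
            // then append the whole block after it.
            int keepOld = Math.Min(tailLength, tail.Length - blockLength);
            Array.Copy(tail, tailLength - keepOld, tail, 0, keepOld);
            Array.Copy(block, 0, tail, keepOld, blockLength);
            tailLength = keepOld + blockLength;
        }
    }

    // Returns the count bytes preceding offsetInBlock in the current block,
    // drawing on the remembered tail when needed. Returns null when there is
    // not enough history (e.g. a match at the very start of the file).
    public byte[] GetPreceding(byte[] block, int offsetInBlock, int count)
    {
        int fromTail = Math.Max(0, count - offsetInBlock);
        if (fromTail > tailLength) return null;

        int fromBlock = count - fromTail;
        byte[] result = new byte[count];
        if (fromTail > 0)
        {
            Array.Copy(tail, tailLength - fromTail, result, 0, fromTail);
        }
        Array.Copy(block, offsetInBlock - fromBlock, result, fromTail, fromBlock);
        return result;
    }
}
```

The search loop would call Remember() on the outgoing block each time the buffers are updated, and GetPreceding() at each match, so no stream repositioning is needed at all.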

That said, Windows is buffering your file data for you already. There's
some extra overhead repositioning the stream, reading, and resetting the
position, but it's probably not nearly as much as one might think.

Pete
