detect endianness of a binary with python [Python]


From: MRAB on 21 Jul 2010 15:54

Thomas Jollans wrote:
> On 07/21/2010 05:29 PM, Holger brunck wrote:
>>>> Something like the "file" utility for Linux would be very helpful.
>>>>
>>>> Any help is appreciated.
>>> You're going to have to describe in detail what's in the file before
>>> anybody can help.
>> We are creating inside our buildsystem for an embedded system a cram filesystem
>> image. Later on inside our build process we have to check the endianness,
>> because it could be Little Endian or big endian (arm or ppc).
>>
>> The output of the "file" tool is for a little endian cramfs image:
>> <ourImage>: Linux Compressed ROM File System data, little endian size 1875968
>> version #2 sorted_dirs CRC 0x8721dfc0, edition 0, 462 blocks, 10 files
>>
>> It would be possible to execute
>> ret = os.system('file <ourImage> | grep "little endian"')
>> and evaluate the return code.
>> But I don't like to evaluate a piped system command. If there is a way without
>> using the os.system command this would be great.
>
> Files don't, as such, have a detectable endianness. 0x23 0x41 could mean
> either 0x4123 or 0x2341 - there's no way of knowing.
>
> The "file" utility also doesn't really know about endianness (well,
> maybe it does swap bytes here and there, but that's an implementation
> detail) - it just knows about file types. It knows what a little-endian
> cramfs image looks like, and what a big-endian cramfs image looks like.
> And as they're different, it can tell them apart.
>
> If you're only interested in a couple of file types, it shouldn't be too
> difficult to read the first few bytes/words with the struct module and
> apply your own heuristics. Open the files in question in a hex editor
> and try to figure out how to tell them apart!

If you have control over the file format then you could ensure that
there's a double-byte value such as 0xFF00 at a certain offset. That
will tell you the endianness of the file.
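
A minimal sketch of that marker check, assuming the two-byte value 0xFF00 is stored at a known offset. The offset of 0 and the names MARKER_OFFSET and marker_endianness are hypothetical, chosen only for illustration:

```python
import struct

MARKER_OFFSET = 0       # hypothetical offset; use whatever your format defines
MARKER_VALUE = 0xFF00   # the double-byte marker suggested above


def marker_endianness(data, offset=MARKER_OFFSET):
    """Return 'big', 'little', or None if the marker is not found."""
    raw = data[offset:offset + 2]
    if len(raw) < 2:
        return None
    # Decode the same two bytes both ways and see which one gives the marker.
    if struct.unpack(">H", raw)[0] == MARKER_VALUE:
        return "big"
    if struct.unpack("<H", raw)[0] == MARKER_VALUE:
        return "little"
    return None
```

The marker has to be asymmetric for this to work: a value such as 0xFFFF reads the same in both byte orders and tells you nothing.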

From: Grant Edwards on 21 Jul 2010 22:31

On 2010-07-21, Thomas Jollans <thomas(a)jollans.com> wrote:

>> It would be possible to execute ret = os.system('file <ourImage> |
>> grep "little endian"') and evaluate the return code. But I don't like
>> to evaluate a piped system command. If there is a way without using
>> the os.system command this would be great.
>
> Files don't, as such, have a detectable endianness. 0x23 0x41 could mean
> either 0x4123 or 0x2341 - there's no way of knowing.
>
> The "file" utility also doesn't really know about endianness (well,
> maybe it does swap bytes here and there, but that's an implementation
> detail) - it just knows about file types. It knows what a little-endian
> cramfs image looks like, and what a big-endian cramfs image looks like.
> And as they're different, it can tell them apart.
>
> If you're only interested in a couple of file types, it shouldn't be too
> difficult to read the first few bytes/words with the struct module and
> apply your own heuristics. Open the files in question in a hex editor
> and try to figure out how to tell them apart!

And by looking at the rules that "file" uses for the two file types
that matter, one should be able to figure out how to implement
something in Python. Or one can use the Python "magic" module as
previously suggested: http://pypi.python.org/pypi/python-magic/
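
For the cramfs case, such a rule could be sketched roughly as follows. This assumes the cramfs magic number is 0x28CD3D45 (taken from the Linux kernel's cramfs header, not from this thread) and that the superblock starts at offset 0 of the image. The function name cramfs_endianness is made up for the example:

```python
import struct

# Assumption: CRAMFS_MAGIC as defined in the Linux kernel's cramfs header.
CRAMFS_MAGIC = 0x28CD3D45


def cramfs_endianness(path):
    """Return 'little', 'big', or None if no cramfs magic is at offset 0."""
    with open(path, "rb") as f:
        head = f.read(4)
    # The magic is written in the byte order of the target, so comparing
    # against both packings tells the two image types apart.
    if head == struct.pack("<I", CRAMFS_MAGIC):
        return "little"
    if head == struct.pack(">I", CRAMFS_MAGIC):
        return "big"
    return None
```

This only looks at the first four bytes; an image whose superblock is not at the very start would need the offset adjusted.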

--
Grant

From: Daniel Fetchinson on 22 Jul 2010 05:38

>>> Something like the "file" utility for Linux would be very helpful.
>>>
>>> Any help is appreciated.
>
>>You're going to have to describe in detail what's in the file before
>>anybody can help.
>
> We are creating inside our buildsystem for an embedded system a cram
> filesystem
> image. Later on inside our build process we have to check the endianness,
> because it could be Little Endian or big endian (arm or ppc).
>
> The output of the "file" tool is for a little endian cramfs image:
> <ourImage>: Linux Compressed ROM File System data, little endian size
> 1875968
> version #2 sorted_dirs CRC 0x8721dfc0, edition 0, 462 blocks, 10 files
>
> It would be possible to execute
> ret = os.system('file <ourImage> | grep "little endian"')
> and evaluate the return code.
> But I don't like to evaluate a piped system command. If there is a way
> without
> using the os.system command this would be great.
>

Please see http://pypi.python.org/pypi/python-magic

HTH,
Daniel

--
Psss, psss, put it down! - http://www.cafepress.com/putitdown

From: Tim Roberts on 23 Jul 2010 01:44

Holger brunck <holger.brunck(a)keymile.com> wrote:
>
>We are creating inside our buildsystem for an embedded system a cram filesystem
>image. Later on inside our build process we have to check the endianness,
>because it could be Little Endian or big endian (arm or ppc).
>
>The output of the "file" tool is for a little endian cramfs image:
><ourImage>: Linux Compressed ROM File System data, little endian size 1875968
>version #2 sorted_dirs CRC 0x8721dfc0, edition 0, 462 blocks, 10 files
>
>It would be possible to execute
>ret = os.system('file <ourImage> | grep "little endian"')
>and evaluate the return code.

I wouldn't use os.system with grep and evaluate the return code. Instead
I'd use subprocess.Popen("file <ourImage>") and read the text output of the
command directly. By parsing that string, I can extract all kinds of
interesting information.

That is an entirely Unix-like way of doing things. Don't reinvent the
wheel when there's a tool that already does what you want.
--
Tim Roberts, timr(a)probo.com
Providenza & Boekelheide, Inc.

From: Robert Kern on 23 Jul 2010 11:44

On 7/23/10 12:44 AM, Tim Roberts wrote:
> I wouldn't use os.system with grep and evaluate the return code. Instead
> I'd use subprocess.Popen("file <ourImage>") and read the text output of the
> command directly. By parsing that string, I can extract all kinds of
> interesting information.

Small correction: subprocess.Popen(["file", our_image_filename])
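
Put together, that approach might look something like the sketch below. It assumes the "file" utility is on the PATH and that its output contains "little endian" or "big endian" as in the sample earlier in the thread. The function names are hypothetical:

```python
import subprocess


def endianness_from_file_output(text):
    """Pick 'little' or 'big' out of the text printed by "file", else None."""
    lowered = text.lower()
    if "little endian" in lowered:
        return "little"
    if "big endian" in lowered:
        return "big"
    return None


def file_endianness(path):
    """Run "file" on path (no shell, no grep) and parse what it prints."""
    # Passing a list avoids the shell, so odd characters in path are safe.
    proc = subprocess.Popen(["file", path], stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return endianness_from_file_output(out.decode("utf-8", "replace"))
```

Keeping the parsing in its own function means it can be checked against a captured line of output without running "file" at all.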

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
