From: Eli the Bearded on
PDFs, SWFs, ROM images, non-standard container formats. I'd like some
tool that can scan those for well-formed (continuous) bits of media.

The strings(1) command can do that for finding text although it's not
as intelligent as I'd like it to be. (Why do wide characters only get
found with a special command line argument?)
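For comparison, here is a minimal Python sketch of a strings(1)-like scanner that reports printable runs both as single bytes and as UTF-16LE (each printable character followed by a zero byte) in a single call. GNU strings handles the 16-bit case only when you ask for it, through its -e option (e.g. -e l for 16-bit little-endian). The 4-character minimum mirrors the strings(1) default. The name find_strings and the restriction to printable ASCII are illustrative assumptions. This is not how GNU strings is implemented.

```python
import re

MIN_LEN = 4  # minimum run length; strings(1) also defaults to 4

# A run of printable ASCII characters (tab, or space through tilde).
ASCII_RUN = re.compile(rb"[\t\x20-\x7e]{%d,}" % MIN_LEN)
# The same characters encoded as UTF-16LE: each one followed by a zero byte.
UTF16LE_RUN = re.compile(rb"(?:[\t\x20-\x7e]\x00){%d,}" % MIN_LEN)


def find_strings(data: bytes):
    """Return (offset, encoding, text) tuples for both kinds of run, sorted by offset."""
    hits = []
    for m in ASCII_RUN.finditer(data):
        hits.append((m.start(), "ascii", m.group().decode("ascii")))
    for m in UTF16LE_RUN.finditer(data):
        hits.append((m.start(), "utf-16le", m.group().decode("utf-16-le")))
    return sorted(hits)
```

Feeding it the contents of a file opened in binary mode lists narrow and wide strings together, so you don't need a second run with a different flag.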

Are there tools to do this for JPEGs, GIFs, MP3s, AVIs? I know there
are tools to find JPEGs in disk images (e.g., for recovering photos from
camera media), but I suspect those rely on what's left of the filesystem
information, so as to be able to reconstruct discontinuous files.

When I had a Mac (System 7 was current at the time), I used to have a
tool called "CanOpener" that could pull PICTs and audio files out of
both the data and resource forks of files. That's the type of program
I want. (CanOpener is apparently still around, with an OS X version.)

Elijah
------
had a dual boot A/UX / System 7 box
From: Bill Marcum on
On 2009-11-20, Eli the Bearded <*@eli.users.panix.com> wrote:
> PDFs, SWFs, ROM images, non-standard container formats. I'd like some
> tool that can scan those for well-formed (continuous) bits of media.
> [...]
> Are there tools to do this for JPEGs, GIFs, MP3s, AVIs? I know there
> are tools to find JPEGs in disk images (eg, for recovery of photos from
> camera media) but I suspect those rely on what's left of the filesystem
> information, so as to be able to reconstruct discontinuous files.
>
I know that some image file formats have embedded strings like GIF89 or JFIF.
You can look in /etc/magic or /usr/share/file/magic for others.
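Here is a minimal Python sketch of scanning a blob for such magic numbers and reporting every offset where one occurs. The signature table is a small, hand-picked, hypothetical subset. file(1)'s magic database is far more complete. In a JFIF file, the string JFIF sits at offset 6, after the FF D8 FF start-of-image bytes. Searching for those start-of-image bytes therefore finds the actual beginning of the image.

```python
# Hypothetical, hand-picked subset of magic numbers (see file(1)'s magic
# database for a real list). Maps signature bytes to a format label.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}


def find_signatures(data: bytes):
    """Return a sorted list of (offset, kind) for every signature occurrence."""
    hits = []
    for magic, kind in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, kind))
            start = data.find(magic, start + 1)
    return sorted(hits)
```

This only tells you where candidate files start, not where they end. Finding the end needs a parser for each format.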

From: Eli the Bearded on
In comp.os.linux.misc, Bill Marcum <marcumbill(a)bellsouth.net> wrote:
> I know that some image file formats have embedded strings like GIF89
> or JFIF. You can look in /etc/magic or /usr/share/file/magic for
> others.

I'm well aware of that. I have an editor macro to find base64-encoded
versions of those signatures, which I use to purge attachments from some
email files. Magic numbers alone are not all that helpful, though. I can
use the JPEG magic number to extract images by hand from some file
formats (I've done it in the past for ROM images, which is why I
mentioned them), but I cannot tell by hand how many bytes long the image
is for every file format. Something that parses the JPEG structure could
do that. For the specific case of JPEGs, I can use something like this:

$ jpegtran -copy all foo.jpeg+extra > foo.jpeg

That is useful for removing trailing extra bytes from a JPEG without
losing comments, Exif data, etc., and without the lossy recompression
you might get by opening and resaving the file. It doesn't cover GIF,
PNG, AVI, or anything else, though.
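Finding a JPEG's length by parsing means walking its marker segments. Here is a minimal Python sketch, assuming a reasonably well-formed baseline or progressive stream. After the FF D8 start-of-image marker, each segment is FF, a marker byte, and a big-endian 16-bit length that counts its own two bytes. After an SOS (FF DA) header, entropy-coded data runs until an FF that is followed by neither 00 (a stuffed byte) nor a restart marker (D0 to D7). The image ends at FF D9. Skipping segments by their declared length also steps over an embedded Exif thumbnail's own FF D9, which would fool a naive search for the first FF D9. The function name jpeg_length is hypothetical, and edge cases such as corrupt lengths are only handled by returning None.

```python
import struct

RST = range(0xD0, 0xD8)        # restart markers RST0-RST7
STANDALONE = set(RST) | {0x01}  # markers with no length field (RSTn, TEM)


def jpeg_length(data: bytes, start: int = 0):
    """Return the length in bytes of the JPEG at data[start:], or None if malformed."""
    if data[start:start + 2] != b"\xff\xd8":  # SOI marker
        return None
    pos, n = start + 2, len(data)
    while pos < n:
        if data[pos] != 0xFF:
            return None  # every segment must begin with a marker
        while pos < n and data[pos] == 0xFF:
            pos += 1  # skip optional 0xFF fill bytes before the marker code
        if pos >= n:
            return None
        marker = data[pos]
        pos += 1
        if marker == 0xD9:  # EOI: end of image
            return pos - start
        if marker in STANDALONE:
            continue
        if pos + 2 > n:
            return None
        (seg_len,) = struct.unpack(">H", data[pos:pos + 2])
        if seg_len < 2:
            return None
        pos += seg_len  # the length counts its own two bytes, not the marker
        if marker == 0xDA:  # SOS: entropy-coded data follows the header
            while pos + 1 < n:
                nxt = data[pos + 1]
                if data[pos] == 0xFF and nxt != 0x00 and nxt not in RST:
                    break  # a real marker ends the scan data
                pos += 1
            else:
                return None  # ran off the end without finding a marker
    return None
```

Combined with a signature scan, data[offset:offset + jpeg_length(data, offset)] would give the image without the trailing bytes. Unlike jpegtran, this copies the bytes untouched rather than re-emitting the file.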

Elijah
------
/^\/9j\/|^R0lGOD|^0M8R4K|iVBORw|JVBERi|UEsDBB/
From: Mark Hobley on
Eli the Bearded <*@eli.users.panix.com> wrote:
> PDFs, SWFs, ROM images, non-standard container formats. I'd like some
> tool that can scan those for well-formed (continuous) bits of media.

On Linux-based systems, sounds and images tend not to be embedded; they
are stored as separate files, which are easily identified with the
'file' command.

If media is embedded in a container, hopefully there are tools to
unpack it (just as there are tools to pack it), but I haven't looked
into this.

Mark.

--
Mark Hobley
Linux User: #370818 http://markhobley.yi.org/

From: Mumia W. on
On 11/20/2009 05:23 PM, Eli the Bearded wrote:
> PDFs, SWFs, ROM images, non-standard container formats. I'd like some
> tool that can scan those for well-formed (continuous) bits of media.
> [...]

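Perhaps "foremost" or other forensic file-carving tools will help you.
They locate files by their header and footer signatures rather than by
filesystem metadata.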
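Carving tools like foremost work from signatures: they find a known header, then look for a matching footer within some maximum size. Here is a minimal Python sketch of that header/footer idea for PNG. PNG's final chunk (the IEND type followed by its fixed CRC, AE 42 60 82) makes a reliable footer. The RULES table and its size limit are hypothetical, and this is not foremost's code. Formats with weak or ambiguous footers will carve less reliably this way.

```python
# Hypothetical carving rules: (extension, header, footer, maximum size in bytes).
# PNG ends with the IEND chunk type plus its constant CRC, AE 42 60 82.
RULES = [
    ("png", b"\x89PNG\r\n\x1a\n", b"IEND\xaeB`\x82", 20 * 1024 * 1024),
]


def carve(data: bytes):
    """Yield (extension, offset, payload) for each header...footer span found."""
    for ext, header, footer, max_size in RULES:
        start = data.find(header)
        while start != -1:
            # Only accept a footer that ends within max_size bytes of the header.
            end = data.find(footer, start + len(header), start + max_size)
            if end != -1:
                end += len(footer)  # include the footer in the carved file
                yield ext, start, data[start:end]
            start = data.find(header, start + 1)
```

Each payload could then be written out as its own file. Like the other sketches, this only works on contiguous data, which is the case asked about here.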