From: John Kelly on
On Tue, 15 Jun 2010 22:34:43 +0200, Tuxedo <tuxedo(a)mailinator.com>
wrote:

>I even tested placing the resulting 100-byte file in a Mozilla mail
>directory. The mail folder (or file) shows up but is empty, without even the
>start of a single message. I also tested with versions longer than 100 bytes.
>
>I guess I have been doomed with a corrupt mbox file! But how can such a large
>2.8GB file contain nothing readable? It should be a direct copy of the mbox
>and a full version of the file, not a truncated 2GB limit file via ftp or
>other file transfer. I copied the file from the original Windows drive via
>USB Flash media directly onto a Linux system where I ran the dd command.
>
>Thanks for any advice or theories on how this possibly corrupt mbox may be
>reinvigorated and viewed.

100 bytes is not enough to see the big picture. Try more, 1,000 or
10,000, or whatever it takes until you see some data that looks like
mail messages. Then use the skip feature of dd to read past that when
copying.
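
A minimal sketch of that peek-then-skip approach. The file names, the tiny stand-in file, and the offset of 12 are made up for the demo; on the real file you would read the offset off the dump.

```shell
# Work in a scratch directory; big_mbox here is a tiny stand-in file.
cd "$(mktemp -d)" || exit 1
printf 'JUNKJUNKJUNK' > big_mbox
printf 'From - Sun Dec 27 21:08:44 2009\nSubject: test\n\nbody\n' >> big_mbox

# Dump the first 1000 bytes in readable form; raise count as needed.
dd if=big_mbox bs=1 count=1000 2>/dev/null | od -c | head -n 20

# Suppose the dump shows mail data starting at byte offset 12.
# With bs=1, skip counts single bytes (simple, but slow on a huge file).
dd if=big_mbox of=trimmed_mbox bs=1 skip=12 2>/dev/null
head -n 1 trimmed_mbox
```

On a multi-gigabyte file a larger block size is much faster, but then skip counts whole blocks rather than bytes.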

--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php

From: John Kelly on
On Tue, 15 Jun 2010 22:34:43 +0200, Tuxedo <tuxedo(a)mailinator.com>
wrote:

>I guess I have been doomed with a corrupt mbox file! But how can such a large
>2.8GB file contain nothing readable? It should be a direct copy of the mbox
>and a full version of the file, not a truncated 2GB limit file via ftp or
>other file transfer. I copied the file from the original Windows drive via
>USB Flash media directly onto a Linux system where I ran the dd command.

Are you sure the original Windows file is mbox format? Even if it is,
there are opportunities for extra garbage to be added when copying from
one system to another.

If you can find mbox messages somewhere in the file, you can use dd to
strip off the leading garbage.

But maybe it's not really mbox format, and there is extra garbage
between each message. Or worse, some kind of compressed format where
you can't really see what you have just by looking at the data.

Tinkering with the data, using dd, can help you answer those questions.
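
One way to find where the leading garbage ends, sketched on a tiny stand-in file. This assumes a grep that supports -a, -b and -m (GNU and BSD grep do), and it uses tail -c in place of dd only because tail takes a byte offset directly. The file names are hypothetical.

```shell
# Scratch directory with a stand-in file: one line of garbage, then a message.
cd "$(mktemp -d)" || exit 1
printf 'garbage bytes\n' > big_mbox
printf 'From - Sun Dec 27 21:08:44 2009\nSubject: test\n\nbody\n' >> big_mbox

# -a: treat binary data as text, -b: print byte offset, -m 1: stop at first hit
offset=$(grep -a -b -m 1 '^From - ' big_mbox | cut -d: -f1)
echo "first From line at byte offset $offset"

# tail -c +N starts output at byte N (1-based), hence offset + 1.
tail -c +"$((offset + 1))" big_mbox > trimmed_mbox
head -n 1 trimmed_mbox
```

If grep finds no such line, offset stays empty and the whole file is copied unchanged, which would itself suggest the file is not plain mbox.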




From: Janis Papanagnou on
Tuxedo wrote:
> Hi,
>
> A customer just gave me a massive mail file in mbox format which has
> accrued over several years. The file was rescued from an old drive of a
> previous but now broken system, and so I would like to restore the mailbox
> in a mail application on a new system.
>
> The mail file was readable on the previous system in Mozilla Thunderbird,
> as there it had a corresponding .msf index. However, the .msf file no
> longer exists and the mbox itself is nearly 3GB. When placing this in a new
> T-Bird mail folder, the mail application tries but soon fails to generate
> the index which is necessary to display the messages.
>
> At first I thought the file may be corrupt so I tried running:
> formail -zds < big_mbox >> fixed_mbox
>
> But soon after formail began munching its way into the big_mbox there was
> an "Out of memory" error returned by the shell, which I guess was also what
> the mail client silently did.
>
> I guess I need more RAM to process such a big file and that any mail
> application, formail included, simply needs more than the filesize, which
> unfortunately I do not have. In any case, I think the file is probably Ok
> since it worked fine on the previous system.
>
> What methods exist to process and restore this huge file? How about for
> example splitting it into parts, such as 5 or 10 different files, obviously
> cut at the right points between messages. I guess the individual mbox files
> can then easily be readable in more or less any mail application. Can this
> be done via the shell and if so how?
>
> Are there any particular Unix tools to split such huge message files or
> create an .msf index without running out of memory in the process?
>
> Many thanks for any ideas and advice.

I haven't read the whole lengthy thread, so this may already have been
suggested; say you want the mails sorted by month and year, as defined
in the From line (e.g. "From - Sun Dec 27 21:08:44 2009"), with all mails
from Dec 2009 in file mbox stored in a file mbox_2009-Dec...

awk '/^From / { f = "mbox_"$NF"-"$4 } { print > f }' mbox

(If the number of created files will exceed some number of allowed open
file descriptors, please tell us, then the code needs some adjustments.)
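
One possible adjustment for the open-file limit, as a sketch rather than a tested fix for the real mailbox: keep only one output file open at a time, closing the previous one whenever the target name changes and appending with >> so a reopened file is not truncated. The sample mbox and the variable name "new" are made up for the demo.

```shell
# Scratch directory with a small sample mbox spanning two months.
cd "$(mktemp -d)" || exit 1
cat > mbox <<'EOF'
From - Sun Dec 27 21:08:44 2009
Subject: one

first
From - Fri Jan 15 10:00:00 2010
Subject: two

second
From - Mon Dec 28 09:00:00 2009
Subject: three

third
EOF

awk '
/^From / {
    new = "mbox_" $NF "-" $4      # e.g. mbox_2009-Dec
    if (new != f) {
        if (f != "") close(f)     # release the previous file descriptor
        f = new
    }
}
f != "" { print >> f }            # append; skip anything before the first From line
' mbox
```

Because of the append, old mbox_* files from an earlier run should be removed before running it again.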

Janis

>
> Tuxedo
From: Janis Papanagnou on
[Sorry for the follow-up to my own post.]

Janis Papanagnou wrote:
>
> I haven't read the whole lengthy thread, so this may already have been
> suggested; say you want the mails sorted by month and year, as defined
> in the From line (e.g. "From - Sun Dec 27 21:08:44 2009"), with all mails
> from Dec 2009 in file mbox stored in a file mbox_2009-Dec...
>
> awk '/^From / { f = "mbox_"$NF"-"$4 } { print > f }' mbox

To avoid matching a message body line that starts with "From [...]", you
can define the pattern more accurately; instead of /^From / specify (for
example)...

/^From - [A-Z][a-z][a-z] [A-Z][a-z][a-z] .* [0-9][0-9][0-9][0-9]$/ {...}

or perhaps just

NF==7 && /^From / {...}
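
Written out in full, the two variants might look like the sketch below. The sample mbox, the nf_ output prefix, and the added guard f != "" are assumptions for the demo; the guard keeps awk from redirecting to an empty file name when lines appear before the first matching From line.

```shell
# Scratch directory with a sample mbox whose body contains a "From ..." line.
cd "$(mktemp -d)" || exit 1
cat > mbox <<'EOF'
From - Sun Dec 27 21:08:44 2009
Subject: one

From here on this is body text.
From - Fri Jan 15 10:00:00 2010
Subject: two

bye
EOF

# Variant 1: the stricter regular expression.
awk '
/^From - [A-Z][a-z][a-z] [A-Z][a-z][a-z] .* [0-9][0-9][0-9][0-9]$/ {
    f = "mbox_" $NF "-" $4
}
f != "" { print > f }
' mbox

# Variant 2: the field-count test (the nf_ prefix is only for this demo).
awk '
NF == 7 && /^From / { f = "nf_" $NF "-" $4 }
f != "" { print > f }
' mbox
```

The field-count variant is looser: a body line of exactly seven words beginning with "From " would still be taken for a separator.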

>
> (If the number of created files will exceed some number of allowed open
> file descriptors, please tell us, then the code needs some adjustments.)
>
> Janis

From: Tuxedo on
Janis Papanagnou wrote:

[...]

> > awk '/^From / { f = "mbox_"$NF"-"$4 } { print > f }' mbox
>
> To avoid matching a message body line that starts with "From [...]", you
> can define the pattern more accurately; instead of /^From / specify (for
> example)...
>
> /^From - [A-Z][a-z][a-z] [A-Z][a-z][a-z] .* [0-9][0-9][0-9][0-9]$/ {...}
>
> or perhaps just
>
> NF==7 && /^From / {...}
>
> >
> > (If the number of created files will exceed some number of allowed open
> > file descriptors, please tell us, then the code needs some adjustments.)
> >
> > Janis
>

Thanks for this awk tip!

But you are right, the first one catches message body text that simply
begins a line with "From":
awk '/^From / { f = "mbox_"$NF"-"$4 } { print > f }' mbox

With the other versions, however, I get some errors. I presume I am
replicating them incorrectly:

awk '/^From - [A-Z][a-z][a-z] [A-Z][a-z][a-z] .* [0-9][0-9][0-9][0-9]$/ { f
= "mbox_"$NF"-"$4 } { print > f }' mbox

The error for the above is "redirection has null string value".

awk 'NF==7 && /^From { f = "mbox_"$NF"-"$4 } { print > f }' sent-mail

The error here is "unterminated regexp".

Perhaps you can correct the above or type your two last examples in full?

Thanks,
Tuxedo