From: Chris Nehren on
On 2010-06-16, Tuxedo scribbled these curious markings:
> I thought the mbox format was meant to begin with "From" on the first line
> of the file. At least that's how mboxes look on my Linux box. But who knows
> what could have been inserted by some Windows application.

Silly mortal, assuming software adheres to standards. Have you watched
that video yet? :)

--
Thanks and best regards,
Chris Nehren
Unless noted, all content I post is CC-BY-SA.
From: John Kelly on
On Wed, 16 Jun 2010 02:07:09 +0200, Janis Papanagnou
<janis_papanagnou(a)hotmail.com> wrote:

>To prevent a message body line starting with "From [...]" from being taken
>as a separator, you can define the pattern more accurately; instead of
>/^From / specify (for example)...
>
> /^From - [A-Z][a-z][a-z] [A-Z][a-z][a-z] .* [0-9][0-9][0-9][0-9]$/ {...}
>
>or perhaps just
>
> NF==7 && /^From / {...}

I wonder how mail programs cope with that. The extra test is good, but
not foolproof. No test can be foolproof, unless "^From " in the body is
escaped (mangled) when stored.
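Janis's stricter separator test and John's point about escaping can both be sketched in Python. This is a minimal illustration, not what any particular mail program does. The escaping shown is the "mboxrd" convention, one of several mbox variants (RFC 4155 discusses the family). On write, any body line matching `^>*From ` gets one more `>`. On read, exactly one `>` is removed, so the round trip is lossless. Whether a given program (Thunderbird included) uses mboxrd, the older irreversible "mboxo" quoting, or no quoting at all is an assumption you would have to check. The function names are hypothetical.

```python
import re

# Janis's stricter separator, for Thunderbird-style lines such as
# "From - Wed Jun 16 02:07:09 2010" (a trailing newline is tolerated by $).
SEPARATOR = re.compile(r"^From - [A-Z][a-z][a-z] [A-Z][a-z][a-z] .* [0-9]{4}$")

# mboxrd quoting applies to body lines with zero or more '>' then "From ".
QUOTABLE = re.compile(r"^>*From ")


def escape_body_line(line: str) -> str:
    """On write: add one '>' so a body line can never look like a separator."""
    return ">" + line if QUOTABLE.match(line) else line


def unescape_body_line(line: str) -> str:
    """On read: remove exactly one '>' from a quoted From line."""
    return line[1:] if re.match(r"^>+From ", line) else line
```

Note that the `NF==7` variant is easier to fool: a seven-word body line such as "From what I hear the meeting moved" passes it unless it was escaped when stored.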



--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php

From: John Kelly on
Tuxedo <tuxedo(a)mailinator.com> wrote:

>Yes, I think the file must be in some compressed format.

Low-level tools like dd help focus on the real problem.
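In the spirit of `dd if=FILE bs=64 count=1 | od -c`, the first few bytes of the file usually tell you whether it is plain text or something else. Here is a minimal Python sketch that reads only the head of the file, so it stays cheap even for a 3GB mbox, and prints an offset/hex/ASCII dump. The function names and the 64-byte default are illustrative choices.

```python
def head_bytes(path: str, count: int = 64) -> bytes:
    """Read only the first `count` bytes, like dd with bs=64 count=1."""
    with open(path, "rb") as f:
        return f.read(count)


def hexdump(data: bytes, width: int = 16) -> str:
    """One row per `width` bytes: offset, hex bytes, printable ASCII ('.' otherwise)."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(rows)
```

For example, `print(hexdump(head_bytes("Inbox")))`, where `Inbox` stands in for the real file name. A healthy mbox should show `46 72 6f 6d 20` ("From ") on the first row.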


>A customer just gave me a massive mail file in mbox format which has
>accrued over several years. The file was rescued from an old drive of a
>previous but now broken system, and so I would like to restore the mailbox
>in a mail application on a new system.
>
>The mail file was readable on the previous system in Mozilla Thunderbird,
>as there it had a corresponding .msf index. However, the .msf file no
>longer exists and the mbox itself is nearly 3GB.

What? You said you rescued an old drive. So if the .msf file no longer
exists, how can you know it had a .msf file?

I'm beginning to wonder if this thread is a practical joke.



From: Tuxedo on
Chris Nehren wrote:

[...]

> Silly mortal, assuming software adheres to standards. Have you watched
> that video yet? :)

I watched about half of "Email Hates the Living" on Google. Entertaining
stuff! I will watch the rest.

In the case of the Thunderbird mbox format, the files normally begin with
'From'; at least they do in the other working T-Bird mail files from the
same system the 2.8GB mail file comes from. It appears that T-Bird is
using some compression format when an mbox file hits a certain size:
https://wiki.mozilla.org/Talk:Thunderbird:2.0_Product_Planning#Auto_compress_folders_after_relative_changes_in_size

If I only knew which, I could try and uncompress it.
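One way to find out "which" is to compare the leading bytes against the magic numbers of common compressors. As far as I know, Thunderbird's "compact folders" feature (the subject of that wiki page) purges deleted messages from the mbox rather than compressing it, so the file may turn out not to be compressed at all. The sketch below only recognizes a handful of formats and is not exhaustive. `sniff` and `open_maybe_compressed` are hypothetical names. The `gzip`, `bz2` and `lzma` modules are Python's standard library.

```python
import bz2
import gzip
import lzma

# Leading "magic" bytes of a few common formats, checked in order.
MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"PK\x03\x04", "zip"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"From ", "plain mbox"),
]

# Stdlib openers for the formats Python can decompress as a stream.
OPENERS = {"gzip": gzip.open, "bzip2": bz2.open, "xz": lzma.open}


def sniff(data: bytes) -> str:
    """Name the format whose magic bytes start `data`, or 'unknown'."""
    for magic, name in MAGIC:
        if data.startswith(magic):
            return name
    return "unknown"


def open_maybe_compressed(path: str):
    """Return a binary file object, transparently decompressing gzip/bzip2/xz."""
    with open(path, "rb") as f:
        kind = sniff(f.read(8))
    if kind in ("zip", "7z"):
        # Archives hold named members; they need extracting, not stream decoding.
        raise ValueError(f"{path} looks like a {kind} archive; extract it first")
    # Plain mbox and unrecognized data are opened as-is.
    return OPENERS.get(kind, open)(path, "rb")
```

If `sniff` says "unknown", a hex dump of the head is the next thing to look at. The data may be plain mbox preceded by some junk bytes rather than a compressed stream.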

Tuxedo


From: Maxwell Lol on
John Kelly <jak(a)isp2dial.com> writes:

> On Mon, 14 Jun 2010 21:17:26 -0400, Maxwell Lol <nospam(a)com.invalid>
> wrote:
>>You can even use Perl, with something like
>>
>> @mail = split(/\nFrom /,$mboxfile);
>
> That will read it into memory all at once, which may cause thrashing
> with his 3GB file. In his scenario, better to read and write one line
> at a time, and open a new output file every so many messages.

Sure. I just wanted to mention this technique, because it's useful at times.
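For files that fit comfortably in memory, the whole-file split translates directly to Python. There is one detail to watch: Perl's `split(/\nFrom /, ...)` consumes the delimiter, so every message after the first loses its leading "From ". The sketch below, with the hypothetical name `split_in_memory`, splits on a newline followed by a lookahead, which keeps "From " at the start of each message.

```python
import re


def split_in_memory(text: str) -> list[str]:
    """Whole-file split: fine for small mboxes, not for a 3GB one."""
    # "\n(?=From )" consumes only the newline; the lookahead leaves
    # "From " in place at the start of the next message.
    return re.split(r"\n(?=From )", text)
```

Like the Perl version, this still treats an unescaped body line starting with "From " as a message boundary.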

> It's easy to shoot yourself in the foot with Perl.

Of course. Dealing with 3GB files can be a concern.

However, if you only have to do it once, sometimes it's better to let
the computer do the work, even if it's not the most elegant solution.

There are times when I know a command will take (say) 30 seconds longer
to complete, but it's easier to accept that than to write a better script
(which would take longer than 30 seconds).

Mental triage, so to speak.
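John's alternative (read and write one line at a time, and open a new output file every so many messages) keeps memory use flat regardless of file size. Here is a minimal Python sketch under stated assumptions. `assign_parts`, `split_mbox`, the `partNNNN.mbox` naming and the default of 1000 messages per file are all hypothetical choices. The separator test is the loose `^From ` one, so swap in a stricter pattern if body lines were not escaped. The file is read in binary mode to avoid guessing its character encoding.

```python
import re
from pathlib import Path

# Loose separator test; replace with a stricter pattern if needed.
SEPARATOR = re.compile(rb"^From ")


def assign_parts(lines, per_file=1000):
    """Yield (part_number, line); any junk before the first separator is skipped."""
    messages = 0
    part = 0
    for line in lines:
        if SEPARATOR.match(line):
            if messages % per_file == 0:
                part += 1  # this message starts a new output file
            messages += 1
        if part:
            yield part, line


def split_mbox(src, out_dir, per_file=1000):
    """Stream src into out_dir/partNNNN.mbox files; return the number of parts."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    current, dest = 0, None
    try:
        with open(src, "rb") as f:
            for part, line in assign_parts(f, per_file):
                if part != current:
                    if dest is not None:
                        dest.close()
                    dest = open(out / f"part{part:04d}.mbox", "wb")
                    current = part
                dest.write(line)
    finally:
        if dest is not None:
            dest.close()
    return current
```

For example, `split_mbox("Inbox", "Inbox-split", per_file=5000)` (hypothetical paths) would produce a series of smaller mbox files that a mail client can import one at a time.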