From: Loki Harfagr on
Tue, 15 Jun 2010 09:57:01 +0200, Tuxedo did cat :

> Chris F.A. Johnson wrote:
>
> [...]
>
>> Use formail:
>>
>> formail -s savemail < "$mbox"
>>
>> Where savemail is a script containing:
>>
>> cat > $(date +%Y-%m-%d_%H:%M:%S)-$(uuidgen)
>>
>> This will put each message in a separate file. Adjust to taste if
>> you want to put more than one message into each file or to use
>> different filenames.
>
> Thanks for this procedure, it works fine on a not-too-large mbox.
> However, it fails with the huge file: the system runs out of
> memory, as I guess cat or formail tries to read in the full file to
> process. But it's a good example how to split an mbox into individual
> files. I will probably use this idea for something else.
>
> Many thanks,
> Tuxedo.

maybe try this variant, just hoping it would be less greedy and won't eat all the memory:
$ export FILENO=000000 ; formail -n32 +1ds procmail -p 'DEFAULT=/tmp/_mb_$FILENO' /dev/null<yourMbox
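For comparison, here is a minimal sketch of the same idea that pipes each message into sh and cat instead of procmail. It assumes formail(1)'s documented behaviour: while splitting with -s, formail updates the zero-padded FILENO variable for every message and passes it in the environment of the program it starts. The function name split_mbox_formail and the PREFIX variable are hypothetical. Per formail(1), `+skip` skips the first skip messages while splitting. That is handy for digests, but on an ordinary mbox `+1` would drop the first mail, so the sketch leaves it out.

```shell
# Hypothetical helper: split_mbox_formail MBOX PREFIX
# Writes each message of MBOX to PREFIX followed by formail's FILENO counter.
split_mbox_formail() {
    # Presetting FILENO to six zeros sets the starting number and padding
    # width; formail updates it for every message it hands to the child.
    # PREFIX is passed through the environment so the single-quoted
    # sh -c script can expand it together with FILENO.
    FILENO=000000 PREFIX=$2 formail -ds sh -c 'cat > "$PREFIX$FILENO"' < "$1"
}
```

Usage would be something like `split_mbox_formail yourMbox /tmp/_mb_`.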


From: Chris Nehren on
On 2010-06-15, Tuxedo scribbled these curious markings:
> Chris Nehren wrote:
>
>> Use a module, like Mail::Box or
>> Email::Folder::Mbox, something that's been tested and in production use
>> at large ESPs for decades.
>
> How can I use these Perl modules to split the mbox? Will they not also
> attempt to read the entire file in one go and run out of memory...

Borrowing from the Email::Folder docs:

#!/usr/bin/perl
use strict;
use warnings;

use Email::Folder;

my $folder = Email::Folder->new("some_file");
while (my $message = $folder->next_message) {
    print $message->header('Subject'), "\n";
}

Or thereabouts. No, it will not read the entire file all at once, unless
you call ->messages on the Email::Folder object. For more information on
what you can do with the $message object, see Email::Simple's docs.

Mail::Box is not covered here because, while it is the Swiss Army chainsaw
of mail modules, it's also more complex and has a steeper learning curve.

--
Thanks and best regards,
Chris Nehren
From: Ben Bacarisse on
Tuxedo <tuxedo(a)mailinator.com> writes:
<snip>
> Thanks for any further tips.

Another plan might be to use the "reformail" tool. I've used it in
similar situations though nothing on quite the same scale. In
particular the -s option runs a program for each mail in the mbox file;
the message is provided on stdin and an environment variable provides
access to a counter so you can simply number the messages.

It is often part of the "maildrop" package though I think it was
originally part of the courier mail system.

--
Ben.
From: Ben Bacarisse on
Ben Bacarisse <ben.usenet(a)bsb.me.uk> writes:

> Tuxedo <tuxedo(a)mailinator.com> writes:
> <snip>
>> Thanks for any further tips.
>
> Another plan might be to use the "reformail" tool.

I see that formail has been suggested already. I am not sure if
reformail is another implementation (in which case it may be worth
trying) or just a renaming of formail (in which case it might also have
trouble with the mbox size). Maybe someone who knows both can comment.

<snip>
--
Ben.
From: John Kelly on
On Tue, 15 Jun 2010 09:39:16 +0200, Tuxedo <tuxedo(a)mailinator.com>
wrote:

>John Kelly wrote:

>> IOW, it's not hard to identify message boundaries. You can use common text
>> processing tools to split the big file into smaller ones.
>
>Thanks for the tip but I'm not sure what processing tools can be used to
>split the file into smaller ones? At least no editor that I know will open
>the file. It's simply too big.

I was not talking about text editors, where you read the whole file into
memory all at once. Tools like grep, sed, and awk read one line at a
time. Or you could write a simple while loop in bash to read a file one
line at a time.

while read -r; do    # -r stops read from mangling backslashes

# each line is in $REPLY
# do something with it

done < mybigfile
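Because awk reads one line at a time, a splitter built on it needs only a few lines and keeps memory use flat. What follows is a minimal sketch, assuming a classic mbox where each message begins with a "From " line that is either the first line of the file or follows an empty line. The function name split_mbox and the six-digit file numbering are hypothetical choices. Each finished file is closed so a mailbox with many messages does not run out of file descriptors.

```shell
# Hypothetical helper: split_mbox MBOX PREFIX
# Writes message 1 to PREFIX000001, message 2 to PREFIX000002, and so on.
split_mbox() {
    awk -v prefix="$2" '
        # A message starts at a "From " line that is the first line of
        # the file or comes right after an empty line.
        /^From / && (NR == 1 || prev == "") {
            if (out != "") close(out)    # release the previous file
            n++
            out = sprintf("%s%06d", prefix, n)
        }
        # Copy the line to the current message file; anything before the
        # first "From " line is skipped.
        out != "" { print > out }
        { prev = $0 }
    ' "$1"
}
```

For example, `split_mbox yourMbox /tmp/_mb_` would produce /tmp/_mb_000001, /tmp/_mb_000002, and so on. The empty separator line before each "From " stays at the end of the preceding file, which is where it sits in the mbox anyway.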

If you don't have enough knowledge of these tools to devise a solution,
Chris's idea of Email::Folder may work for you.

