From: Martin on
Currently I am trying to get used to Python's imaplib and email
modules.
I'd like to create a webmail client similar to Gmail.

My Questions:
a) Is there any feature hidden in Python's built-in modules (imaplib,
email) that already can group all my mails into threads?

b) If not a... what would be the best way to implement this?

I can think of two approaches:
b.1) Use the "References:" field of the messages in order to find out
which messages are related to each other.

I tried a first implementation, which works quite well, but I don't know
whether a single message can ever be related to two parents. Also, I
don't know what happens if someone is too lazy to type my address. He
might click "Reply", delete the subject and the old message text, and
compose a new mail. In theory, his mail client would still set the
"References:" field accordingly, wouldn't it? My mail client would
therefore treat that completely new mail as part of an older
conversation.

The thoughts above might lead to the second approach:

b.2) Use the "Subject:" field of the messages.

I also tried this implementation and it also works (at first glance).
I stripped all subjects of all mails so that all those "Re:", "Fw:"
tags at the beginning get deleted. Afterwards I grouped those having
the same subject and the same participants. Problem: I have no clue
what "Re:"-tags might exist around the world. I guess each mail client
and each language uses different ones, right?
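
A minimal sketch of the subject normalization described above. The
prefix list is an assumption and certainly incomplete (it only covers a
few English, German, Scandinavian, Dutch and French markers), and the
names PREFIXES and normalize_subject are made up for this example:

```python
import re

# Hypothetical, deliberately incomplete list of reply/forward markers.
# Real mail clients use more variants than these.
PREFIXES = ("re", "aw", "sv", "antw", "fw", "fwd", "wg", "tr")

# Matches one leading marker such as "Re:", "RE :", or "Re[2]:".
_PREFIX_RE = re.compile(
    r"^\s*(?:%s)(?:\[\d+\])?\s*:\s*" % "|".join(PREFIXES),
    re.IGNORECASE,
)


def normalize_subject(subject):
    """Strip leading reply/forward markers and collapse whitespace."""
    previous = None
    # Repeat until nothing changes, so "Re: AW: Fwd: x" is fully stripped.
    while previous != subject:
        previous = subject
        subject = _PREFIX_RE.sub("", subject, count=1)
    return " ".join(subject.split()).lower()
```

Messages whose normalized subjects (and participants) match could then
be grouped together. Any marker missing from the list simply stays in
the subject, which is exactly the weakness described above.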

c) Does anyone know of good resources for learning more about IMAP and
e-mail handling?
Currently I am using these sites as a reference:

http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/
(as a start *g)
http://tools.ietf.org/html/rfc3501
http://tools.ietf.org/html/rfc2822
http://docs.python.org/library/imaplib.html
http://docs.python.org/library/email.html

Maybe there are other sources of interest on the web? :)

This is my first post in this newsgroup. So:

"Hello everybody!" :-)

I've been reading this group for quite a while and I am really
astonished how fast people give valuable answers here. This is a
really great community! Many thanks in advance for all ideas!

Greetz,
Martin
From: Jean-Paul Calderone on
On Fri, 19 Dec 2008 08:47:18 -0800 (PST), Martin <mrtot(a)gmx.de> wrote:
>Currently I am trying to get used to Python's imaplib and email
>modules.
>I'd like to create a webmail client similar to Gmail.

I'd suggest using Twisted's IMAP4 client. It's somewhat easier to
use than Python's imaplib because it does much more parsing of IMAP4's
complex syntax for you. It will also be easier to do IMAP4 and HTTP
simultaneously if you're using Twisted.

Twisted won't help with the threading feature you want to implement,
though. It does let you use the email package if you want to examine
the structure of the messages you retrieve.
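
As a rough sketch of examining retrieved messages with the email
package, the following pulls out the three headers that matter for
threading. It assumes the raw header bytes have already been fetched
(for example with an IMAP FETCH of BODY.PEEK[HEADER.FIELDS (MESSAGE-ID
IN-REPLY-TO REFERENCES)]); the function name threading_headers and the
SAMPLE data are made up for this example:

```python
import re
from email import message_from_bytes

# A message id looks like <local-part@domain>; this pattern is a
# simplification that accepts anything between angle brackets.
_MSGID_RE = re.compile(r"<[^<>\s]+>")


def _ids(msg, name):
    """Return all message ids found in the named header (may be empty)."""
    return _MSGID_RE.findall(str(msg.get(name, "")))


def threading_headers(raw_header_bytes):
    """Return (message_id, in_reply_to, references) from raw headers."""
    msg = message_from_bytes(raw_header_bytes)
    message_id = (_ids(msg, "Message-ID") or [""])[0]
    in_reply_to = (_ids(msg, "In-Reply-To") or [""])[0]
    references = _ids(msg, "References")  # oldest ancestor first
    return message_id, in_reply_to, references


# Hypothetical header block, with a folded References header.
SAMPLE = (
    b"Message-ID: <c@example.org>\r\n"
    b"In-Reply-To: <b@example.org>\r\n"
    b"References: <a@example.org>\r\n"
    b" <b@example.org>\r\n"
    b"Subject: Re: hello\r\n"
    b"\r\n"
)

print(threading_headers(SAMPLE))
```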

Jean-Paul
From: Michael Torrie on
Martin wrote:
> Currently I am trying to get used to Python's imaplib and email
> modules.
> I'd like to create a webmail client similar to Gmail.

This is off-topic, but why on earth would you want to emulate Gmail's
conversation views? It's horrible and a very broken way of viewing
e-mail threads. Compare the normal, threaded view of, say, the
discussions on this list to the view that Gmail gives you. For
conversations of more than half a dozen posts, Gmail's view is
unnavigable. Suppose I want to break into a discussion that's already
dozens of posts long. With a real threaded view I can easily see the
flow of the conversation, grab random posts, then maybe read their
parent or grandparent posts. Looking at the rest of your e-mail, I can
see that maybe you do want to have real threads rather than the Google
conversation view, which removes all structure.

>
> My Questions:
> a) Is there any feature hidden in Python's built-in modules (imaplib,
> email) that already can group all my mails into threads?

Each reply carries an In-Reply-To header (and usually a References
header) that holds the Message-ID of its parent e-mail. Just keep track
of these in a structure and you can easily build a nice tree of the
thread.
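
A minimal sketch of such a structure, assuming each message has already
been reduced to a (message_id, in_reply_to, references) tuple. The names
build_threads and render are made up for this example, and this is a
much simpler approach than a full threading algorithm: it gives every
message exactly one parent, falls back to an older ancestor when the
direct parent is not in the mailbox, and does not guard against
deliberately circular references.

```python
from collections import defaultdict


def build_threads(messages):
    """Group messages into trees.

    messages: iterable of (message_id, in_reply_to, references) tuples,
    where references is a list ordered from oldest to newest ancestor.
    Returns (roots, children), where children maps a message id to the
    ids of its direct replies.
    """
    messages = list(messages)
    known = {mid for mid, _, _ in messages}
    children = defaultdict(list)
    roots = []
    for mid, in_reply_to, references in messages:
        # Try the nearest ancestor first, then older ones, so a reply
        # whose parent is missing still lands in the right thread.
        candidates = list(reversed(references))
        if in_reply_to:
            candidates.append(in_reply_to)
        parent = next(
            (c for c in candidates if c in known and c != mid), None
        )
        if parent is None:
            roots.append(mid)
        else:
            children[parent].append(mid)
    return roots, dict(children)


def render(roots, children, depth=0):
    """Return the tree as a list of indented lines."""
    lines = []
    for mid in roots:
        lines.append("  " * depth + mid)
        lines.extend(render(children.get(mid, []), children, depth + 1))
    return lines


# Hypothetical mailbox: <d> replies to <x>, which is not in the mailbox.
mails = [
    ("<a>", "", []),
    ("<b>", "<a>", ["<a>"]),
    ("<c>", "<b>", ["<a>", "<b>"]),
    ("<d>", "<x>", ["<a>", "<x>"]),
]
roots, children = build_threads(mails)
print("\n".join(render(roots, children)))
```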

>
> b) If not a... what would be the best way to implement this?
>
> I can think of two approaches:
> b.1) Use the "References:" field of the messages in order to find out
> which messages are related to each other.

Yes. This is absolutely the right way to do it.

>
> I tried a first implementation which works quite well but I don't know
> if there can occur situations where one message is related to two
> parents. Also I don't know what happens if someone is too lazy to type
> my address. He might click at "Reply", delete topic and old mail-text
> and compose a new mail. Theoretically his mail client would set the
> "References:" field accordingly never the less, wouldn't it? Therefore
> my mail client would consider that completely new mail as part of an
> older conversation.

In this case, a lazy user is a lazy user. Probably best to encourage
people to use better etiquette when using e-mail.

>
> The thoughts above might lead to the second approach:
>
> b.2) Use the "Subject:" field of the messages.

Horribly broken. Thunderbird does this and it drives me crazy. I often
get messages months apart that happen to have a common subject line,
even though they aren't the same thread or conversation. I don't want a
new message, which does not refer to the old message in any way, to
attach itself to my 6-month-old message and force me to scroll down
through potentially hundreds of e-mails to find the stupid thing. No,
the RFCs are there for a reason. They bring sanity to the chaos.
Anything else is madness. And the fact that Outlook doesn't do proper
referral fields just infuriates me. Sigh.



From: Chris Rebert on
On Fri, Dec 19, 2008 at 11:54 AM, Michael Torrie <torriem(a)gmail.com> wrote:
> Martin wrote:
>> Currently I am trying to get used to Python's imaplib and email
>> modules.
>> I'd like to create a webmail client similar to Gmail.
>
> This is off-topic, but why on earth would you want to emulate Gmail's
> conversation views? It's horrible and a very broken way of viewing
> e-mail threads. Compare the normal, threaded view of, say, the
> discussions on this list to the view that Gmail gives you. For
> conversations of more than half a dozen posts, Gmail's view is
> unnavigable. Suppose I want to break into a discussion that's already
> dozens of posts long. With a real threaded view I can easily see the
> flow of the conversation, grab random posts, then maybe read their
> parent or grandparent posts. Looking at the rest of your e-mail, I can
> see that maybe you do want to have real threads rather than the Google
> conversation view, which removes all structure.

I disagree. Reading the messages in chronological order is natural and
if people quote their parent posts properly, which they nearly always
do, there's no need to consult the parent message again (and you'll
have already read it by that point in the conversation anyway and
recognize it). Why would you "grab random posts" anyway? It makes much
more sense to just read the stream until you reach an interesting post
(thus gaining the context of the _entire_ discussion) or just read the
post in isolation along with its quoting of its parents.
Additionally, for most normal people who've never heard of
mailing lists, email conversations are typically simple back-and-forth
exchanges displayed excellently by Gmail's conversation view; these
same people would probably find threading complex and confusing.

<snip>
>> The thoughts above might lead to the second approach:
>>
>> b.2) Use the "Subject:" field of the messages.
>
> Horribly broken. Thunderbird does this and it drives me crazy. I often
> get messages months apart that happen to have a common subject line,
> even though they aren't the same thread or conversation. I don't want a
> new message, which does not refer to the old message in any way, to
> attach itself to my 6-month-old message and force me to scroll down
> through potentially hundreds of e-mails to find the stupid thing. No,
> the RFCs are there for a reason. They bring sanity to the chaos.
> Anything else is madness. And the fact that Outlook doesn't do proper
> referral fields just infuriates me. Sigh.

Yes, apparently circa Netscape 3.0 they used an ingenious message
threading algorithm (described on
http://www.jwz.org/doc/threading.html) but the Netscape 4 devs
foolishly threw out the code and wrote the broken algorithm used
today. Quite a shame.

Cheers,
Chris

--
Follow the path of the Iguana...
http://rebertia.com
From: Grant Edwards on
On 2008-12-19, Jean-Paul Calderone <exarkun(a)divmod.com> wrote:
> On Fri, 19 Dec 2008 08:47:18 -0800 (PST), Martin <mrtot(a)gmx.de> wrote:
>>Currently I am trying to get used to Python's imaplib and email
>>modules.
>>I'd like to create a webmail client similar to Gmail.
>
> I'd suggest using Twisted's IMAP4 client. It's somewhat easier to
> use than Python's imaplib because it does much more parsing of IMAP4's
> complex syntax for you. It will also be easier to do IMAP4 and HTTP
> simultaneously if you're using Twisted.

Anything that helps with the IMAP responses is worth
looking at, because parsing IMAP response messages is brutal.

I'm not sure what the IMAP protocol authors thought was going
to be parsing the replies, but it sure couldn't have been a
real-world computer program. IMO, the IMAP protocol has
achieved a level of suckage that would make Microsoft proud.

But, it works. Eventually.

--
Grant