From: janwillem on
I have a script that extracts attachments from all emails in an mbox
(largely based on http://code.activestate.com/recipes/302086-strip-attachments-from-an-email-message/;
thanks ActiveState). It works fine until it encounters an attachment
with a unicode file name (Ukrainian in my case). I cannot get the line
    msg.set_payload(replace)
to work; it is line 39 in the ActiveState snippet.

How can you get the unicode file name into the replace string of line
35 of the snippet:
    replace = ReplaceString % dict(content_type=ct,
                                   filename=fn,
                                   params=params)
without getting this nasty error message about ascii encoding?
From: Steven D'Aprano on
On Sun, 18 Apr 2010 03:02:23 -0700, janwillem wrote:

> How can you get the unicode file name into the replace string of line 35
> of the snippet:
> replace = ReplaceString % dict(content_type=ct,
>                                filename=fn,
>                                params=params)
> without getting this nasty error message about ascii encoding?

Completely untested...

    fn = fn.encode('utf-8')
    replace = ReplaceString % dict(
        content_type=ct, filename=fn, params=params)



--
Steven
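
A possible variation on the suggestion above, sketched for Python 3 (an
assumption; the thread dates from 2010 and targets Python 2, where mixing
unicode and byte strings triggers the implicit ASCII conversion). Here the
notice stays a text string throughout and the charset is declared when the
payload is set, so the email package does the encoding itself.
REPLACE_TEMPLATE and replace_attachment are hypothetical stand-ins, not the
recipe's own names or wording, and the file name is shortened for the example.

```python
from email.message import Message

# Hypothetical stand-in for the recipe's template; the real wording may differ.
REPLACE_TEMPLATE = (
    "This message contained an attachment that was stripped out.\n"
    "The original type was: %(content_type)s\n"
    "The filename was: %(filename)s\n"
)


def replace_attachment(msg, content_type, filename):
    """Swap a part's body for a text notice that names the attachment."""
    notice = REPLACE_TEMPLATE % dict(content_type=content_type,
                                     filename=filename)
    # Drop the attachment's own headers so none of them describe the old body.
    for key in list(msg.keys()):
        del msg[key]
    # Passing a charset makes the email package encode the text and set
    # Content-Type and Content-Transfer-Encoding to match.
    msg.set_payload(notice, charset="utf-8")
    return msg


part = Message()
part["Content-Type"] = 'application/vnd.ms-powerpoint; name="x.ppt"'
replace_attachment(part, "application/vnd.ms-powerpoint", "01_-_ван_Дайк.ppt")
print(part.as_string())
```

Whether this also cures the display problem in a given mail client is not
something the thread confirms; it only ensures the replacement part declares
the charset of the text it carries.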
From: janwillem on
On Apr 18, 12:09 pm, Steven D'Aprano <st...(a)REMOVE-THIS-
cybersource.com.au> wrote:
> On Sun, 18 Apr 2010 03:02:23 -0700, janwillem wrote:
> > How can you get the unicode file name into the replace string of line 35
> > of the snippet:
> > replace = ReplaceString % dict(content_type=ct,
> >                                        filename=fn,
> >                                        params=params)
> > without getting this nasty error message about ascii encoding?
>
> Completely untested...
>
> fn = fn.encode('utf-8')
> replace = ReplaceString % dict(
>           content_type=ct, filename=fn, params=params)
>
> --
> Steven

Yes, this eliminates the encoding error, but my email client does not
interpret the result correctly:
01_-_ван_Дайк_-_Технічні_рекомендації_ЄС.ppt
becomes
01_-_ван_Дайк_-_Технічні_рекомендації
_ЄС.ppt

In the original email it says:
name="01 - =?UTF-8?B?0LLQsNC9INCU0LDQudC6IC0g0KLQtdGF0L3RltGH0L3RliDRgNC10LrQvtC8?==?UTF-8?B?0LXQvdC00LDRhtGW0Zcg0ITQoS5wcHQ=?="
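
The name value quoted above consists of two RFC 2047 encoded words
(Base64-encoded UTF-8) after a plain ASCII prefix. A minimal sketch of turning
such a value back into readable text with the standard library, assuming
Python 3; decode_mime_words is a hypothetical helper name, not part of the
recipe:

```python
from email.header import decode_header

# The header value from the original message, joined onto one logical line.
raw_name = (
    "01 - =?UTF-8?B?0LLQsNC9INCU0LDQudC6IC0g0KLQtdGF0L3RltGH0L3RliDRgNC10LrQvtC8?="
    "=?UTF-8?B?0LXQvdC00LDRhtGW0Zcg0ITQoS5wcHQ=?="
)


def decode_mime_words(value):
    """Decode RFC 2047 encoded words in a header value to one text string."""
    pieces = []
    for chunk, charset in decode_header(value):
        # decode_header may return bytes (with or without a charset) or str.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        pieces.append(chunk)
    return "".join(pieces)


filename = decode_mime_words(raw_name)
print(filename)
```

The decoded string is ordinary text, so it can be formatted into the
replacement notice without any manual encode step.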