Man*_*ron 13 python email parsing mime
I am writing a Python script to process emails handed over by Procmail. As shown in this question, I am using the following Procmail configuration:
:0:
|$HOME/process_mail.py
My process_mail.py script receives the email on stdin, like this:
From hostname Tue Jun 15 21:43:30 2010
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44)
by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3
for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1
ONE
TWO
THREE
I am trying to parse the message this way:
>>> import email
>>> msg = email.message_from_string(full_message)
I want to get message fields such as "From", "To" and "Subject". However, the message object does not contain any of these fields.
What am I doing wrong?
Ale*_*lli 10
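You must make sure that the lines do not get accidentally broken (as they are above, although it is hard to say whether that is a copy-and-paste issue). With an intact message, for example: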
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44) by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3 for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1
ONE
TWO
THREE
then
msg = email.message_from_string(msgtxt)
print(msg['Subject'])
prints TEST 12, as desired.
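As a minimal sketch of how the approach above might look inside process_mail.py under Python 3: the helper name `extract_fields` and the shortened sample message are assumptions made for illustration, and `io.StringIO` merely stands in for the stdin stream that Procmail would supply.

```python
import io
from email import message_from_file


def extract_fields(stream):
    """Parse a message from a text stream and return a few of its headers."""
    msg = message_from_file(stream)
    # Indexing a Message by header name returns None when the header is absent.
    return {name: msg[name] for name in ("From", "To", "Subject")}


# Shortened stand-in for the mail Procmail pipes in; note the leading space
# on the folded continuation line of the Received header.
sample = io.StringIO(
    "From hostname Tue Jun 15 21:43:30 2010\n"
    "Received: by fxm19 with SMTP id 19so170709fxm.3\n"
    " for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)\n"
    "Subject: TEST 12\n"
    "From: Full Name <username@sender.com>\n"
    "To: username@domain.com\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "ONE\nTWO\nTHREE\n"
)

fields = extract_fields(sample)
print(fields)
```

In the real script, passing `sys.stdin` instead of the `StringIO` object should behave the same way, provided the message arrives with its line folding intact.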
It looks like your wrapped lines are not preceded by whitespace on the continuation lines, which is illegal according to RFC 2822 §2.2.3:
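Each header field is logically a single line of characters comprising the field name, the colon, and the field body. For convenience however, and to deal with the 998/78 character limitations per line, the field body portion of a header field can be split into a multiple line representation; this is called "folding". The general rule is that wherever this standard allows for folding white space (not simply WSP characters), a CRLF may be inserted before any WSP. For example, the header field:
Subject: This is a test
can be represented as:
Subject: This
 is a test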
It should look like this:
From hostname Tue Jun 15 21:43:30 2010
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44)
 by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3
 for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
 Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE
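To illustrate why the leading whitespace matters, here is a small sketch (Python 3, default parser policy) that parses the same shortened headers twice, once folded correctly and once with the continuation line's leading space removed. The variable names and the trimmed message are assumptions made for the example.

```python
import email

# Correctly folded: the continuation line begins with a space.
folded = (
    "Received: by fxm19 with SMTP id 19so170709fxm.3\n"
    " for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)\n"
    "Subject: TEST 12\n"
    "\n"
    "ONE\n"
)

# Broken: the same text, but the continuation line has lost its leading space.
broken = folded.replace("\n for", "\nfor")

good = email.message_from_string(folded)
bad = email.message_from_string(broken)

print(good["Subject"])  # the header is found
print(bad["Subject"])   # None: header parsing stopped at the unfolded line

# Everything from the broken line onward ends up in the body instead.
print(bad.get_payload())

# The parser also records what it stumbled over.
print([type(defect).__name__ for defect in bad.defects])
```

In the broken variant the parser stops reading headers at the first line that is neither a continuation nor a `name:` field, so every later header, Subject included, is treated as body text. That matches the symptom described in the question.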