Posts by DCB*_*DCB

Extracting the body of an email from an mbox file, decoding it to plain text regardless of Charset and Content Transfer Encoding

I am trying to use Python 3 to extract the body of email messages from a Thunderbird mbox file. It is an IMAP account.

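I would like to have the text part of the email body available for processing as a unicode string. It should "look like" the email does in Thunderbird, and should not contain escape sequences such as \r\n, =20, etc.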

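I think the part I don't know how to decode or remove is the Content Transfer Encoding. The emails I receive come with a variety of Content Types and Content Transfer Encodings. This is my current attempt: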

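import mailbox
import quopri
import base64

def myconvert(encoded, ContentTransferEncoding):
    # Undo the Content-Transfer-Encoding; the result is bytes, not yet text.
    if ContentTransferEncoding == 'quoted-printable':
        result = quopri.decodestring(encoded)
    elif ContentTransferEncoding == 'base64':
        result = base64.b64decode(encoded)
    else:
        result = encoded  # 7bit, 8bit, binary or missing: nothing to undo
    return result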

mboxfile = 'C:/Users/Username/Documents/Thunderbird/Data/profile/ImapMail/server.name/INBOX'

for msg in mailbox.mbox(mboxfile):
    if msg.is_multipart():    #Walk through the parts of the email to find the text body.
        for part in msg.walk():
            if part.is_multipart(): # If part is multipart, walk through the subparts.
                for subpart in part.walk():
                    if subpart.get_content_type() == 'text/plain':
                        body = subpart.get_payload() # Get the subpart …
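
For reference, here is a minimal sketch of the kind of decoding described above, using only the standard library `email` package. The helper name `text_parts` and the UTF-8 fallback for a missing or unknown charset are assumptions made for illustration, not part of the original attempt. It relies on `get_payload(decode=True)` to undo base64 or quoted-printable, and on `get_content_charset()` to pick the codec for turning the resulting bytes into a `str`.

```python
import email


def text_parts(msg):
    """Yield every text/plain part of msg as a decoded str (hypothetical helper)."""
    # walk() already recurses into nested multiparts, so one loop is enough.
    for part in msg.walk():
        if part.get_content_type() != 'text/plain':
            continue
        # decode=True undoes the Content-Transfer-Encoding and returns bytes.
        raw = part.get_payload(decode=True)
        if raw is None:
            continue
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = part.get_content_charset() or 'utf-8'
        try:
            text = raw.decode(charset, errors='replace')
        except LookupError:
            # The declared charset name is not known to Python.
            text = raw.decode('utf-8', errors='replace')
        yield text


# Small in-memory demonstration: a quoted-printable, ISO-8859-1 message.
sample = (b"Content-Type: text/plain; charset=iso-8859-1\r\n"
          b"Content-Transfer-Encoding: quoted-printable\r\n"
          b"\r\n"
          b"caf=E9 au lait=20\r\n")
for text in text_parts(email.message_from_bytes(sample)):
    print(text)
```

Each message yielded by `mailbox.mbox(mboxfile)` is a `Message` subclass, so it could be passed to `text_parts` directly. Line endings are left as they appear in the message, so a caller that wants display-ready text may still need to normalise `\r\n`.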


email content-type plaintext mbox python-3.x

DCB*_*DCB

lucky-day

13
Score

1
Answer

10k
Views