Confused about Content-Transfer-Encoding when sending an XML file as an email attachment

Question

Ler*_*roy 6 xml email encoding mime utf-8

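I have a UTF-8 encoded XML file that I send as an email attachment. When the recipient opens the email and saves the attachment, the XML file is no longer UTF-8 (it is reported as ANSI encoded instead). The recipient is using Microsoft Outlook, in case that matters.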

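I am programming in an environment where I cannot rely on a proper MIME library being available, so I need to understand what I am doing wrong.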

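After the XML file is created on the server, and before it is emailed, the Linux file command reports it as a UTF-8 file. Independently of that, the XML also has the version header <?xml version="1.0" encoding="UTF-8"?> (this doesn't really matter for my question, but I include it for completeness). I'm fairly sure the problem is in my code that emails the file, but I'm not sure what the "right" way to do it is.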

The headers I send are:

"Mime-Version" "1.0"
"Content-Type" "multipart/mixed; boundary="__==NAHDHDH2.28ABSDJxjhkjhsdkjhd___"\n\n"

The body of the email is:

--__==NAHDHDH2.28ABSDJxjhkjhsdkjhd___\n
Content-Type: text/plain; charset="utf-8"; format=flowed\n
Content-Transfer-Encoding: 7bit\n\n
Please find attached the data file generated 
--__==NAHDHDH2.28ABSDJxjhkjhsdkjhd___\n
Content-Type: text/plain; charset="utf-8"\n
Content-Disposition: attachment; filename="My_File_Name"\n\n
XML FILE CONTENTS GO HERE
--__==NAHDHDH2.28ABSDJxjhkjhsdkjhd___--\n

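Questions:

Should I use quoted-printable, 8bit, or some other kind of Content-Transfer-Encoding? I have tried all of them, but it did not change the result.
Is Content-Type: text/plain correct for an XML attachment?
Any other suggestions?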

Answer 1

tri*_*eee 5

By specifying text/plain you basically surrender control to the remote client's text-handling abilities, which are apparently limited in this particular case. XML is Unicode by spec, so by choosing a better content-type, you are more likely to succeed. Try text/xml or application/xml instead, or even the completely opaque application/octet-stream, which should only allow the recipient to save it on disk in byte-for-byte identical form.
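
As a rough illustration of the suggestion above, here is a minimal sketch using Python's standard `email` package to attach the XML as application/xml instead of text/plain. The asker cannot rely on a MIME library, so this is only meant to show the structure to aim for; the addresses, the `.xml` file name, and the sample payload are assumptions made for the example.

```python
from email.message import EmailMessage

# Hypothetical payload standing in for the real UTF-8 XML file.
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<greeting>hëllö wörld</greeting>\n"
).encode("utf-8")

msg = EmailMessage()
msg["Subject"] = "Data file"
msg["From"] = "sender@example.com"       # placeholder address
msg["To"] = "recipient@example.com"      # placeholder address
msg.set_content("Please find attached the data file generated.")

# Bytes content needs an explicit maintype/subtype. The standard library
# base64-encodes such attachments by default, so the bytes travel opaquely.
msg.add_attachment(
    xml_bytes,
    maintype="application",
    subtype="xml",
    filename="My_File_Name.xml",
)

attachment = next(msg.iter_attachments())
print(attachment.get_content_type())
print(attachment["Content-Transfer-Encoding"])
```

Because the attachment is handed over as bytes, the receiving client has no charset decision to make: it can only save what it was given.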

The content transfer encoding should not affect this behavior at all, but since you seem to be unclear on its significance, here is a brief discussion.

The content-transfer-encoding is completely transparent; it will not affect what is delivered or what the remote client can do with it. Which content transfer encoding to choose depends on the nature of your data and the capabilities of the email system it has to be transported through. If that system is not 8-bit clean, you need to encapsulate the data in a 7-bit CTE. If the content has lines which are too long to fit into SMTP, it needs to be encapsulated into something with shorter lines. But the remote client will extract whatever is inside the encapsulation at the other end. Use whatever circumstances dictate.

There is a hierarchy of content transfer encodings for different circumstances:

7bit is appropriate if your data is completely 7-bit ASCII and has no lines longer than 998 characters (not counting the terminating CRLF). Then it can survive even a crude old SMTP transfer without modification. In the absence of any explicit Content-Transfer-Encoding: header, this is the default according to the standards (although you frequently see stuff with 8-bit data in it without an explicit CTE, or even with an explicit 7bit declaration).
8bit relaxes the requirement for the data to be 7-bit clean. If all systems which transport this message support the ESMTP 8BITMIME extension, this should be fine for data with restricted line lengths.
binary additionally allows for unlimited line length. In theory, you should be able to use this to pass through unrestricted content, but in practice, this seems to trigger glitches when systems don't strictly adhere to specifications. A typical symptom is that overlong lines are truncated or folded in transit, violating the integrity of the payload. To avoid problems like that (and to better adhere to the letter and the spirit of the standards for interoperability) you're better off with one of the following.
base64 accepts unrestricted content, but encodes it in a format which meets strict requirements for restricted line length and a severely constrained 7-bit character repertoire. It expands the payload to a bit more than 4/3 of the original size. Example:

    ugqcA7R5cPq667vNaSifRUH9HsW00NqZ1gwICk0pNrUkXFpNIFOpbf3o
    5ml8cqqSygkp8KBgPbHrqnDXvZTEBOkNo7ThE+BAvexa75Tm0Ebo/Yjl
    y697pMp1+dnSlk3YTqxkPI9vqpple13dXLHlvnFDmSi0gqIMSwo7kUFD
    SivAWhyCBR6tFO3lY1Pk6lz78+zgL28VthI72kVRkrWWtzoFef/4u5Ip
    GR00CtsNNEJo01GAQGpkTNFT9U9Q/UI9CMGgaI9E9RkMaTDTQICBEyaE
    woSCQOrNGA==

quoted-printable similarly accepts arbitrary content, but encodes each byte that needs escaping as three characters (= followed by two hexadecimal digits), so those bytes grow to 3x their original size. When most of the input is ASCII, this is a tolerable amount of overhead. In other words, this is suitable for roughly textual formats with occasional non-ASCII content, such as text in many Western languages using an 8-bit encoding, or formats like HTML where the ASCII markup dominates over the actual content, in pretty much any language. Example:

    <?xml version=3D"1.0" encoding=3D"UTF-8"?>h=C3=ABll=C3=B6 =
    w=C3=B6rld
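
The hierarchy above can be turned into a small decision helper. The following is a heuristic sketch, not a rule from the RFCs: `choose_cte` and its `qp_threshold` parameter are hypothetical names, and the 1/6 default is an assumption based on size alone (quoted-printable adds two characters per escaped byte, base64 adds about a third overall). It never returns 8bit or binary, because whether those are safe depends on the transport rather than on the data.

```python
def choose_cte(data: bytes, max_line: int = 998, qp_threshold: float = 1 / 6) -> str:
    """Suggest a Content-Transfer-Encoding for `data` (heuristic sketch)."""
    # Bytes that cannot be sent as plain 7bit: anything above 127, plus NUL.
    unsafe = sum(1 for b in data if b > 127 or b == 0)
    short_lines = all(len(line) <= max_line for line in data.splitlines())

    if unsafe == 0 and short_lines:
        return "7bit"
    # Mostly-ASCII content stays readable and compact as quoted-printable,
    # which also breaks overlong lines with soft line breaks.
    if unsafe / len(data) <= qp_threshold:
        return "quoted-printable"
    return "base64"


xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<greeting>hëllö wörld</greeting>\n"
).encode("utf-8")

print(choose_cte(b"<a>hello</a>\n"))
print(choose_cte(xml_bytes))
print(choose_cte(bytes(range(256))))
```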

Quoted printable is not hard to implement at all, and would seem suitable for your scenario.
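
To back up that claim, here is a minimal hand-rolled encoder sketch for environments without a MIME library. `qp_encode` is a hypothetical helper, not a library function. It assumes textual input whose line breaks should be sent as CRLF, and it leaves out the extra precautions some implementations take (such as escaping a leading "From " or a lone "."). The standard `quopri` module is used only as an independent check of the output.

```python
import quopri


def qp_encode(data: bytes, width: int = 76) -> bytes:
    """Quoted-printable encode textual `data`, one CRLF-terminated line at a time."""
    out_lines = []
    for line in data.splitlines():
        tokens = []
        for i, b in enumerate(line):
            is_last = i == len(line) - 1
            printable = 33 <= b <= 126 and b != 0x3D          # "=" must be escaped
            inner_space = b in (0x20, 0x09) and not is_last   # trailing whitespace must be escaped
            tokens.append(bytes([b]) if printable or inner_space else b"=%02X" % b)

        current = b""
        for tok in tokens:
            # Keep one column free for the "=" that marks a soft line break.
            if len(current) + len(tok) > width - 1:
                out_lines.append(current + b"=")
                current = b""
            current += tok
        out_lines.append(current)

    encoded = b"\r\n".join(out_lines)
    # Preserve a final line break if the input had one.
    if data.endswith((b"\n", b"\r")):
        encoded += b"\r\n"
    return encoded


sample = "hëllö wörld\n".encode("utf-8")
encoded = qp_encode(sample)
print(encoded)

# Independent check: decoding gives back the text, with CRLF line endings.
decoded = quopri.decodestring(encoded)
```

Escapes are kept as whole tokens when wrapping, so a soft line break never splits an `=XX` sequence.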

All of this is codified in the MIME RFCs 2045 through 2049. Wikipedia has nice readable articles about e.g. base64 and quoted-printable.

It's not clear from your description whether you just declared your content to be quoted-printable, or actually encoded it. I've seen people do the former and act surprised when it didn't work, but hope you did the latter. Just a cautionary tale.
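
The difference between declaring and encoding can be checked with a small round trip. This sketch builds one MIME part by hand twice, once with the body actually quoted-printable encoded and once with the raw body under the same header, and then lets Python's standard parser decode both. The sample XML, including the `x=41` text, is an assumption chosen so that the damage is visible.

```python
import quopri
from email import message_from_bytes

# Hypothetical payload; "x=41" looks like a quoted-printable escape for "A".
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<expr note="hëllö">x=41</expr>\n'
).encode("utf-8")

headers = (
    b'Content-Type: application/xml; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b'Content-Disposition: attachment; filename="My_File_Name.xml"\r\n'
    b"\r\n"
)

# Correct: the body really is quoted-printable encoded.
encoded_part = headers + quopri.encodestring(xml_bytes)
# Incorrect: the header claims quoted-printable, but the body is raw.
declared_only_part = headers + xml_bytes

good = message_from_bytes(encoded_part).get_payload(decode=True)
bad = message_from_bytes(declared_only_part).get_payload(decode=True)

print(good == xml_bytes)
print(bad == xml_bytes)
```

In the second case the decoder treats `=41` in the raw body as an escape and turns it into `A`, so the recipient no longer gets the original bytes.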
