How to determine whether a string is base64

Question

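I have many emails coming in from different sources. They all have attachments, and many of them have attachment names in Chinese, so these names are converted to base64 by the senders' email clients.

When I receive these emails, I want to decode the name. But other names are not base64. How can I tell whether a string is base64 or not, using the Jython programming language?

For example:

First attachment: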

------=_NextPart_000_0091_01C940CC.EF5AC860
Content-Type: application/vnd.ms-excel;
 name="Copy of Book1.xls"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="Copy of Book1.xls"

Second attachment:

------=_NextPart_000_0091_01C940CC.EF5AC860
Content-Type: application/vnd.ms-excel;
 name="=?gb2312?B?uLGxvmhlbrixsb5nLnhscw==?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="=?gb2312?B?uLGxvmhlbrixsb5nLnhscw==?="

Please note that both have "Content-Transfer-Encoding: base64".

Answer 1

Tom*_*lak 21

The header value tells you this:

=?gb2312?B?uLGxvmhlbrixsb5nLnhscw==?=

"=?"     introduces an encoded value
"gb2312" denotes the character encoding of the original value
"B"      denotes that B-encoding (equal to Base64) was used (the alternative 
         is "Q", which refers to something close to quoted-printable)
"?"      functions as a separator
"uLG..." is the actual value, encoded using the encoding specified before
"?="     ends the encoded value

So splitting on "?" actually gets you this (JSON notation):

["=", "gb2312", "B", "uLGxvmhlbrixsb5nLnhscw==", "="]

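In the resulting array (counting from zero), if "B" is at position 2, you are facing a base64-encoded string at position 3. Once you have decoded it, be sure to pay attention to the character encoding at position 1; it is probably best to convert the whole thing to UTF-8 using that information.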
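The split described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a full RFC 2047 parser: the helper name `decode_b_word` is hypothetical, it handles only a single B-encoded word, and it is written for Python 3, so it would likely need adapting for Jython 2.x.

```python
import base64


def decode_b_word(word):
    """Return the decoded text if `word` is a single B-encoded word, else None.

    Hypothetical helper for illustration only.
    """
    parts = word.split("?")
    # Expected shape: ["=", charset, "B", payload, "="]
    if len(parts) != 5 or parts[0] != "=" or parts[4] != "=":
        return None
    charset, encoding, payload = parts[1], parts[2], parts[3]
    if encoding.upper() != "B":
        return None  # "Q" (or anything else) is not handled in this sketch
    raw = base64.b64decode(payload)  # position 3: the base64 payload
    return raw.decode(charset)       # position 1: the original character set


print(decode_b_word("=?gb2312?B?uLGxvmhlbrixsb5nLnhscw==?="))
print(decode_b_word("Copy of Book1.xls"))  # plain name, so this prints None
```

A plain name such as "Copy of Book1.xls" contains no "?" separators, so the shape check fails and the helper returns None instead of trying to decode it.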

Is there another way? ;-) (2 upvotes)

Answer 2

bob*_*nce 12

"Please note that both have Content-Transfer-Encoding: base64"

Not relevant in this case: Content-Transfer-Encoding applies only to the body payload, not to the headers.
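To make the distinction concrete, here is a small sketch using Python 3's `email` package. The message is a made-up example modelled on the first attachment in the question: the body is base64-encoded, while the filename is plain ASCII and needs no decoding at all.

```python
import base64
import email

# Hypothetical message: plain ASCII filename, base64-encoded body.
body = base64.b64encode(b"spreadsheet bytes").decode("ascii")
raw = (
    "Content-Type: application/vnd.ms-excel;\n"
    ' name="Copy of Book1.xls"\n'
    "Content-Transfer-Encoding: base64\n"
    "Content-Disposition: attachment;\n"
    ' filename="Copy of Book1.xls"\n'
    "\n"
    + body + "\n"
)

msg = email.message_from_string(raw)

# Content-Transfer-Encoding is consulted only when decoding the payload.
payload = msg.get_payload(decode=True)

# The filename comes from the header parameter and is unaffected by it.
filename = msg.get_filename()

print(payload)
print(filename)
```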

=?gb2312?B?uLGxvmhlbrixsb5nLnhscw==?=

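This is an RFC 2047-encoded header atom. The stdlib function to decode it is email.header.decode_header. It still needs a little post-processing to interpret the result of that function, though: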

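import email.header

x = '=?gb2312?B?uLGxvmhlbrixsb5nLnhscw==?='
name = x  # fall back to the raw value if decoding fails
try:
    # Python 2 / Jython 2.x: decode_header returns (byte string, charset)
    # pairs; charset is None for parts that were not encoded.
    name = u''.join([
        unicode(b, e or 'ascii') for b, e in email.header.decode_header(x)
    ])
except email.Errors.HeaderParseError:
    pass  # leave name as it was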

However...

Content-Type: application/vnd.ms-excel;
 name="=?gb2312?B?uLGxvmhlbrixsb5nLnhscw==?="

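This is just plain wrong. What mailer created it? RFC 2047 encoding can only happen in atoms, and a quoted-string is not an atom. RFC 2047 §5 explicitly forbids this:

An 'encoded-word' MUST NOT appear within a 'quoted-string'.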

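The accepted way to encode parameter headers when long strings or Unicode characters are present is RFC 2231, which is a whole new bag of hurt. But you should be using a standard mail-parsing library, which will cope with that for you.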
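As a rough illustration of leaning on a standard parser for RFC 2231, the sketch below uses Python 3's `email` package. The sample header is a made-up example (it does not come from the question), with the filename percent-encoded as UTF-8 in the RFC 2231 `filename*=` form.

```python
import email

# Hypothetical message whose filename parameter uses RFC 2231 encoding:
#   filename*=charset'language'percent-encoded-value
raw = (
    "Content-Type: application/vnd.ms-excel\n"
    "Content-Disposition: attachment;\n"
    " filename*=utf-8''%E5%89%AF%E6%9C%AC.xls\n"
    "\n"
    "body\n"
)

msg = email.message_from_string(raw)

# get_filename() collapses the RFC 2231 value into an ordinary text string.
rfc2231_name = msg.get_filename()
print(rfc2231_name)
```

The point of the sketch is that no manual percent-decoding or charset handling is needed in calling code; the library takes care of it.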

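So, you could detect the '=?' in filename parameters if you want, and try to decode it via RFC 2047. However, the strictly-speaking-correct thing to do is to take the mailer at its word and really call the file =?gb2312?B?uLGxvmhlbrixsb5nLnhscw==?=!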
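The lenient approach (detect '=?' and attempt an RFC 2047 decode, otherwise keep the name as it is) might look like the following in Python 3. The helper name `decode_filename` is hypothetical, and falling back to the raw name on any decoding problem is an assumption about the desired behaviour, not something the standard prescribes.

```python
from email.errors import HeaderParseError
from email.header import decode_header


def decode_filename(name):
    """Decode `name` if it contains an RFC 2047 encoded-word, else return it unchanged.

    Hypothetical helper for illustration only.
    """
    if "=?" not in name:
        return name  # nothing that looks like an encoded-word
    try:
        pieces = []
        for value, charset in decode_header(name):
            if isinstance(value, bytes):
                # Encoded parts carry their charset; unencoded parts have None.
                pieces.append(value.decode(charset or "ascii", "replace"))
            else:
                # decode_header returns str when nothing was actually encoded.
                pieces.append(value)
        return "".join(pieces)
    except (HeaderParseError, LookupError):
        # Malformed base64 or an unknown charset: keep the original name.
        return name


print(decode_filename("Copy of Book1.xls"))
print(decode_filename("=?gb2312?B?uLGxvmhlbrixsb5nLnhscw==?="))
```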

Answer 3

Cor*_*ger 7

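@gnud, @edg - Unless I misunderstand, he is asking about the filename, not the file content. @setori - Content-Transfer-Encoding tells you how the CONTENT of the file is encoded, not the "filename".

I'm not an expert, but this part of the filename tells him about the characters that follow: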

=?GB2312?B?

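I'm looking for the documentation in the RFCs... Ah! Here it is: http://tools.ietf.org/html/rfc2047

The RFC says:

Generally, an "encoded-word" is a sequence of printable ASCII characters that begins with "=?", ends with "?=", and has two "?"s in between.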
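That general shape can be turned into a quick detection check. The regular expression below is a simplified sketch of the RFC's description, not the full grammar, and the names `ENCODED_WORD` and `find_encoded_words` are hypothetical.

```python
import re

# Simplified shape of an encoded-word: =?charset?encoding?encoded-text?=
# The encoding is limited here to "B" or "Q" (case-insensitive).
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def find_encoded_words(text):
    """Return a (charset, encoding, payload) tuple for every encoded-word in text."""
    return [m.groups() for m in ENCODED_WORD.finditer(text)]


print(find_encoded_words('name="=?gb2312?B?uLGxvmhlbrixsb5nLnhscw==?="'))
print(find_encoded_words('name="Copy of Book1.xls"'))  # no match, so an empty list
```

An empty result means the name contains no encoded-word and can be used as it is; an encoding of "B" in a returned tuple means the payload is base64.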

Something else to look at is the code in SharpMimeTools, a MIME parser (in C#) that I use in my bug-tracking app, BugTracker.NET.
