解析IMAP时,base64编码的字符串通过fgets调用被截断

Elz*_*ugi 6 php base64 imap stream fgets

我正在使用Zend_Mail解析电子邮件,奇怪的是一些内容在没有明显原因的情况下被截断,并且使电子邮件部分出错.

例如

Content-Disposition: attachment; filename="file.sdv"

DQogICAgICBTT05FO0xBTkRJTkdTREE7U0FMR1NEQVRPIDtOQVNKIDtSRURTS0FQICAgICAgICAg
ICAgIDsgRklTS0VTTEFHO1BSRVNFUlYgICA7ICBUSUxTVEFORDsgU1TYUlJFTFNFOyAgS1ZBTElU
RVQ7T01TVFlQRSAgO01JTlNURVBSSVM7ICAgICBWRVJESTsgICBLVkFOVFVNOyAgUlVORFZFS1Qg
IA0KLS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS07LS0tLS0tLS0tLS0tLS0t
LS0tLS07LS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS0tLTstLS0tLS0t
LS0tOy0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS0tLTstLS0tLS0tLS0t
ICANCiAgICAgICAgIDA7MjAxMC4wOS4wODsyMDEwLjA5LjA4O05vcnNrO0dhcm4gICAgICAgICAg
ICAgICAgOyAgICAgIDEwMjE7RkVSU0sgICAgIDsgICAgICAgMjEwOyAgIDQwMjA5OTk7ICAgICAg
ICAyMDtFZ2Vub3ZlcnQ7ICAgICAgICAgIDsgICAzMDcyLDE2OyAgICAgICAyMTE7ICAgICAyNTMs
MiAgDQogICAgICAgICAwOzIwMTAuMDkuMDg7MjAxMC4wOS4wODtOb3JzaztHYXJuICAgICAgICAg
Run Code Online (Sandbox Code Playgroud)

被截断为

Content-Disposition: attachment; filename="file.sdv"

DQogICAgICBTT05FO0xBTkRJTkdTREE7U0FMR1NEQVRPIDtOQVNKIDtSRURTS0FQICAgICAgICAg
ICAgIDsgRklTS0VTTEFHO1BSRVNFUlYgICA7ICBUSUxTVEFORDsgU1TYUlJFTFNFOyAgS1ZBTElU
RVQ7T01TVFlQRSAgO01JTlNURVBSSVM7ICAgICBWRVJESTsgICBLVkFOVFVNOyAgUlVORFZFS1Qg
IA0KLS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS07LS0tLS0tLS0tLS0tLS0t
LS0tLS07LS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS0tLTstLS0tLS0t
LS
Run Code Online (Sandbox Code Playgroud)

每行的var_dump显示了这一点.

string(78) "DQogICAgICBTT05FO0xBTkRJTkdTREE7U0FMR1NEQVRPIDtOQVNKIDtSRURTS0FQICAgICAgICAg
"
string(78) "ICAgIDsgRklTS0VTTEFHO1BSRVNFUlYgICA7ICBUSUxTVEFORDsgU1TYUlJFTFNFOyAgS1ZBTElU
"
string(78) "RVQ7T01TVFlQRSAgO01JTlNURVBSSVM7ICAgICBWRVJESTsgICBLVkFOVFVNOyAgUlVORFZFS1Qg
"
string(78) "IA0KLS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS07LS0tLS0tLS0tLS0tLS0t
"
string(78) "LS0tLS07LS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS0tLTstLS0tLS0t
"
string(5) "LS)
"
string(17) "TAG5 OK Success
"    
Run Code Online (Sandbox Code Playgroud)

或者在其他电子邮箱中

DQogICAgICBTT05FO0xBTkRJTkdTREE7U0FMR1NEQVRPIDtOQVNKIDtSRURTS0FQICAgICAgICAg
ICAgIDsgRklTS0VTTEFHO1BSRVNFUlYgICA7ICBUSUxTVEFORDsgU1TYUlJFTFNFOyAgS1ZBTElU
RVQ7T01TVFlQRSAgO01JTlNURVBSSVM7ICAgICBWRVJESTsgICBLVkFOVFVNOyAgUlVORFZFS1Qg
IA0KLS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS07LS0tLS0tLS0tLS0tLS0t
LS0tLS07LS0tLS0tLS0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS0tLTstLS0tLS0t
LS0tOy0tLS0tLS0tLTstLS0tLS0tLS0tO
Run Code Online (Sandbox Code Playgroud)

我无法弄清楚为什么要停在那里.传输应该只停留在行尾.这是从IMAP服务器获取字符串的行.

$line = @fgets($this->_socket);
Run Code Online (Sandbox Code Playgroud)

编码文本包含类似的字符串,但同样会在不同电子邮件的各个部分中截断.

----------;----------;----------;-----;--------------------;----------;----------;--
Run Code Online (Sandbox Code Playgroud)

我试图为fgets()添加一个大小,但没有结果.我还启用/禁用了"auto_detect_line_endings"php_ini设置,再次没有结果.

我还用ZF 打开了一个错误报告,尽管错误似乎不在库中.

你看到这个编码的字符串有什么奇怪的吗?

UPDATE

新的研究表明,电子邮件在584个字符后被截断.还是不知道为什么.向谷歌发送了一个问题.看到这里.

电子邮件标头错误:

Delivered-To: email@removed.com
Received: by 10.216.3.208 with SMTP id 58cs248812weh;
    Fri, 20 Nov 2009 05:14:14 -0800 (PST)
Received: by 10.204.153.217 with SMTP id l25mr1285471bkw.108.1258722853863;
    Fri, 20 Nov 2009 05:14:13 -0800 (PST)
Return-Path: <>
Received: from MTX4.mbn1.net (mtx4.mbn1.net [213.188.129.252])
    by mx.google.com with SMTP id 2si1800716bwz.60.2009.11.20.05.14.12;
    Fri, 20 Nov 2009 05:14:13 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of MTX4.mbn1.net designates         213.188.129.252 as permitted sender) client-ip=213.188.129.252;
Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of MTX4.mbn1.net designates 213.188.129.252 as permitted sender) smtp.mail=
Resent-From: <email@removed.com>
Content-Type: multipart/mixed; boundary="===============1703099044=="
MIME-Version: 1.0
From: <email@removed.com>
To: <email@removed.com>
CC:
Subject: some subject
Message-ID: <FLYNDRElQ080Gxw8Zw500000f46email@removed.com>
X-OriginalArrivalTime: 20 Nov 2009 13:14:08.0121 (UTC) FILETIME=[5792C690:01CA69E3]
Date: Fri, 20 Nov 2009 14:14:08 +0100
X-STA-Metric: 0 (engine=030)
X-STA-NotSpam: tlf: vedlagt skip:__ 40 fil cc:2**0
X-STA-Spam: header:MIME-Version: charset:us-ascii header:Subject:1 to:2**0 header:From:1
X-BTI-AntiSpam: score:0,sta:0/030,dnsbl:passed,sw:off,bsn:38/passed,spf:off,bsctr:passed/1,dk:off,pbmf:none,ipr:0/3,trusted:no,ts:no,bs:no,ubl:passed
X-Auto-Response-Suppress: DR, RN, NRN, OOF, AutoReply
Resent-Message-Id: <19740416124736.CF5804B33EF632B0email@removed.com>
Resent-Date: Fri, 20 Nov 2009 14:14:11 +0100 (CET)

--===============1703099044==
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="file.sdv"

DQpHUlVQUEVOQVZOICAgICAgICAgIDtLSthQRTtQUk9EQU5MO1BBS0tFTlI7TU9UVEFLTkFWTiAg
ICAgICAgICAgICAgICAgICAgO1NPTjtMQU5ESU5HU0RBO1NBTEdTREFUTyA7TkFTSiA7UkVEU0tB
UCAgIDtGSVNLRVNMQUcgO1BSRVNFUlYgICA7VElMU1RBTkQ7U1TYUlJFTFM7S1ZBTElURVQ7TUlO
U1RFUFJJUzsgICAgICAgIFZFUkRJOyAgICAgS1ZBTlRVTTsgICAgUlVORFZFS1QgICAgDQotLS0t
LS0tLS0tLS0tLS0tLS0tLTstLS0tLTstLS0tLS0tOy0tLS0tLS07LS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tOy0tLTstLS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS07LS0tLS0tLS0tLTst
LS0tLS0tLS0tOy0tLS0tLS0tLS07LS0tLS0tLS07LS0tLS0tLS07LS0tLS0tLS07LS0tLS0tLS0t
LTstLS0tLS0tLS0tLS0tOy0tLS0tLS0tLS0tLTstLS0tLS0tLS0tLS0gICAgDQpMb3JlbnR6ZW4g
....
Run Code Online (Sandbox Code Playgroud)

对于那些对答案感兴趣而不是(前)赏金的人,更多的线索.

Gmail正在返回一个简短的值,以响应RFC822.SIZE,这可能会导致截断的邮件.(对于每个标题行,它们关闭一个字节,显然不计算CR/LF的两个字符.)

mvd*_*vds 5

我觉得你在找错了地方.

imap服务器为您提供截断的邮件消息,然后返回其状态行TAG5 OK Success.

我没有看到你的(/ php)处理套接字将如何使一些kb的流消失,以便在此状态行之前神奇地修复流.

所以要么消息本身被截断(你有没有通过其他方式验证消息内容?)或者imap服务器刚刚被破坏.

我要做的第一件事是:

  • 找到一个足够安静的环境来放置你的项目,在那里你可以通过strace -f -s 10240 -p <pid>apache的进程来验证套接字交互(假设一个linux/apache环境)
  • 和/或:使用tcpdump,ethereal或相当于检查什么是未来的上线

我的猜测是你会在线上看到完全相同的截断字符串.这意味着您可以将焦点转移到imap服务器.

让自己放心,你在正确的地方寻找可以节省大量的时间.