Nic*_*ong 4 java http internationalization
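I am using Apache HTTP Components (4.1-alpha2) to upload files to Dropbox. This is done using multipart form data. What is the correct way to encode a filename that contains international (non-ASCII) characters in a multipart form?
If I use the standard API, the server returns HTTP status Forbidden. If I modify the upload code so that the filename is URL-encoded: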
MultipartEntity entity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE);
FileBody bin = new FileBody(file_obj, URLEncoder.encode(file_obj.getName(), HTTP.UTF_8), HTTP.UTF_8, HTTP.OCTET_STREAM_TYPE );
entity.addPart("file", bin);
req.setEntity(entity);
The file is uploaded, but I end up with a filename that is still encoded, for example %D1%82%D0%B5%D1%81%D1%82.txt
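To make the symptom concrete: the stored name is just the percent-encoded UTF-8 bytes of the original name, which the server kept literally instead of decoding. A minimal sketch, assuming Java 10+ for the `URLDecoder.decode(String, Charset)` overload; the class and method names are made up for illustration:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeUploadedName {
    // Reverses the URLEncoder.encode(...) step from the upload code.
    static String decode(String storedName) {
        return URLDecoder.decode(storedName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints the original (Cyrillic) file name, if the console can render it.
        System.out.println(decode("%D1%82%D0%B5%D1%81%D1%82.txt"));
    }
}
```

Decoding gives back the four Cyrillic letters plus `.txt`, so the upload itself is intact; only the name was never decoded on the server side.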
To solve this issue specifically for the Dropbox server, I had to encode the filename in UTF-8. To do this, I had to declare my multipart entity as follows:
MultipartEntity entity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE, null, Charset.forName(HTTP.UTF_8));
I was getting the Forbidden response because the OAuth-signed entity did not match the entity actually sent (it was being URL-encoded).
For those interested in what the standards have to say on this, I did some reading of the RFCs. If the standard is strictly adhered to, all headers should be encoded as 7-bit, which would make UTF-8 encoding of the filename illegal. However, RFC 2388 states:
The original local file name may be supplied as well, either as a "filename" parameter either of the "content-disposition: form-data" header or, in the case of multiple files, in a "content-disposition: file" header of the subpart. The sending application MAY supply a file name; if the file name of the sender's operating system is not in US-ASCII, the file name might be approximated, or encoded using the method of RFC 2231.
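Many posts mention using RFC 2231 or RFC 2047 to encode non-US-ASCII headers in 7-bit. However, RFC 2047 explicitly states in section 5.3 that encoded-words MUST NOT be used in a Content-Disposition field. That leaves only RFC 2231, but it is an extension and cannot be relied upon to be implemented in all servers. The reality is that most major browsers send non-US-ASCII characters as UTF-8 (hence the HttpMultipartMode.BROWSER_COMPATIBLE mode in the Apache HTTP client), so most web servers will support this. Another thing to note is that if you use HttpMultipartMode.STRICT on the multipart entity, the library will actually replace non-ASCII characters in the filename with question marks (?).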
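For reference, this is roughly what an RFC 2231 style filename parameter looks like. It is only a sketch: the class and method names are hypothetical, it is not part of the Apache HTTP client API, the set of characters left unescaped is a conservative subset of what RFC 2231 allows, and, as noted above, a given server (Dropbox included) may simply not understand the `filename*` form.

```java
import java.nio.charset.StandardCharsets;

public class Rfc2231Filename {
    // Characters emitted as-is; everything else is percent-encoded.
    // This is a conservative subset of RFC 2231's attribute-char.
    private static final String SAFE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Builds the extended value: charset, empty language tag, percent-encoded UTF-8 bytes.
    static String encodeExtValue(String fileName) {
        StringBuilder sb = new StringBuilder("UTF-8''");
        for (byte b : fileName.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && SAFE.indexOf((char) v) >= 0) {
                sb.append((char) v);
            } else {
                sb.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return sb.toString();
    }

    // Assembles a part header using the filename* parameter (hypothetical helper).
    static String contentDisposition(String fieldName, String fileName) {
        return "Content-Disposition: form-data; name=\"" + fieldName
            + "\"; filename*=" + encodeExtValue(fileName);
    }

    public static void main(String[] args) {
        System.out.println(contentDisposition("file", "\u0442\u0435\u0441\u0442.txt"));
    }
}
```

The header stays pure 7-bit ASCII, which is what the strict reading of the standard asks for. The trade-off is the one described above: sending raw UTF-8 in BROWSER_COMPATIBLE mode is technically non-conforming but is what most servers expect in practice.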