Ale*_*ker 5 lisp unicode sbcl common-lisp utf-8
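I have some code that runs without errors when executed from the SLIME prompt inside Emacs. If I start SBCL from the command prompt instead, I get this error: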
* (ei:proc-file "BRAvESP000.log" "lixo")
debugger invoked on a SB-INT:STREAM-ENCODING-ERROR:
:UTF-8 stream encoding error on
#<SB-SYS:FD-STREAM for "file /Users/arademaker/work/IBM/scolapp/lixo"
{10049E8FF3}>:
the character with code 55357 cannot be encoded.
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [OUTPUT-NOTHING ] Skip output of this character.
1: [OUTPUT-REPLACEMENT] Output replacement string.
2: [ABORT ] Exit debugger, returning to top level.
(SB-IMPL::STREAM-ENCODING-ERROR-AND-HANDLE #<SB-SYS:FD-STREAM for "file /Users/arademaker/work/IBM/scolapp/lixo" {10049E8FF3}> 55357)
0]
The puzzling part is that in both cases I am using the same SBCL 1.1.8 on the same machine, running Mac OS 10.8.4. Any ideas?
The code:
(defun proc-file (filein fileout &key (fn-convert #'identity))
  (with-open-file (fout fileout
                   :direction :output
                   :if-exists :supersede
                   :external-format :utf8)
    (with-open-file (fin filein :external-format :utf8)
      (loop for line = (read-line fin nil)
            while line
            do
            (handler-case
                (let* ((line (ppcre:regex-replace "^.*{jsonTweet=" line "{\"jsonTweet\":"))
                       (data (gethash "jsonTweet" (yason:parse line))))
                  (yason:encode (funcall fn-convert (yason:parse data)) fout)
                  (format fout "~%"))
              (end-of-file ()
                (format *standard-output* "Error[~a]: ~a~%" filein line)))))))
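This is almost certainly a bug in yason. JSON requires that a non-BMP character, if it is escaped, be escaped as a surrogate pair. Here is a simple example using U+10000 (which can be escaped in JSON as "\ud800\udc00"; I use babel because babel is less strict in its conversions):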
(map 'list #'char-code (yason:parse "\"\\ud800\\udc00\""))
=> (55296 56320)
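Unicode code point 55296 (decimal) is the first half of a surrogate pair and should never appear on its own, only as part of a surrogate pair in UTF-16. So yason has returned the two UTF-16 code units as two separate characters instead of the single character U+10000. Fortunately, this is easy to work around by using babel to encode the string to UTF-16 and decode it back again: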
(babel:octets-to-string (babel:string-to-octets (yason:parse "\"\\ud800\\udc00\"") :encoding :utf-16le) :encoding :utf-16le)
=> "𐀀"
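For reference, the arithmetic that collapses a surrogate pair into one code point can be sketched in plain Common Lisp, without babel. This is only an illustration: `merge-surrogates` and the two predicates are hypothetical helpers, not part of yason or babel, and they work on lists of integer code units rather than on strings, since not every implementation allows surrogate values as characters.

```lisp
;; Sketch only: these helpers are hypothetical, not part of yason or babel.

(defun high-surrogate-p (code)
  "True if CODE is a UTF-16 high (leading) surrogate."
  (<= #xD800 code #xDBFF))

(defun low-surrogate-p (code)
  "True if CODE is a UTF-16 low (trailing) surrogate."
  (<= #xDC00 code #xDFFF))

(defun merge-surrogates (units)
  "Return the list of code points encoded by UNITS, a list of UTF-16 code units.
A high surrogate followed by a low surrogate becomes one code point;
everything else, including a lone surrogate, is passed through unchanged."
  (loop while units
        collect (let ((hi (pop units)))
                  (if (and (high-surrogate-p hi)
                           units
                           (low-surrogate-p (first units)))
                      (let ((lo (pop units)))
                        ;; 10 payload bits from each half, offset by #x10000.
                        (+ #x10000
                           (ash (- hi #xD800) 10)
                           (- lo #xDC00)))
                      hi))))

;; The pair from the example above:
;; (merge-surrogates '(55296 56320)) => (65536), i.e. U+10000
```

The code 55357 in the error message is #xD83D, which is also a high surrogate, so the file being processed presumably contains characters outside the BMP that were escaped as surrogate pairs.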
You should be able to work around the problem by changing this line:
(yason:encode (funcall fn-convert (yason:parse data)) fout)
so that it writes to an intermediate string, which is then converted to UTF-16 and back before being written to the file:
(write-sequence
 (babel:octets-to-string
  (babel:string-to-octets
   (with-output-to-string (outs)
     (yason:encode (funcall fn-convert (yason:parse data)) outs))
   :encoding :utf-16le)
  :encoding :utf-16le)
 fout)
I submitted a patch that fixes this in yason, and it has been accepted:
https://github.com/hanshuebner/yason/commit/4a9bdaae652b7ceea79984e0349a992a5458a0dc