How do I convert a byte array to a string in Common Lisp?

Ken*_*Ken 21 sbcl common-lisp

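I'm calling a quirky API that returns a byte array, but I want a text stream. Is there an easy way to get a text stream from a byte array? For now I've just thrown this together: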

(defun bytearray-to-string (bytes)
  (let ((str (make-string (length bytes))))
    (loop for byte across bytes
       for i from 0
       do (setf (aref str i) (code-char byte)))
    str))

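and then I wrap the result in with-input-from-string, but that can't be the best way. (Plus, it's horribly inefficient.)

In this case I know the data is always ASCII, so interpreting it as either ASCII or UTF-8 would be fine. I'm using a Unicode-aware SBCL, but I'd prefer a portable (even ASCII-only) solution over one that is specific to SBCL's Unicode support.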

dmi*_*_vk 31

FLEXI-STREAMS (http://weitz.de/flexi-streams/) has a portable conversion function:

(flexi-streams:octets-to-string #(72 101 108 108 111) :external-format :utf-8)

=>

"Hello"

Or, if you want a stream:

(flexi-streams:make-flexi-stream
   (flexi-streams:make-in-memory-input-stream
      #(72 101 108 108 111))
   :external-format :utf-8)

This returns a stream that reads text from the byte vector.
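As a rough sketch of how such a stream might then be consumed, the snippet below wraps the construction in a helper and reads two lines with the standard READ-LINE. The helper name octets-to-text-stream and the sample bytes are made up for illustration, and loading through Quicklisp is an assumption; any other way of loading FLEXI-STREAMS should do.

```lisp
;; Assumes Quicklisp is installed; load FLEXI-STREAMS some other way if not.
(ql:quickload :flexi-streams)

(defun octets-to-text-stream (octets)
  "Hypothetical helper: wrap OCTETS in a character input stream decoded as UTF-8."
  (flexi-streams:make-flexi-stream
   (flexi-streams:make-in-memory-input-stream octets)
   :external-format :utf-8))

;; The sample bytes spell "Hello", a newline (10), then "Hi".
(with-open-stream (in (octets-to-text-stream #(72 101 108 108 111 10 72 105)))
  ;; Passing NIL as the second argument makes READ-LINE return what it has
  ;; at end of file instead of signalling an error.
  (list (read-line in) (read-line in nil)))
```

Because the result is an ordinary character stream, READ-CHAR, READ-LINE and READ can all be used on it without first building an intermediate string.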


Dav*_*lau 24

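There are two portable libraries for this conversion:

  • flexi-streams, already mentioned in another answer.

    This library is older and has more features, in particular the extensible streams.

  • Babel, a library dedicated to character encoding and decoding.

    The main advantage of Babel over flexi-streams is speed.

For best performance, use Babel if it has the features you need, and fall back to flexi-streams otherwise. Below is a (slightly unscientific) microbenchmark illustrating the speed difference.

For this test case, Babel is about 337 times faster and allocates about 200 times less memory.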

(asdf:operate 'asdf:load-op :flexi-streams)
(asdf:operate 'asdf:load-op :babel)

(defun flexi-streams-test (bytes n)
  (loop
     repeat n
     collect (flexi-streams:octets-to-string bytes :external-format :utf-8)))

(defun babel-test (bytes n)
  (loop
     repeat n
     collect (babel:octets-to-string bytes :encoding :utf-8)))

(defun test (&optional (data #(72 101 108 108 111))
                       (n 10000))
  (let* ((ub8-vector (coerce data '(simple-array (unsigned-byte 8) (*))))
         (result1 (time (flexi-streams-test ub8-vector n)))
         (result2 (time (babel-test ub8-vector n))))
    (assert (equal result1 result2))))

#|
CL-USER> (test)
Evaluation took:
  1.348 seconds of real time
  1.328083 seconds of user run time
  0.020002 seconds of system run time
  [Run times include 0.12 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  126,402,160 bytes consed.
Evaluation took:
  0.004 seconds of real time
  0.004 seconds of user run time
  0.0 seconds of system run time
  0 calls to %EVAL
  0 page faults and
  635,232 bytes consed.
|#


Vat*_*ine 15

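If you don't have to worry about UTF-8 encoding (which, in practice, means "just plain ASCII"), you can use MAP:

(map 'string #'code-char #(72 101 108 108 111))

  • Nice solution that works with out-of-the-box Common Lisp :) (2 upvotes)
  • @vancan1ty No, this does not work for UTF-8. For example, `206 177` encodes `"α"` in UTF-8, but `(princ (map 'string #'code-char #(206 177)))` prints `"Î±"` in SBCL. (2 upvotes)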
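For the ASCII-only case from the question, a minimal portable sketch might look like the following. The function name ascii-bytes-to-string and the sample bytes are made up for illustration, and the code assumes that the implementation's character codes agree with ASCII for 0 to 127, which the standard does not strictly guarantee but which holds in current implementations such as SBCL.

```lisp
(defun ascii-bytes-to-string (bytes)
  "Hypothetical helper: convert BYTES, a vector of ASCII codes (0-127), to a string.
Signals an error on any other value instead of silently mis-decoding it."
  (map 'string
       (lambda (byte)
         (assert (<= 0 byte 127) (byte) "Not an ASCII code: ~S" byte)
         (code-char byte))
       bytes))

;; Standard WITH-INPUT-FROM-STRING then gives a text stream over the result.
;; The sample bytes spell "Hello", a newline (10), then "Hi".
(with-input-from-string (in (ascii-bytes-to-string #(72 101 108 108 111 10 72 105)))
  (list (read-line in) (read-line in nil)))
```

The range check is what keeps multi-byte UTF-8 input such as `206 177` from being turned into two unrelated characters.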

HD.*_*HD. 5

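I'd say go with the proposed flexi-streams or Babel solutions.

But just for completeness, and for the benefit of future googlers arriving at this page, I want to mention SBCL's own sb-ext:octets-to-string: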

   SB-EXT:OCTETS-TO-STRING is an external symbol in #<PACKAGE "SB-EXT">.
   Function: #<FUNCTION SB-EXT:OCTETS-TO-STRING>
   Its associated name (as in FUNCTION-LAMBDA-EXPRESSION) is
     SB-EXT:OCTETS-TO-STRING.
   The function's arguments are:  (VECTOR &KEY (EXTERNAL-FORMAT DEFAULT) (START 0)
                                          END)
   Its defined argument types are:
     ((VECTOR (UNSIGNED-BYTE 8)) &KEY (:EXTERNAL-FORMAT T) (:START T) (:END T))
   Its result type is:
     *
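Going by the argument list shown above, a call might look like the sketch below. It only works in SBCL, the helper name utf8-octets-to-string is made up for illustration, and the COERCE is there because the declared argument type is a vector of (unsigned-byte 8), which a literal such as #(72 101) is not.

```lisp
;; SBCL only: the SB-EXT package does not exist in other implementations.
(defun utf8-octets-to-string (octets)
  "Hypothetical helper: decode OCTETS as UTF-8 with SBCL's built-in decoder."
  (sb-ext:octets-to-string
   ;; Convert a general vector into a specialized octet vector first.
   (coerce octets '(vector (unsigned-byte 8)))
   :external-format :utf-8))

(utf8-octets-to-string #(72 101 108 108 111)) ; the ASCII bytes of "Hello"
(utf8-octets-to-string #(206 177))            ; the two-byte UTF-8 encoding of "α"
```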

  • "For the benefit of future googlers": eight years later, here is one of those future beneficiaries. (DuckDuckGo, but hey) (4 upvotes)