Converting a Unicode (UTF-8) code point to bytes

Question

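I went looking through the C source code but couldn't find this function. I really don't want to write one myself, because it surely must be in there somewhere.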

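Details: a Unicode code point is written as U+########, and that is easy to get. What I need is the form in which the character is written to a file (for example). In UTF-8, a code point below 128 becomes a single byte holding its 7 bits; a larger code point is split into a leading byte followed by continuation bytes, each carrying 6 bits, with the lowest 6 bits in the last byte. Emacs of course knows how to do this, but I can't find a way to get a UTF-8 encoded string as a sequence of bytes (each containing 8 bits).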
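The encoding described above is UTF-8. To make the bit layout concrete, here is a minimal hand-written sketch for a single code point; `my-utf8-encode-code-point` is a hypothetical name, it assumes CP is an integer in 0..#x10FFFF, and it does not reject surrogates (U+D800..U+DFFF). It is only an illustration: the built-in `encode-coding-string` already does this.

```elisp
;; Hypothetical helper, for illustration only: builds the UTF-8 bytes by hand.
(defun my-utf8-encode-code-point (cp)
  "Return the UTF-8 bytes of code point CP as a list of integers 0..255."
  (unless (and (integerp cp) (<= 0 cp #x10FFFF))
    (error "Not a Unicode code point: %S" cp))
  ;; Note: surrogates (U+D800..U+DFFF) are not rejected here.
  (cond
   ;; 1 byte: 0xxxxxxx (7 bits)
   ((< cp #x80) (list cp))
   ;; 2 bytes: 110xxxxx 10xxxxxx (5 + 6 bits)
   ((< cp #x800)
    (list (logior #xC0 (ash cp -6))
          (logior #x80 (logand cp #x3F))))
   ;; 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (4 + 6 + 6 bits)
   ((< cp #x10000)
    (list (logior #xE0 (ash cp -12))
          (logior #x80 (logand (ash cp -6) #x3F))
          (logior #x80 (logand cp #x3F))))
   ;; 4 bytes: 11110xxx + three continuation bytes (3 + 6 + 6 + 6 bits)
   (t
    (list (logior #xF0 (ash cp -18))
          (logior #x80 (logand (ash cp -12) #x3F))
          (logior #x80 (logand (ash cp -6) #x3F))
          (logior #x80 (logand cp #x3F))))))

(my-utf8-encode-code-point #x125)   ; => (196 165)
(my-utf8-encode-code-point #x20AC)  ; => (226 130 172)
```

Each continuation byte is `10` followed by 6 payload bits, which is why the code shifts by multiples of 6 and masks with `#x3F`.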

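Functions such as get-byte or multibyte-char-to-unibyte only work for characters that can be represented in no more than 8 bits. I need the equivalent of get-byte for multibyte characters: instead of a single integer 0..255, I want either a vector of integers 0..255 or a single long integer 0..2^32.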
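For the get-byte-style result asked for here, a small sketch on top of the built-in `encode-coding-string` could look like the following. Both `my-char-utf8-bytes` and `my-char-utf8-as-integer` are hypothetical names; packing the bytes with the first byte most significant is one possible convention, not a standard one.

```elisp
(defun my-char-utf8-bytes (ch)
  "Return a vector of the UTF-8 bytes (0..255) that encode character CH."
  ;; `encode-coding-string' yields a unibyte string; `vconcat' turns
  ;; its elements (the bytes) into a vector.
  (vconcat (encode-coding-string (char-to-string ch) 'utf-8)))

(defun my-char-utf8-as-integer (ch)
  "Pack the UTF-8 bytes of CH into one integer, first byte most significant."
  ;; A UTF-8 sequence is at most 4 bytes, so the result fits in 32 bits.
  (let ((n 0))
    (mapc (lambda (b) (setq n (logior (ash n 8) b)))
          (my-char-utf8-bytes ch))
    n))

(my-char-utf8-bytes ?ĥ)        ; => [196 165]
(my-char-utf8-as-integer ?ĥ)   ; => 50341, i.e. #xC4A5
```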

Edit

In case someone needs this later:

(defun haxe-string-to-x-string (s)
  "Return S as a string of \\xNN escapes, one per UTF-8 byte."
  (with-output-to-string
    ;; Encode the whole string once: the result is a unibyte string
    ;; whose elements are bytes 0..255, so ASCII characters come out
    ;; as single bytes and everything else as its UTF-8 sequence.
    (let ((bytes (encode-coding-string s 'utf-8)))
      (dotimes (i (length bytes))
        (princ (format "\\x%02x" (aref bytes i)))))))

Answer 1

leg*_*cia 5

encode-coding-string is probably what you're looking for:

*** Welcome to IELM ***  Type (describe-mode) for help.
ELISP> (encode-coding-string "eĥoŝanĝo ĉiuĵaŭde" 'utf-8)
"e\304\245o\305\235an\304\235o \304\211iu\304\265a\305\255de"

It returns a string, but you can access the individual bytes with aref:

ELISP> (aref (encode-coding-string "eĥoŝanĝo ĉiuĵaŭde" 'utf-8) 1)
196
ELISP> (format "%o" 196)
"304"

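Or, if you don't mind using the cl functions, concatenate is your friend (with the newer cl-lib library, the same function is called cl-concatenate):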

ELISP> (concatenate 'list (encode-coding-string "eĥoŝanĝo ĉiuĵaŭde" 'utf-8))
(101 196 165 111 197 157 97 110 196 157 111 32 196 137 105 117 196 181 97 197 173 100 101)
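If you'd rather avoid cl entirely, the built-ins `string-to-list` and `string-to-vector` work on the encoded string as well, since it is unibyte and its elements are already bytes. A minimal sketch; the helper name `my-utf8-byte-list` is hypothetical:

```elisp
;; Hypothetical helper: the UTF-8 bytes of a string as a plain list,
;; using only built-ins (no cl / cl-lib).
(defun my-utf8-byte-list (s)
  "Return the UTF-8 bytes of string S as a list of integers 0..255."
  ;; `encode-coding-string' returns a unibyte string, so each element
  ;; is already a byte; `string-to-list' is simply (append S nil).
  (string-to-list (encode-coding-string s 'utf-8)))

(my-utf8-byte-list "eĥo")
;; => (101 196 165 111)

;; The same bytes as a vector:
(string-to-vector (encode-coding-string "eĥo" 'utf-8))
;; => [101 196 165 111]

;; Decoding the bytes recovers the original text:
(decode-coding-string (encode-coding-string "eĥo" 'utf-8) 'utf-8)
;; => "eĥo"
```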

Archived: 14 years, 1 month ago
Views: 941
Last activity: 14 years, 1 month ago