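The Haskell 2010 Language Report says:
Haskell uses the Unicode [2] character set. However, source programs are currently biased toward the ASCII character set used in earlier versions of Haskell.
Does this mean UTF-8?
In ghc-7.0.4/compiler/parser/Lexer.x.source: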
$unispace = \x05 -- Trick Alex into handling Unicode. See alexGetChar.
$whitechar = [\ \n\r\f\v $unispace]
$white_no_nl = $whitechar # \n
$tab = \t
$ascdigit = 0-9
$unidigit = \x03 -- Trick Alex into handling Unicode. See alexGetChar.
$decdigit = $ascdigit -- for now, should really be $digit (ToDo)
$digit = [$ascdigit $unidigit]
$special = [\(\)\,\;\[\]\`\{\}]
$ascsymbol = [\!\#\$\%\&\*\+\.\/\<\=\>\?\@\\\^\|\-\~]
$unisymbol = \x04 -- Trick Alex into handling Unicode. See alexGetChar.
$symbol = [$ascsymbol $unisymbol] # [$special \_\:\"\']
$unilarge = \x01 -- Trick Alex into handling Unicode. See alexGetChar.
$asclarge = [A-Z]
$large = [$asclarge $unilarge]
$unismall = \x02 -- Trick Alex into handling Unicode. See alexGetChar.
$ascsmall = [a-z]
$small = [$ascsmall $unismall \_]
$unigraphic = \x06 -- Trick Alex into handling Unicode. See alexGetChar.
$graphic = [$small $large $symbol $digit $special $unigraphic \:\"\']
...I'm not sure what to make of this. alexGetChar wasn't really much help.
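Unicode is a character set. UTF-8, UTF-16, and so on are concrete physical encodings of Unicode code points. Try reading here. The difference is explained very well there.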
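To make the distinction concrete, here is a minimal sketch that encodes one and the same code point in two different ways. The helper names `utf8Bytes`, `utf16beBytes`, and `hex` are made up for this illustration and are not part of any library; the encoders are hand-written against the usual UTF-8 and UTF-16 bit layouts and do no validation (for example, they do not reject surrogate code points).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import Numeric (showHex)

-- Encode one code point as UTF-8 (1 to 4 bytes, no validation).
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading byte: length tag plus the topmost payload bits.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx with the next six payload bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- Encode one code point as big-endian UTF-16 (2 or 4 bytes, no validation).
utf16beBytes :: Char -> [Word8]
utf16beBytes c
  | n < 0x10000 = unit n
  | otherwise   = unit hi ++ unit lo
  where
    n  = ord c
    n' = n - 0x10000
    hi = 0xD800 .|. (n' `shiftR` 10)   -- high surrogate
    lo = 0xDC00 .|. (n' .&. 0x3FF)     -- low surrogate
    -- Split one 16-bit code unit into two bytes, high byte first.
    unit u = [fromIntegral (u `shiftR` 8), fromIntegral (u .&. 0xFF)]

-- Render bytes as space-separated hex (no zero padding).
hex :: [Word8] -> String
hex = unwords . map (\b -> showHex b "")

main :: IO ()
main = do
  let c = '\x3BB' -- GREEK SMALL LETTER LAMDA, written as an escape to stay ASCII
  putStrLn ("UTF-8:    " ++ hex (utf8Bytes c))
  putStrLn ("UTF-16BE: " ++ hex (utf16beBytes c))
```

The character is the same in both cases (U+03BB); only the byte sequence that represents it on disk differs.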
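The part of the report you quoted only states that Haskell source uses the Unicode character set. It does not say which encoding should be used. In other words, it says which characters may appear in source code, but not how to write them down as plain bytes.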
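The lexer excerpt in the question is consistent with this reading: its character classes are defined over characters, and the `\x01` to `\x06` placeholders appear to stand in for whole groups of non-ASCII characters. The sketch below shows one way such a scheme could work, assuming the lexer's input function inspects each non-ASCII character's Unicode general category and hands the generated lexer a placeholder instead. The `LexClass` type, the `classify` function, and the exact category-to-class mapping are hypothetical and simplified; this is not GHC's actual code.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isAscii)

-- Hypothetical stand-ins for the placeholder classes in the lexer excerpt
-- ($unilarge, $unismall, $unidigit, $unisymbol, $unispace, $unigraphic).
data LexClass
  = UniLarge
  | UniSmall
  | UniDigit
  | UniSymbol
  | UniSpace
  | UniGraphic
  deriving (Eq, Show)

-- Classify a character the way a Unicode-aware input function might.
-- ASCII characters yield Nothing because the lexer's ASCII classes
-- ($asclarge, $ascsmall, ...) can match them directly.
-- The mapping below is an assumption made for illustration.
classify :: Char -> Maybe LexClass
classify c
  | isAscii c = Nothing
  | otherwise = Just $ case generalCategory c of
      UppercaseLetter -> UniLarge
      TitlecaseLetter -> UniLarge
      LowercaseLetter -> UniSmall
      OtherLetter     -> UniSmall
      DecimalNumber   -> UniDigit
      MathSymbol      -> UniSymbol
      CurrencySymbol  -> UniSymbol
      ModifierSymbol  -> UniSymbol
      OtherSymbol     -> UniSymbol
      Space           -> UniSpace
      _               -> UniGraphic

main :: IO ()
main = mapM_ (print . classify) ['a', '\x3BB', '\x39B', '\x2200', '\x660']
```

Note that nothing here depends on bytes: by the time characters are classified, decoding from whatever encoding the file uses has already happened.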