haskell - invalid code page byte sequence

Kar*_*ath 11 haskell file utf-8

readFile "file.html"
"start of the file... *** Exception: file.html: hGetContents: invalid argument (invalid code page byte sequence)

This is a UTF-8 file created with Notepad++... How can I read the file in Haskell?

Dan*_*her 13

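By default, files are read using the encoding of the system locale, so if you have a file in a different encoding, you need to set the encoding on the file handle yourself.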

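import System.IO

foo = do
    handle <- openFile "file.html" ReadMode
    hSetEncoding handle utf8_bom   -- decode as UTF-8, skipping a leading BOM if present
    contents <- hGetContents handle
    doSomethingWith contents       -- consume the contents before the handle is closed
    hClose handle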

should get you started. Note that this contains no error handling, so a better approach would be

import Control.Exception -- for bracket

foo = bracket
        (openFile "file.html" ReadMode >>= \h -> hSetEncoding h utf8_bom >> return h)
        hClose
        (\h -> hGetContents h >>= doSomething)

or

foo = withFile "file.html" ReadMode $
        \h -> do hSetEncoding h utf8_bom
                 contents <- hGetContents h
                 doSomethingWith contents
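
One thing to watch with any of these versions is that hGetContents reads lazily, so the contents must be consumed before the handle is closed. As a minimal sketch, assuming you want the whole file back as a String, a helper could force the contents inside withFile. The name readUtf8File is hypothetical and not part of the answer above.

```haskell
import System.IO
import Control.Exception (evaluate)

-- | Read a whole file as UTF-8, skipping a leading byte order mark if present.
-- NOTE: readUtf8File is a hypothetical helper name used for illustration only.
readUtf8File :: FilePath -> IO String
readUtf8File path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8_bom
    contents <- hGetContents h
    -- hGetContents is lazy: force the full string now,
    -- before withFile closes the handle on exit.
    _ <- evaluate (length contents)
    return contents
```

For example, `readUtf8File "file.html" >>= putStr` would print the file, provided the console encoding can represent its characters.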