Kol*_*Kir 4 memory string haskell
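I wrote a program in Haskell that has to load and parse a large UTF-8 text file. The file represents a dictionary with one key:value pair per line. In my program I want a Data.Map container for fast dictionary lookups. The file is about 40 MB, but after loading it my program uses 1.5 GB of RAM, which is never freed. What am I doing wrong? Is this memory usage expected?
Here is a code sample from my program: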
module Main where

import Engine
import Codec.Archive.Zip
import Data.IORef
import System.IO
import System.Directory
import qualified System.IO.UTF8 as UTF8
import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString.UTF8 as BsUtf
import qualified Data.Map as Map
import Graphics.UI.Gtk
import Graphics.UI.Gtk.Glade

maybeRead :: Read a => BsUtf.ByteString -> Maybe a
maybeRead s = case reads $ BsUtf.toString s of
    [(x, "")] -> Just x
    _         -> Nothing

parseToEntries :: [BsUtf.ByteString] -> [(BsUtf.ByteString, Int)]
parseToEntries [] = []
parseToEntries (x:xs) = let (key, svalue) = BsUtf.break (==':') x
                            value = maybeRead svalue
                        in case value of
                             Just x  -> [(key, x)] ++ parseToEntries xs
                             Nothing -> parseToEntries xs

createDict :: BsUtf.ByteString -> IO (Map.Map BsUtf.ByteString Int)
createDict str = do
    let entries = parseToEntries $ BsUtf.lines str
        dict = Map.fromList entries
    return (dict)

main :: IO ()
main = do
    currFileName <- newIORef ""
    dictZipFile <- B.readFile "data.db"
    extractFilesFromArchive [] $ toArchive dictZipFile
    dictFile <- UTF8.readFile "dict.txt"
    dict <- createDict $ BsUtf.fromString dictFile
    ...

searchAccent :: Map.Map BsUtf.ByteString Int -> String -> Int
searchAccent dict word = let sword = BsUtf.fromString $ map toLower word
                             entry = Map.lookup sword dict
                         in case entry of
                              Nothing    -> -1
                              Just match -> 0
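Quick answer.
The main problem is that System.IO.UTF8.readFile reads the file into a String. A String is a linked list of Char values, so each character costs several machine words rather than a few bytes, and a 40 MB file can easily occupy many hundreds of megabytes once loaded.
The suspected bottleneck is here: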
dictFile <- UTF8.readFile "dict.txt"
dict <- createDict $ BsUtf.fromString dictFile
When working with UTF-8 text, it is better to use Data.Text than ByteString. Try something like this:
import qualified Data.Text.Lazy as LT
import qualified Data.Text.Lazy.Encoding as LT
...
dictFile <- B.readFile "dict.txt"
dict <- createDict $ LT.decodeUtf8 dictFile
Another bottleneck is number parsing: you convert the ByteString to a String and then call read on it. It is better to use Data.Text.Lazy.Read:
import qualified Data.Text.Lazy as LT
import qualified Data.Text.Lazy.Read as LT

maybeRead :: LT.Text -> Maybe Int
maybeRead s = case LT.decimal s of
    Right (i, _) -> Just i   -- decimal also returns the unparsed remainder
    Left _       -> Nothing
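Putting the two suggestions together, a minimal sketch of the whole loading path might look like the following. The names parseEntry, buildDict and loadDict are hypothetical and not part of the original program. The sketch assumes the text and containers packages, that every useful line has the form key:value, and that values may carry a sign (hence signed decimal, where the original code relied on read). Copying each key with Data.Text.copy is an extra precaution added here so that short keys do not keep the larger decoded chunks alive; it is not something the advice above requires.

```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)
import qualified Data.Text as T
import qualified Data.Text.Lazy as LT
import qualified Data.Text.Lazy.Encoding as LT
import qualified Data.Text.Lazy.Read as LT

-- | Parse one "key:value" line (hypothetical helper).
-- Returns Nothing when the value is missing or is not a whole number.
parseEntry :: LT.Text -> Maybe (T.Text, Int)
parseEntry line =
  let (key, rest) = LT.break (== ':') line
  in case LT.signed LT.decimal (LT.drop 1 rest) of  -- drop the ':' separator
       Right (n, leftover)
         | LT.null (LT.strip leftover) ->
             -- T.copy gives the key its own buffer, so the map does not
             -- hold on to the chunk of decoded text it was sliced from.
             Just (T.copy (LT.toStrict key), n)
       _ -> Nothing

-- | Build the dictionary from the decoded file contents, skipping bad lines.
buildDict :: LT.Text -> Map.Map T.Text Int
buildDict = Map.fromList . mapMaybe parseEntry . LT.lines

-- | Read and decode the file lazily, without ever building a String.
-- Note that decodeUtf8 throws an exception on invalid UTF-8 input.
loadDict :: FilePath -> IO (Map.Map T.Text Int)
loadDict path = do
  bytes <- B.readFile path
  let dict = buildDict (LT.decodeUtf8 bytes)
  dict `seq` return dict  -- force the map so the file is consumed here
```

In main this would replace the UTF8.readFile and createDict lines with something like dict <- loadDict "dict.txt", and searchAccent would then look up T.pack (map toLower word) instead of a ByteString key.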