我试图在一个3.5米的行文件上实现Kosaraju的图算法,其中每行是两个(空格分隔)Ints代表图形边缘.首先,我需要创建一个摘要数据结构,其中包含节点及其传入和传出边的列表.下面的代码实现了这一点,但需要花费一分多钟,而我可以从MOOC论坛上的帖子看到使用其他语言的人在"10"中完成.(getLines
在我读到的基准测试中,与10岁以下相比需要10分钟.)
我是Haskell的新手,已经实现了一种使用累积方法foldl'
(这'
是一种突破性的终止方法),但它在风格方面感觉相当紧迫,我希望这就是它运行缓慢的原因.此外,我目前正计划使用类似的模式进行深度优先搜索,我担心这一切都会变得太慢.
import System.IO
import Control.Monad
import Data.Map.Strict as Map
import Data.List as L
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored (Edges, Edges) deriving (Show)
type Graph1 = Map NodeName Node
getLines :: FilePath -> IO [[Int]]
getLines = liftM (fmap (fmap read . words) . lines) . readFile
getLines' :: FilePath -> IO [(Int,Int)]
getLines' = liftM (fmap (tuplify2 . fmap read . words) . lines) . readFile
tuplify2 :: [a] -> (a,a)
tuplify2 [x,y] = (x,y)
main = do
list <- getLines "testdata.txt" -- [String]
--list <- getLines "SCC.txt" -- [String]
let
list' = createGraph list
return list'
createGraph :: [[Int]] -> Graph1
createGraph xs = L.foldl' build Map.empty xs
where
build :: Graph1-> [Int] -> Graph1
build = \acc (x:y:_) ->
let tmpAcc = case Map.lookup x acc of
Nothing -> Map.insert x (Node False ([y],[])) acc
Just a -> Map.adjust (\(Node _ (fwd, bck)) -> (Node False ((y:fwd), bck))) x acc
in case Map.lookup y tmpAcc of
Nothing -> Map.insert y (Node False ([],[x])) tmpAcc
Just a -> Map.adjust (\(Node _ (fwd, bck)) -> (Node False (fwd, (x:bck)))) y tmpAcc
Run Code Online (Sandbox Code Playgroud)
And*_*ács 10
IntMap
或HashMap
尽可能使用.两者Int
都比键快得多Map
.HashMap
通常比IntMap
使用更多内存更快,但库更少.containers
软件包具有大量专用功能.与问题中alter
的createGraph
实现相比,查找次数可以减半.示例createGraph
:
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = IM.IntMap Node
createGraph :: [(Int, Int)] -> Graph1
createGraph xs = foldl' build IM.empty xs
where
addFwd y (Just (Node _ f b)) = Just (Node False (y:f) b)
addFwd y _ = Just (Node False [y] [])
addBwd x (Just (Node _ f b)) = Just (Node False f (x:b))
addBwd x _ = Just (Node False [] [x])
build :: Graph1 -> (Int, Int) -> Graph1
build acc (x, y) = IM.alter (addBwd x) y $ IM.alter (addFwd y) x acc
Run Code Online (Sandbox Code Playgroud)
generate
,iterate
,constructN
,等等).这些可能在幕后使用突变,但比实际的可变载体使用起来要方便得多.示例createGraph
:
import qualified Data.Vector as V
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = V.Vector Node
createGraph :: Int -> [(Int, Int)] -> Graph1
createGraph maxIndex edges = graph'' where
graph = V.replicate maxIndex (Node False [] [])
graph' = V.accum (\(Node e f b) x -> Node e (x:f) b) graph edges
graph'' = V.accum (\(Node e f b) x -> Node e f (x:b)) graph' (map (\(a, b) -> (b, a)) edges)
Run Code Online (Sandbox Code Playgroud)
请注意,如果节点索引的范围存在间隙,那么任何一个都是明智的
Node
表示缺少的索引.Data.Text
或的IO功能Data.ByteString
.在这两种情况下,还有用于将输入分解为行或单词的有效功能.例:
import qualified Data.ByteString.Char8 as BS
import System.IO
getLines :: FilePath -> IO [(Int, Int)]
getLines path = do
lines <- (map BS.words . BS.lines) `fmap` BS.readFile path
let pairs = (map . map) (maybe (error "can't read Int") fst . BS.readInt) lines
return [(a, b) | [a, b] <- pairs]
Run Code Online (Sandbox Code Playgroud)
总是这样做,不像我在这个答案中.使用criterion
.