Haskell 速度问题,执行程序的两个部分比单独执行任一部分花费的时间要长得多

b1g*_*ar5 2 performance haskell ghc

我有一个 Haskell 程序,主要有两行代码:

putStrLn $ "Day11: part1: " ++ show (sum $ bigManhattan 1 galaxies <$> pairs)
putStrLn $ "Day11: part2: " ++ show (sum $ bigManhattan 999999 galaxies <$> pairs)
Run Code Online (Sandbox Code Playgroud)

如果我注释掉其中任何一个,程序将在 0.01 秒内运行。由于两者都存在,该程序需要 90 秒的时间。

我想知道有人有什么想法吗?他们是否会竞争查看数据并互相妨碍?

编译器选项 - 我不知道其中大多数是做什么的......

ghc-options:
- -Wall
- -Wcompat
- -Widentities
- -Wincomplete-record-updates
- -Wincomplete-uni-patterns
- -Wmissing-export-lists
- -Wmissing-home-modules
- -Wpartial-fields
- -Wredundant-constraints
- -O2
Run Code Online (Sandbox Code Playgroud)

代码:

module Day11(day11) where

import Data.List ((\\))
import Data.Maybe (catMaybes)


type Coord = (Int, Int)


manhattan :: Coord -> Coord -> Int
manhattan (x1, y1) (x2, y2) = abs (x1 - x2) + abs (y1 - y2)


getF :: (String -> a) -> Int -> IO a
getF f n = do
  s <- readFile $ "./Data/Day" ++ show n ++ ".in"
  return $ f s


getLines :: Int -> IO [String]
getLines = getF lines


parse :: [String] -> [Coord]
parse css = concatMap (catMaybes . (\(y, cs) -> (\(x, c) -> if c=='#' then Just (x,y) else Nothing) <$> zip [0..] cs)) (zip [0..] css)


nSize :: Int
nSize = 140


bigManhattan :: Int -> [Coord] -> (Coord, Coord) -> Int
bigManhattan k galaxies ((c1, r1), (c2, r2)) = manhattan (c1+newc1, r1+newr1) (c2+newc2, r2+newr2)
  where
    baseC, baseR :: [Int]
    baseC = [0..(nSize-1)] \\ (fst <$> galaxies)
    baseR = [0..(nSize-1)] \\  (snd <$> galaxies)
    newc1, newc2, newr1, newr2 :: Int
    newc1 = k * length (filter (c1>) baseC)
    newc2 = k * length (filter (c2>) baseC)
    newr1 = k * length (filter (r1>) baseR)
    newr2 = k * length (filter (r2>) baseR)


day11 :: IO ()
day11 = do
  ls <- getLines 11
  let galaxies = parse ls
      pairs = [(x,y) | x <- galaxies, y <- galaxies, x<y ]

  putStrLn $ "Day11: part1: " ++ show (sum $ bigManhattan 1 galaxies <$> pairs)
  --putStrLn $ "Day11: part2: " ++ show (sum $ bigManhattan 999999 galaxies <$> pairs)
  return ()
Run Code Online (Sandbox Code Playgroud)

使用 GHC 9.4.7。

K. *_*uhr 5

请注意,baseCbaseR依赖于galaxies但不依赖于pairs。当您仅打印两个结果之一时,GHC 内联所有结果并且bigManhattan能够将. 但是,当您打印这两个结果时,内联将被抑制,并且会针对的每个元素的每次调用重新计算。解决此问题的最简单方法是将和移入以确保每个总和仅计算一次。你可以做得更好,因为它们不依赖于任何一个,但这似乎工作得足够快。baseCbaseRsumbaseCbaseRbigManhattanpairssummapbigManhattanbaseCbaseRk

bigManhattan :: Int -> [Coord] -> [(Coord, Coord)] -> Int
bigManhattan k galaxies = sum . map go
  where
    baseC, baseR :: [Int]
    baseC = [0..(nSize-1)] \\ (fst <$> galaxies)
    baseR = [0..(nSize-1)] \\  (snd <$> galaxies)

    go ((c1,r1),(c2,r2)) = manhattan (c1+newc1, r1+newr1) (c2+newc2, r2+newr2)
      where newc1, newc2, newr1, newr2 :: Int
            newc1 = k * length (filter (c1>) baseC)
            newc2 = k * length (filter (c2>) baseC)
            newr1 = k * length (filter (r1>) baseR)
            newr2 = k * length (filter (r2>) baseR)


day11 :: IO ()
day11 = do
  ls <- getLines 11
  let galaxies = parse ls
      pairs = [(x,y) | x <- galaxies, y <- galaxies, x<y ]

  putStrLn $ "Day11: part1: " ++ show (bigManhattan 1 galaxies pairs)
  putStrLn $ "Day11: part2: " ++ show (bigManhattan 999999 galaxies pairs)
  return ()
Run Code Online (Sandbox Code Playgroud)