为什么要避免在Haskell中进行显式递归？

Question

为什么要避免在Haskell中进行显式递归？

我是Haskell的新手。

在研究文件夹时，许多人建议使用它，并避免显式递归，这可能导致内存效率低下的代码。 https://www.reddit.com/r/haskell/comments/1nb80j/proper_use_of_recursion_in_haskell/

当我运行上面链接中提到的示例时。我可以看到显式递归在内存方面做得更好。首先，我认为May可以在GHCi上运行还不是完美的基准，因此我尝试使用进行编译stack ghc。顺便说一句，我如何通过堆栈ghc传递编译器优化标志。我在表达式中缺少什么避免显式递归。

find p = foldr go Nothing
    where go x rest = if p x then Just x else rest

findRec :: (a -> Bool) -> [a] -> Maybe a
findRec _ [] = Nothing
findRec p (x:xs) = if p x then Just x else (findRec p xs)

main :: IO ()
main = print $ find (\x -> x `mod` 2 == 0) [1, 3..1000000] 
main = print $ findRec (\x -> x `mod` 2 == 0) [1, 3..1000000] 

-- find
Nothing
      92,081,224 bytes allocated in the heap
           9,392 bytes copied during GC
          58,848 bytes maximum residency (2 sample(s))
          26,704 bytes maximum slop
               0 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        87 colls,     0 par    0.000s   0.000s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.000s   0.001s     0.0004s    0.0008s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.031s  (  0.043s elapsed)
  GC      time    0.000s  (  0.001s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.031s  (  0.044s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    2,946,599,168 bytes per MUT second

  Productivity 100.0% of total user, 96.8% of total elapsed

-- findRec
Nothing
      76,048,432 bytes allocated in the heap
          13,768 bytes copied during GC
          42,928 bytes maximum residency (2 sample(s))
          26,704 bytes maximum slop
               0 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        71 colls,     0 par    0.000s   0.000s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.000s   0.001s     0.0004s    0.0007s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.031s  (  0.038s elapsed)
  GC      time    0.000s  (  0.001s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.031s  (  0.039s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    2,433,549,824 bytes per MUT second

  Productivity 100.0% of total user, 96.6% of total elapsed

Run Code Online (Sandbox Code Playgroud)

Answer 1

ama*_*loy 11

您正在测量GHC可以进行多百万次模数运算的速度。如您所料，无论您如何迭代，“眨眼之间”都是答案。速度没有明显差异。

您声称可以看到显式递归使用的内存更少，但是您提供的堆概要分析数据却显示了相反的情况：使用显式递归时分配更多，最大驻留时间更高。我认为差异不大，但如果是那样，您的证据将与您的主张相抵触。

关于为什么要避免显式递归的问题，目前尚不清楚您所读的线程的哪一部分可以得出结论。您链接到一个巨线程，而该巨线程本身又链接到另一个具有许多竞争观点的巨线程。对我来说，最突出的评论不是效率，而是抽象水平。您正在通过尝试衡量其性能而以错误的方式看待它。

Answer 2

K. *_*uhr 5

首先，不要尝试使用优化编译以外的任何方法来理解 GHC 编译代码的性能：

$ stack ghc -- -O2 Find.hs
$ ./Find +RTS -s

Run Code Online (Sandbox Code Playgroud)

使用-O2标志（和 GHC 版本 8.6.4），您的find执行如下：

      16,051,544 bytes allocated in the heap
          14,184 bytes copied during GC
          44,576 bytes maximum residency (2 sample(s))
          29,152 bytes maximum slop
               0 MB total memory in use (0 MB lost due to fragmentation)

Run Code Online (Sandbox Code Playgroud)

然而，这是非常具有误导性的。这些内存使用都不是由于foldr. 相反，这完全是由于使用了 boxed Integers。如果您切换到使用Ints编译器可以拆箱的plain：

main = print $ find (\x -> x `mod` 2 == 0) [1::Int, 3..1000000]
                                             ^^^^^

Run Code Online (Sandbox Code Playgroud)

内存性能发生了巨大变化，并展示了以下真实内存成本foldr：

      51,544 bytes allocated in the heap
       3,480 bytes copied during GC
      44,576 bytes maximum residency (1 sample(s))
      25,056 bytes maximum slop
           0 MB total memory in use (0 MB lost due to fragmentation)

Run Code Online (Sandbox Code Playgroud)

如果你测试findRec与Ints像这样：

 main = print $ findRec (\x -> x `mod` 2 == 0) [1::Int, 3..1000000]

Run Code Online (Sandbox Code Playgroud)

你会看到更糟糕的内存性能：

  40,051,528 bytes allocated in the heap
      14,992 bytes copied during GC
      44,576 bytes maximum residency (2 sample(s))
      29,152 bytes maximum slop
           0 MB total memory in use (0 MB lost due to fragmentation)

Run Code Online (Sandbox Code Playgroud)

这似乎是一个令人信服的案例，即应该避免递归而不是foldr，但这也非常具有误导性。您在这里看到的不是递归的内存成本，而是“列表构建”的内存成本。

看，foldr表达式[1::Int, 3..1000000]都包含一些称为“列表融合”的魔法。这意味着当它们一起使用时（即，何时foldr应用于[1::Int 3..1000000]），可以执行优化以完全消除创建 Haskell 列表。关键的是foldr，即使使用列表融合，代码也会编译为递归代码，如下所示：

main_go
  = \ x ->
      case gtInteger# x lim of {
        __DEFAULT ->
          case eqInteger# (modInteger x lvl) lvl1 of {
            __DEFAULT -> main_go (plusInteger x lvl);
                      -- ^^^^^^^ - SEE?  IT'S JUST RECURSION
            1# -> Just x
          };
        1# -> Nothing
      }
end Rec }

Run Code Online (Sandbox Code Playgroud)

因此，它是列表融合，而不是“避免递归” find比findRec.

通过考虑以下性能，您可以看出这是真的：

find1 :: Int -> Maybe Int
find1 n | n >= 1000000 = Nothing
        | n `mod` 2 == 0 = Just n
        | otherwise = find1 (n+2)

main :: IO ()
main = print $ find1 1

Run Code Online (Sandbox Code Playgroud)

即使这使用递归，它也不会生成列表（或使用 boxed Integers），因此它的运行就像foldr版本一样：

      51,544 bytes allocated in the heap
       3,480 bytes copied during GC
      44,576 bytes maximum residency (1 sample(s))
      25,056 bytes maximum slop
           0 MB total memory in use (0 MB lost due to fragmentation)

Run Code Online (Sandbox Code Playgroud)

那么，什么是带回家的课程？

始终使用对 Haskell 代码进行基准测试ghc -O2，从不使用GHCi 或ghc不使用优化标志。
在任何 Reddit 帖子中，只有不到 10% 的人知道他们在说什么。
foldr 当可以应用列表融合等特殊优化时，有时可以比显式递归执行得更好。
但在一般情况下，显式递归的性能与foldr其他特殊结构一样好。
此外，优化 Haskell 代码很困难。

实际上，这是一个更好（更严肃）的带回家的课程。尤其是当您开始使用 Haskell 时，请尽一切努力避免考虑“优化”您的代码。与我所知道的任何其他语言相比，您编写的代码和编译器生成的代码之间存在巨大的鸿沟，所以现在甚至不要试图弄清楚。相反，编写清晰、直接和惯用的代码。如果您现在尝试学习高性能代码的“规则”，您将把它们全都弄错，并且学习非常糟糕的编程风格。

事实上，这是一个更好的带回家的教训。特别是当您开始使用 Haskell 时，请尽一切努力避免考虑“优化”您的代码。与我所知道的任何其他语言相比，您编写的代码与编译器生成的代码之间存在巨大的鸿沟，因此现在甚至不要试图弄清楚它。相反，编写清晰、简单且符合习惯的代码。如果你现在尝试学习高性能代码的“规则”，你就会把它们全部弄错，并学到非常糟糕的编程风格。 (4认同)

归档时间：	6 年，4 月前
查看次数：	165 次
最近记录：	6 年，3 月前