Haskell performance: composition vs. application?

Question


Rar*_*ima 7 f# haskell functional-programming

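I have seen several questions about the similarities and differences between function composition and function application, and about the various ways of writing each. One thing that has started to confuse me (and, as far as I could find, has not been asked yet) is the difference in performance.

When I learned F#, I fell in love with the pipe operator |>, whose Haskell equivalent is the reverse application operator &. In my opinion the F# spelling is undeniably prettier (and I don't think I'm the only one who feels that way).

Now, one can easily hack the pipe operator into Haskell: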

(|>) x f = f x


And it works like a charm! Problem solved!
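One detail the one-line definition leaves implicit is operator fixity. Without a declaration, a Haskell operator defaults to infixl 9, whereas & from Data.Function is declared infixl 1. The following is a minimal sketch, assuming we want |> to behave like &; the names countUpper and shout are hypothetical and only there for illustration:

```haskell
import Data.Char (isUpper, toUpper)

-- Assumption: give the pipe the same fixity as (&) from Data.Function,
-- so that it binds more loosely than (.) and the arithmetic operators.
infixl 1 |>

(|>) :: a -> (a -> b) -> b
x |> f = f x

-- Hypothetical example: count the upper-case letters of a string.
countUpper :: String -> Int
countUpper s = s |> filter isUpper |> length

-- With the default fixity (infixl 9) this definition would be rejected,
-- because (|>) and (.) would share precedence 9 with opposite associativity.
shout :: String -> String
shout s = s |> map toUpper . reverse
```

Loaded into GHCi, countUpper "Hello World" evaluates to 2 and shout "abc" to "CBA".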

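The big difference between the pipe (both F#'s and our Haskell hack) and composition is that the pipe does not compose functions; it is based on function application. It takes the value on its left and passes it to the function on its right. Composition, by contrast, takes two functions and returns another function, which can then be used like any regular function.

To me, at least, this makes for prettier code: a single operator guides the flow of information through the whole function, from the argument to the final value. With plain composition (or >>>) you cannot put a value on the left-hand side and have it pass through the "chain".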
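To make the distinction concrete, here is a small sketch using the operators that ship with base: & from Data.Function and >>> from Control.Arrow. All function names are hypothetical. The composed version is a function value that exists without mentioning an argument, while the applied version needs the argument on the left before the chain can start:

```haskell
import Control.Arrow ((>>>))
import Data.Char (isUpper)
import Data.Function ((&))

-- Composition: builds a new function; no argument is mentioned.
countUpperC :: String -> Int
countUpperC = filter isUpper >>> length

-- Application: starts from a value and feeds it through each step.
countUpperA :: String -> Int
countUpperA s = s & filter isUpper & length

-- A composed chain can be handed to map like any other function ...
countAll :: [String] -> [Int]
countAll = map (filter isUpper >>> length)

-- ... whereas the applied form needs a lambda (or a named argument).
countAll' :: [String] -> [Int]
countAll' = map (\s -> s & filter isUpper & length)
```

Both styles compute the same values; the difference is only in whether the argument appears in the definition.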

But from a performance point of view, looking at these general options, the result should be exactly the same:

f x = x |> func1 |> func2 |> someLambda |> someMap |> someFold |> show

f x = x & (func1 >>> func2 >>> someLambda >>> someMap >>> someFold >>> show)

f x = (func1 >>> func2 >>> someLambda >>> someMap >>> someFold >>> show) x


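Which one would be the fastest: the one based on repeated application, or one of those based on composition followed by a single application?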

Answer 1

Zet*_*eta 9

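There shouldn't be any difference, as long as (|>) and (>>>) get inlined. Let's write an example that uses four different functions, two in F# style and two in Haskell style: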

import Data.Char (isUpper)

{-# INLINE (|>) #-}
(|>) :: a -> (a -> b) -> b
(|>) x f = f x

{-# INLINE (>>>) #-}
(>>>) :: (a -> b) -> (b -> c) -> a -> c
(>>>) f g x = g (f x)

compositionF :: String -> String
compositionF = filter isUpper >>> length >>> show 

applicationF :: String -> String
applicationF x = x |> filter isUpper |> length |> show 

compositionH :: String -> String
compositionH = show . length . filter isUpper

applicationH :: String -> String
applicationH x = show $ length $ filter isUpper $ x

main :: IO ()
main = do
  getLine >>= putStrLn . compositionF  -- using the functions
  getLine >>= putStrLn . applicationF  -- to make sure that
  getLine >>= putStrLn . compositionH  -- we actually get the
  getLine >>= putStrLn . applicationH  -- corresponding GHC core


If we compile our code with -ddump-simpl -dsuppress-all -O0 we get:

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 82, types: 104, coercions: 0}

-- RHS size: {terms: 9, types: 11, coercions: 0}
>>>_rqe
>>>_rqe =
  \ @ a_a1cE @ b_a1cF @ c_a1cG f_aqr g_aqs x_aqt ->
    g_aqs (f_aqr x_aqt)

-- RHS size: {terms: 2, types: 0, coercions: 0}
$trModule1_r1gR
$trModule1_r1gR = TrNameS "main"#

-- RHS size: {terms: 2, types: 0, coercions: 0}
$trModule2_r1h6
$trModule2_r1h6 = TrNameS "Main"#

-- RHS size: {terms: 3, types: 0, coercions: 0}
$trModule
$trModule = Module $trModule1_r1gR $trModule2_r1h6

-- RHS size: {terms: 58, types: 73, coercions: 0}
main
main =
  >>
    $fMonadIO
    (>>=
       $fMonadIO
       getLine
       (. putStrLn
          (>>>_rqe
             (>>>_rqe (filter isUpper) (length $fFoldable[]))
             (show $fShowInt))))
    (>>
       $fMonadIO
       (>>=
          $fMonadIO
          getLine
          (. putStrLn
             (\ x_a10M ->
                show $fShowInt (length $fFoldable[] (filter isUpper x_a10M)))))
       (>>
          $fMonadIO
          (>>=
             $fMonadIO
             getLine
             (. putStrLn
                (. (show $fShowInt) (. (length $fFoldable[]) (filter isUpper)))))
          (>>=
             $fMonadIO
             getLine
             (. putStrLn
                (\ x_a10N ->
                   show $fShowInt (length $fFoldable[] (filter isUpper x_a10N)))))))

-- RHS size: {terms: 2, types: 1, coercions: 0}
main
main = runMainIO main


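So >>> does not get inlined if we don't enable optimizations. If we do enable optimizations, however, you won't see >>> or (.) at all. Without optimizations our functions differ slightly, because (.) does not get inlined at that stage either, but that is somewhat expected.

If we add {-# NOINLINE … #-} to our functions and enable optimizations, we see that the four functions don't differ at all: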

$ ghc -ddump-simpl -dsuppress-all -O2 Example.hs
[1 of 1] Compiling Main             ( Example.hs, Example.o )

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 261, types: 255, coercions: 29}

-- RHS size: {terms: 2, types: 0, coercions: 0}
$trModule2
$trModule2 = TrNameS "main"#

-- RHS size: {terms: 2, types: 0, coercions: 0}
$trModule1
$trModule1 = TrNameS "Main"#

-- RHS size: {terms: 3, types: 0, coercions: 0}
$trModule
$trModule = Module $trModule2 $trModule1

Rec {
-- RHS size: {terms: 29, types: 20, coercions: 0}
$sgo_r574
$sgo_r574 =
  \ sc_s55y sc1_s55x ->
    case sc1_s55x of _ {
      [] -> I# sc_s55y;
      : y_a2j9 ys_a2ja ->
        case y_a2j9 of _ { C# c#_a2hF ->
        case {__pkg_ccall base-4.9.1.0 u_iswupper Int#
                                     -> State# RealWorld -> (# State# RealWorld, Int# #)}_a2hE
               (ord# c#_a2hF) realWorld#
        of _ { (# ds_a2hJ, ds1_a2hK #) ->
        case ds1_a2hK of _ {
          __DEFAULT -> $sgo_r574 (+# sc_s55y 1#) ys_a2ja;
          0# -> $sgo_r574 sc_s55y ys_a2ja
        }
        }
        }
    }
end Rec }

-- RHS size: {terms: 15, types: 14, coercions: 0}
applicationH
applicationH =
  \ x_a12X ->
    case $sgo_r574 0# x_a12X of _ { I# ww3_a2iO ->
    case $wshowSignedInt 0# ww3_a2iO []
    of _ { (# ww5_a2iS, ww6_a2iT #) ->
    : ww5_a2iS ww6_a2iT
    }
    }

Rec {
-- RHS size: {terms: 29, types: 20, coercions: 0}
$sgo1_r575
$sgo1_r575 =
  \ sc_s55r sc1_s55q ->
    case sc1_s55q of _ {
      [] -> I# sc_s55r;
      : y_a2j9 ys_a2ja ->
        case y_a2j9 of _ { C# c#_a2hF ->
        case {__pkg_ccall base-4.9.1.0 u_iswupper Int#
                                     -> State# RealWorld -> (# State# RealWorld, Int# #)}_a2hE
               (ord# c#_a2hF) realWorld#
        of _ { (# ds_a2hJ, ds1_a2hK #) ->
        case ds1_a2hK of _ {
          __DEFAULT -> $sgo1_r575 (+# sc_s55r 1#) ys_a2ja;
          0# -> $sgo1_r575 sc_s55r ys_a2ja
        }
        }
        }
    }
end Rec }

-- RHS size: {terms: 15, types: 15, coercions: 0}
compositionH
compositionH =
  \ x_a1jF ->
    case $sgo1_r575 0# x_a1jF of _ { I# ww3_a2iO ->
    case $wshowSignedInt 0# ww3_a2iO []
    of _ { (# ww5_a2iS, ww6_a2iT #) ->
    : ww5_a2iS ww6_a2iT
    }
    }

Rec {
-- RHS size: {terms: 29, types: 20, coercions: 0}
$sgo2_r576
$sgo2_r576 =
  \ sc_s55k sc1_s55j ->
    case sc1_s55j of _ {
      [] -> I# sc_s55k;
      : y_a2j9 ys_a2ja ->
        case y_a2j9 of _ { C# c#_a2hF ->
        case {__pkg_ccall base-4.9.1.0 u_iswupper Int#
                                     -> State# RealWorld -> (# State# RealWorld, Int# #)}_a2hE
               (ord# c#_a2hF) realWorld#
        of _ { (# ds_a2hJ, ds1_a2hK #) ->
        case ds1_a2hK of _ {
          __DEFAULT -> $sgo2_r576 (+# sc_s55k 1#) ys_a2ja;
          0# -> $sgo2_r576 sc_s55k ys_a2ja
        }
        }
        }
    }
end Rec }

-- RHS size: {terms: 15, types: 15, coercions: 0}
compositionF
compositionF =
  \ x_a1jF ->
    case $sgo2_r576 0# x_a1jF of _ { I# ww3_a2iO ->
    case $wshowSignedInt 0# ww3_a2iO []
    of _ { (# ww5_a2iS, ww6_a2iT #) ->
    : ww5_a2iS ww6_a2iT
    }
    }

Rec {
-- RHS size: {terms: 29, types: 20, coercions: 0}
$sgo3_r577
$sgo3_r577 =
  \ sc_s55d sc1_s55c ->
    case sc1_s55c of _ {
      [] -> I# sc_s55d;
      : y_a2j9 ys_a2ja ->
        case y_a2j9 of _ { C# c#_a2hF ->
        case {__pkg_ccall base-4.9.1.0 u_iswupper Int#
                                     -> State# RealWorld -> (# State# RealWorld, Int# #)}_a2hE
               (ord# c#_a2hF) realWorld#
        of _ { (# ds_a2hJ, ds1_a2hK #) ->
        case ds1_a2hK of _ {
          __DEFAULT -> $sgo3_r577 (+# sc_s55d 1#) ys_a2ja;
          0# -> $sgo3_r577 sc_s55d ys_a2ja
        }
        }
        }
    }
end Rec }

-- RHS size: {terms: 15, types: 14, coercions: 0}
applicationF
applicationF =
  \ x_a12W ->
    case $sgo3_r577 0# x_a12W of _ { I# ww3_a2iO ->
    case $wshowSignedInt 0# ww3_a2iO []
    of _ { (# ww5_a2iS, ww6_a2iT #) ->
    : ww5_a2iS ww6_a2iT
    }
    }
...


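All the go functions are exactly the same (apart from variable names), and application* is the same as composition*. So go ahead and create your own F# prelude in Haskell; there shouldn't be any performance issue.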
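The source for the NOINLINE variant is not shown above. A sketch of what it could look like, assuming the pragma is attached to each of the four top-level functions (so that GHC keeps them as separate bindings in the Core output) while the two operators stay INLINE. The helper allAgree is hypothetical and only added as a quick sanity check that the four definitions compute the same result:

```haskell
import Data.Char (isUpper)

{-# INLINE (|>) #-}
(|>) :: a -> (a -> b) -> b
(|>) x f = f x

{-# INLINE (>>>) #-}
(>>>) :: (a -> b) -> (b -> c) -> a -> c
(>>>) f g x = g (f x)

-- Assumption: NOINLINE on the functions themselves, so that each one
-- survives as its own binding in the optimized Core and can be compared.
{-# NOINLINE compositionF #-}
compositionF :: String -> String
compositionF = filter isUpper >>> length >>> show

{-# NOINLINE applicationF #-}
applicationF :: String -> String
applicationF x = x |> filter isUpper |> length |> show

{-# NOINLINE compositionH #-}
compositionH :: String -> String
compositionH = show . length . filter isUpper

{-# NOINLINE applicationH #-}
applicationH :: String -> String
applicationH x = show $ length $ filter isUpper $ x

-- Hypothetical helper: do all four variants agree on a given input?
allAgree :: String -> Bool
allAgree s =
  all (== applicationH s) [compositionF s, applicationF s, compositionH s]
```

This only checks that the results agree; whether the generated code is identical still has to be read off the -ddump-simpl output, as done above.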

Answer 2

Jus*_*mer 5

My answer is about F#.

In most cases the F# compiler is able to optimize the pipelines into the same code:

let f x = x |> (+) 1 |> (*) 2 |> (+) 2
let g x = x |> ((+) 1 >> (*) 2 >> (+) 2)


Decompiling f and g, we see that the compiler arrived at the same result:

public static int f(int x)
{
    return 2 + 2 * (1 + x);
}
public static int g(int x)
{
    return 2 + 2 * (1 + x);
}


But this does not seem to hold for slightly more advanced pipelines:

// add1, mul2 and add2 are not defined in the answer; presumably x + 1, x * 2 and x + 2
let f x = x |> Array.map add1 |> Array.map mul2 |> Array.map add2 |> Array.reduce (+)
let g x = x |> (Array.map add1 >> Array.map mul2 >> Array.map add2 >> Array.reduce (+))


Decompiling shows some differences:

public static int f(int[] x)
{
  FSharpFunc<int, FSharpFunc<int, int>> arg_25_0 = new Program.f@9();
  if (x == null)
  {
    throw new ArgumentNullException("array");
  }
  int[] array = new int[x.Length];
  FSharpFunc<int, FSharpFunc<int, int>> fSharpFunc = arg_25_0;
  for (int i = 0; i < array.Length; i++)
  {
    array[i] = x[i] + 1;
  }
  FSharpFunc<int, FSharpFunc<int, int>> arg_6C_0 = fSharpFunc;
  int[] array2 = array;
  if (array2 == null)
  {
    throw new ArgumentNullException("array");
  }
  array = new int[array2.Length];
  fSharpFunc = arg_6C_0;
  for (int i = 0; i < array.Length; i++)
  {
    array[i] = array2[i] * 2;
  }
  FSharpFunc<int, FSharpFunc<int, int>> arg_B3_0 = fSharpFunc;
  int[] array3 = array;
  if (array3 != null)
  {
    array2 = new int[array3.Length];
    fSharpFunc = arg_B3_0;
    for (int i = 0; i < array2.Length; i++)
    {
      array2[i] = array3[i] + 2;
    }
    return ArrayModule.Reduce<int>(fSharpFunc, array2);
  }
  throw new ArgumentNullException("array");
}

public static int g(int[] x)
{
  FSharpFunc<int[], int[]> f = new Program.g@10-1();
  FSharpFunc<int[], int[]> fSharpFunc = new Program.g@10-3(f);
  FSharpFunc<int, FSharpFunc<int, int>> reduction = new Program.g@10-4();
  int[] array = fSharpFunc.Invoke(x);
  return ArrayModule.Reduce<int>(reduction, array);
}


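For f, F# inlines the pipeline, except for the final reduce.

For g, the pipeline is constructed as function objects and then invoked. This means g is likely to be somewhat slower and more memory-hungry than f.

In this particular example it probably doesn't matter, since we are creating array objects and iterating over them anyway. But if the composed functions are very cheap in terms of CPU and memory, the cost of building and invoking the pipeline can become relevant.

If critical performance is important to you, I recommend getting a good decompiler tool to make sure the generated code doesn't contain unexpected overhead. Otherwise, either approach is probably fine.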

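"If critical performance is important to you, I recommend getting a good decompiler tool to make sure the generated code doesn't contain unexpected overhead" - and then clearly document why the code is written the way it is, so that nobody over-zealously refactors it. (2 upvotes)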

Asked:	8 years, 2 months ago
Viewed:	268 times
Last active:	8 years, 2 months ago