Tail recursion vs. non-tail recursion: is the former slower?

Question


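I'm learning the basics of functional programming and Erlang, and I've implemented three versions of the factorial function: recursion with guards, recursion with pattern matching, and tail recursion.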

I'm trying to compare the performance of each factorial implementation (Erlang/OTP 22 [erts-10.4.1]):

-module(factorial).
-export([fac/1, fac_pattern_matching/1, tail_fac/1]).

%% Simple factorial code:
fac(N) when N == 0 -> 1;
fac(N) when N > 0 -> N * fac(N - 1).

%% Using pattern matching:
fac_pattern_matching(0) -> 1;
fac_pattern_matching(N) when N > 0 -> N * fac_pattern_matching(N - 1).

%% Using tail recursion (and pattern matching):
tail_fac(N) -> tail_fac(N, 1).

tail_fac(0, Acc) -> Acc;
tail_fac(N, Acc) when N > 0 -> tail_fac(N - 1, N * Acc).


Timer helper:

-module(mytimer).
-export([execution_time/4]).

-define(PRECISION, microsecond).

%% Calls M:F(A...), prints the elapsed wall-clock time and,
%% if D is true, also prints the result.
execution_time(M, F, A, D) ->
  StartTime = erlang:system_time(?PRECISION),
  Result = apply(M, F, A),
  EndTime = erlang:system_time(?PRECISION),
  io:format("Execution took ~p ~ps~n", [EndTime - StartTime, ?PRECISION]),
  if
    D =:= true -> io:format("Result is ~p~n", [Result]);
    true -> ok
  end.
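As an aside, erlang:system_time/1 follows the OS wall clock, which can be adjusted while a long benchmark is running; for elapsed-time measurements erlang:monotonic_time/1 is the safer choice (in current OTP releases timer:tc/1, which the answer uses, is also based on monotonic time). Below is a minimal sketch of the same helper built on monotonic time. The module name mono_timer and the function elapsed_us/3 are hypothetical, and unlike mytimer:execution_time/4 it returns the measurement instead of printing it.

```erlang
-module(mono_timer).
-export([elapsed_us/3]).

%% Hypothetical variant of mytimer:execution_time/4.
%% Calls M:F(A...) and returns {ElapsedMicroseconds, Result}.
%% Monotonic time never jumps backwards, unlike system (wall-clock) time.
elapsed_us(M, F, A) when is_atom(M), is_atom(F), is_list(A) ->
    Start = erlang:monotonic_time(microsecond),
    Result = apply(M, F, A),
    Elapsed = erlang:monotonic_time(microsecond) - Start,
    {Elapsed, Result}.
```

For example, `{Us, _} = mono_timer:elapsed_us(factorial, tail_fac, [100000]).` binds Us to the elapsed microseconds and discards the (huge) result.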


Results:

Recursive version:

3> mytimer:execution_time(factorial, fac, [1000000], false).
Execution took 1253949667 microseconds
ok


Recursive version with pattern matching:

4> mytimer:execution_time(factorial, fac_pattern_matching, [1000000], false).
Execution took 1288239853 microseconds
ok


Tail-recursive version:

5> mytimer:execution_time(factorial, tail_fac, [1000000], false).
Execution took 1405612434 microseconds
ok


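I expected the tail-recursive version to outperform the other two, but to my surprise it was the slowest. These results are exactly the opposite of what I expected.

Why?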

Answer 1

Hyn*_*dil 5

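The problem is the function you chose. Factorial grows extremely fast, and Erlang has built-in arbitrary-precision (bignum) integers, so it never overflows; what you are effectively measuring is the performance of the underlying bignum implementation. 1000000! is a huge number: about 8.26×10^5565708, or roughly 5.6 MB when written out in decimal digits. Your fac/1 and tail_fac/1 differ in how quickly their intermediate products reach large bignum sizes. In fac/1 the multiplications happen as the recursion unwinds, so you are effectively computing 1*2*3*4*...*N and the running product stays small for as long as possible. In tail_fac/1 you are computing N*(N-1)*(N-2)*(N-3)*...*1: the accumulator is multiplied by the largest factors first, so it becomes a large bignum almost immediately and more of the multiplications operate on big numbers. You can write the tail-call implementation the other way around: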

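%% Counts X up from 0 to N, so the accumulator is built as 1*2*3*...*N.
%% The guard allows N = 0 (returns 1), matching fac/1.
tail_fac2(N) when is_integer(N), N >= 0 ->
    tail_fac2(N, 0, 1).

tail_fac2(X, X, Acc) -> Acc;
tail_fac2(N, X, Acc) ->
    Y = X + 1,
    tail_fac2(N, Y, Y*Acc).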
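To see the difference in accumulator growth directly, the two multiplication orders can be traced. This is a hypothetical helper module (fac_trace is not part of the original code): desc_accs/1 records the accumulator after each step in the order tail_fac/2 uses (largest factor first), and asc_accs/1 does the same for the order tail_fac2/3 uses (smallest factor first). It is meant for small N only, since it keeps every intermediate product.

```erlang
-module(fac_trace).
-export([desc_accs/1, asc_accs/1]).

%% Accumulator values in tail_fac/2's order: N, N*(N-1), ..., N!
desc_accs(N) when is_integer(N), N >= 0 ->
    desc_accs(N, 1, []).

desc_accs(0, _Acc, Seen) -> lists:reverse(Seen);
desc_accs(N, Acc, Seen) ->
    NewAcc = N * Acc,
    desc_accs(N - 1, NewAcc, [NewAcc | Seen]).

%% Accumulator values in tail_fac2/3's order: 1, 1*2, 1*2*3, ..., N!
asc_accs(N) when is_integer(N), N >= 0 ->
    asc_accs(N, 0, 1, []).

asc_accs(X, X, _Acc, Seen) -> lists:reverse(Seen);
asc_accs(N, X, Acc, Seen) ->
    Y = X + 1,
    NewAcc = Y * Acc,
    asc_accs(N, Y, NewAcc, [NewAcc | Seen]).
```

For N = 4, `fac_trace:desc_accs(4)` returns `[4,12,24,24]` while `fac_trace:asc_accs(4)` returns `[1,2,6,24]`: both end at 4!, but the descending order is larger at every intermediate step.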


fact:tail_fac2/1 performs better. I'm not as patient as you, so I measured with a smaller N, but it beats fact:fac/1 every time:

1> element(1, timer:tc(fun()-> fact:fac(100000) end)).
7743768
2> element(1, timer:tc(fun()-> fact:fac(100000) end)).
7629604
3> element(1, timer:tc(fun()-> fact:fac(100000) end)).
7651739
4> element(1, timer:tc(fun()-> fact:tail_fac(100000) end)).
7229662
5> element(1, timer:tc(fun()-> fact:tail_fac(100000) end)).
7104056
6> element(1, timer:tc(fun()-> fact:tail_fac2(100000) end)).
6491195
7> element(1, timer:tc(fun()-> fact:tail_fac2(100000) end)).
6506565
8> element(1, timer:tc(fun()-> fact:tail_fac2(100000) end)).
6519624


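As you can see, for N = 100000 fact:tail_fac2/1 takes 6.5 s, fact:tail_fac/1 takes 7.2 s and fact:fac/1 takes 7.6 s. Even its faster-growing accumulator does not wipe out the tail-call benefit at this size: fact:tail_fac/1 is still faster than the body-recursive fact:fac/1, and the slower accumulator growth of fact:tail_fac2/1 clearly shows its effect.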
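The accumulator-growth argument can be made concrete with a short derivation (the notation $P_k$ is introduced here, not taken from the answer). Let $N$ be the argument and $P_k$ the partial product after $k$ multiplications, $1 \le k \le N$. The body-recursive fac/1 builds $k!$, while tail_fac/1 holds the product of the $k$ largest factors:

```latex
\begin{aligned}
P_k^{\text{fac}} &= \prod_{i=1}^{k} i = k!, \\
P_k^{\text{tail\_fac}} &= \prod_{j=0}^{k-1} (N-j) = \frac{N!}{(N-k)!}
  \;\ge\; \prod_{j=0}^{k-1} (k-j) = k! = P_k^{\text{fac}},
\end{aligned}
```

since $N - j \ge k - j$ for every $j$. Both reach $N!$ at $k = N$, but tail_fac/1 carries the larger operand at every intermediate step, whereas tail_fac2/1 multiplies in the same ascending order as fac/1. This only compares operand sizes; the actual cost also depends on the bignum multiplication algorithm, so treat it as intuition rather than a cost model.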

You can see the effect of tail-call optimization more clearly if you test it with a different function, for example a sum:

sum(0) -> 0;
sum(N) when N > 0 -> N + sum(N-1).

tail_sum(N) when is_integer(N), N >= 0 ->
    tail_sum(N, 0).

tail_sum(0, Acc) -> Acc;
tail_sum(N, Acc) -> tail_sum(N-1, N+Acc).
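A quick check of why the sum makes a cleaner benchmark: for N = 10000000 every intermediate value stays a small integer, so no bignum arithmetic is involved. Assuming a 64-bit BEAM, where small integers (fixnums) span $-2^{59}$ to $2^{59}-1$ according to the Erlang Efficiency Guide:

```latex
\sum_{i=1}^{10^{7}} i = \frac{10^{7}\,(10^{7}+1)}{2} = 50\,000\,005\,000\,000 \approx 5.0 \times 10^{13}
\;<\; 2^{59} - 1 \approx 5.76 \times 10^{17}
```

Every addition therefore works on single-word integers, and what remains is mostly the cost of the recursion itself.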


The timings are:

1> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
970749
2> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
126288
3> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
113115
4> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
104371
5> element(1, timer:tc(fun()-> fact:sum(10000000) end)).
125857
6> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
92282
7> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
92634
8> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
68047
9> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
87748
10> element(1, timer:tc(fun()-> fact:tail_sum(10000000) end)).
94233


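As you can see, here we can easily use N=10000000 and it still runs very fast. The body-recursive function is nevertheless noticeably slower: roughly 110 ms versus 85 ms. You will also notice that the first run of fact:sum/1 took about 9 times longer than the later runs. This is because the body-recursive function consumes stack: the first run has to grow the process's memory to hold a frame for every pending call, and later runs in the same shell process can presumably reuse that already-grown memory. You won't see this effect with the tail-recursive counterpart (try it). If you run each measurement in a separate process, the difference becomes visible: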

1> F = fun(G, N) -> spawn(fun() -> {T, _} = timer:tc(fun()-> fact:G(N) end), io:format("~p took ~bus and ~p heap~n", [G, T, element(2, erlang:process_info(self(), heap_size))]) end) end.
#Fun<erl_eval.13.91303403>
2> F(tail_sum, 10000000).
<0.88.0>
tail_sum took 70065us and 987 heap
3> F(tail_sum, 10000000).
<0.90.0>
tail_sum took 65346us and 987 heap
4> F(tail_sum, 10000000).
<0.92.0>
tail_sum took 65628us and 987 heap
5> F(tail_sum, 10000000).
<0.94.0>
tail_sum took 69384us and 987 heap
6> F(tail_sum, 10000000).
<0.96.0>
tail_sum took 68606us and 987 heap
7> F(sum, 10000000).
<0.98.0>
sum took 954783us and 22177879 heap
8> F(sum, 10000000).
<0.100.0>
sum took 931335us and 22177879 heap
9> F(sum, 10000000).
<0.102.0>
sum took 934536us and 22177879 heap
10> F(sum, 10000000).
<0.104.0>
sum took 945380us and 22177879 heap
11> F(sum, 10000000).
<0.106.0>
sum took 921855us and 22177879 heap
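The shell fun above can also be written as a module function that sends the measurement back to the caller instead of printing it, which makes repeated runs easier to collect. This is a sketch under stated assumptions: the module name measure and the function in_fresh_process/1 are hypothetical. It uses spawn_monitor/1 so that a crash in the timed fun is reported to the caller instead of leaving it waiting forever.

```erlang
-module(measure).
-export([in_fresh_process/1]).

%% Hypothetical helper: runs Fun() in a freshly spawned process and returns
%% {Microseconds, HeapSizeWords}, both measured inside that process.
in_fresh_process(Fun) when is_function(Fun, 0) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() ->
        {Time, _Result} = timer:tc(Fun),
        {heap_size, Heap} = erlang:process_info(self(), heap_size),
        Parent ! {Ref, Time, Heap}
    end),
    receive
        %% The result message is sent before the process exits, so it
        %% arrives before the 'DOWN' message from the same process.
        {Ref, Micros, HeapWords} ->
            erlang:demonitor(MRef, [flush]),
            {Micros, HeapWords};
        {'DOWN', MRef, process, Pid, Reason} ->
            erlang:error({measurement_failed, Reason})
    end.
```

For example, `measure:in_fresh_process(fun() -> fact:sum(10000000) end)` performs one run in a fresh process, so each call pays the stack-growth cost described above.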

