JIT recompiles to do fast throw after more iterations if the stack trace length is even

Question

The following code,

public class TestFastThrow {

    public static void main(String[] args) {
        int count = 0;
        int exceptionStackTraceSize = 0;
        Exception exception = null;
        do {
            try {
                throwsNPE(1);
            }
            catch (Exception e) {
                exception = e;
                if (exception.getStackTrace().length != 0) {
                    exceptionStackTraceSize = exception.getStackTrace().length;
                    count++;
                }
            }
        }
        while (exception.getStackTrace().length != 0);
        System.out.println("Iterations to fastThrow :" + count + ", StackTraceSize :" + exceptionStackTraceSize);
    }

    static void throwsNPE(int callStackLength) {
        throwsNPE(callStackLength, 0);
    }

    static void throwsNPE(int callStackLength, int count) {
        if (count == callStackLength) {
            ((Object) null).getClass();
        }
        else {
            throwsNPE(callStackLength, count + 1);
        }
    }

}

when run multiple times, gives the following output:

Iterations to fastThrow :5517, StackTraceSize :4
Iterations to fastThrow :2825, StackTraceSize :5
Iterations to fastThrow :471033, StackTraceSize :6
Iterations to fastThrow :1731, StackTraceSize :7
Iterations to fastThrow :157094, StackTraceSize :10
.
.
.
Iterations to fastThrow :64587, StackTraceSize :20
Iterations to fastThrow :578, StackTraceSize :29

JVM details

Java HotSpot(TM) 64-Bit Server VM (11.0.5+10-LTS) for bsd-amd64 JRE (11.0.5+10-LTS)
-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintAssembly

Surprisingly, the JIT needs many more iterations to apply the optimization when the stack trace length is even. Why?

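I enabled JIT logging and analyzed the log with JITWatch, but could not find anything useful, except that the C1 and C2 compilations seem to happen later for even-sized stack traces.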

The timeline looks like this (showing when java.lang.Throwable.getStackTrace() gets compiled):

| StackSize     | 10    | 11    |
|---------------|-------|-------|
| Queued for C1 | 1.099 | 1.012 |
| C1            | 1.318 | 1.162 |
| Queued for C2 | 1.446 | 1.192 |
| C2            | 1.495 | 1.325 |

Why does this happen, and what heuristics does the JIT use to decide on fast throw?

Answer 1

apa*_*gin 5

This effect is the result of a tricky interaction between tiered compilation and the inlining policy.

Let me explain with a simplified example:

public class TestFastThrow {

    public static void main(String[] args) {
        for (int iteration = 0; ; iteration++) {
            try {
                throwsNPE(2);
            } catch (Exception e) {
                if (e.getStackTrace().length == 0) {
                    System.out.println("Iterations to fastThrow: " + iteration);
                    break;
                }
            }
        }
    }

    static void throwsNPE(int depth) {
        if (depth <= 1) {
            ((Object) null).getClass();
        }
        throwsNPE(depth - 1);
    }
}

For simplicity, I'll exclude all methods from compilation except throwsNPE.

-XX:CompileCommand=compileonly,TestFastThrow::throwsNPE -XX:+PrintCompilation

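HotSpot uses tiered compilation by default. Here throwsNPE is first compiled at tier 3 (C1 with profiling). Profiling in C1 is what makes it possible to recompile the method later with C2.
The OmitStackTraceInFastThrow optimization works only in C2-compiled code. So the earlier the code is compiled by C2, the fewer iterations pass before the loop finishes.
How profiling in C1-compiled code works: a counter is incremented on every method invocation and on every backward branch (there are no backward branches in throwsNPE, though). When the counter reaches a certain configurable threshold, the JVM compilation policy decides whether the current method needs to be recompiled.
throwsNPE is a recursive method. HotSpot can inline recursive calls up to -XX:MaxRecursiveInlineLevel (the default value is 1).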
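How often C1-compiled code calls back into the JVM compilation policy differs between regular and inlined invocations. A regular method notifies the JVM every 2¹⁰ invocations (-XX:Tier3InvokeNotifyFreqLog=10), while an inlined method notifies the JVM much more rarely: every 2²⁰ invocations (-XX:Tier23InlineeNotifyFreqLog=20).
For an even number of recursive calls, all invocations follow the Tier23InlineeNotifyFreqLog parameter. When the number of calls is odd, inlining does not cover the last remaining call, and that last invocation follows the Tier3InvokeNotifyFreqLog parameter.
This means that when the call depth is even, throwsNPE will be recompiled only after 2²⁰ calls, i.e. after 2¹⁹ loop iterations. That is exactly what you will see when running the above code with throwsNPE(2):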
```
Iterations to fastThrow: 524536
```
524536 is very close to 2¹⁹ = 524288.

Now, if you run the same TestFastThrow program with -XX:Tier23InlineeNotifyFreqLog=15, the number of iterations will be close to 2¹⁴ = 16384:
```
Iterations to fastThrow: 16612
```
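Now let's change the code to call throwsNPE(1). The program will finish very quickly regardless of the Tier23InlineeNotifyFreqLog value, because a different option now governs the notification frequency. But if I rerun the program with -XX:Tier3InvokeNotifyFreqLog=20, the loop will finish no sooner than after 2²⁰ iterations: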
```
Iterations to fastThrow: 1048994
```
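The three measurements above follow from simple counting, which can be sketched as a toy model. This is only an illustration of the argument, not HotSpot's actual implementation: the class NotifyModel, the method iterationsUntilNotify, and the assumption that every throwsNPE call in an iteration bumps one shared counter checked against a power-of-two mask are all made up for the example.

```java
public class NotifyModel {

    /**
     * Toy model (assumption): each call to the profiled method increments one
     * counter, and the compilation policy is consulted whenever the counter
     * hits a multiple of 2^notifyFreqLog. Returns the 1-based loop iteration
     * in which the first notification happens.
     */
    static long iterationsUntilNotify(int notifyFreqLog, int callsPerIteration) {
        long mask = (1L << notifyFreqLog) - 1;
        long counter = 0;
        for (long iteration = 1; ; iteration++) {
            for (int call = 0; call < callsPerIteration; call++) {
                counter++;
                if ((counter & mask) == 0) {
                    return iteration;
                }
            }
        }
    }

    public static void main(String[] args) {
        // throwsNPE(2), default Tier23InlineeNotifyFreqLog=20: two calls per iteration
        System.out.println(iterationsUntilNotify(20, 2)); // 524288
        // throwsNPE(2) with -XX:Tier23InlineeNotifyFreqLog=15
        System.out.println(iterationsUntilNotify(15, 2)); // 16384
        // throwsNPE(1) with -XX:Tier3InvokeNotifyFreqLog=20: one call per iteration
        System.out.println(iterationsUntilNotify(20, 1)); // 1048576
    }
}
```

The measured values (524536, 16612, 1048994) are slightly higher than the model's, which is plausible because the model ignores the time C2 needs to actually compile and install the code after it has been notified.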

Summary

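The fast throw optimization applies only to C2-compiled code. Because of one level of recursive inlining (-XX:MaxRecursiveInlineLevel), C2 compilation is triggered either earlier (after 2^Tier3InvokeNotifyFreqLog invocations, if the number of recursive calls is odd) or later (after 2^Tier23InlineeNotifyFreqLog invocations, if all recursive calls are covered by inlining).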
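One rough way to check the claim that only C2-compiled code takes the fast throw path is a bounded probe, run once with default settings and once with -XX:TieredStopAtLevel=1 (which keeps the JVM on C1) or with -XX:-OmitStackTraceInFastThrow. The class FastThrowProbe, its helper methods, and the iteration bound are hypothetical names and values chosen for this sketch; they do not come from the question or the answer.

```java
public class FastThrowProbe {

    /**
     * Returns the iteration at which an NPE with an empty stack trace was first
     * observed, or -1 if none showed up within maxIterations. The bound keeps the
     * probe from looping forever when fast throw never happens.
     */
    static int iterationsUntilFastThrow(int maxIterations) {
        for (int i = 0; i < maxIterations; i++) {
            try {
                deref(null);
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    return i;
                }
            }
        }
        return -1;
    }

    // Implicit null check: throws NullPointerException when o is null.
    static int deref(Object o) {
        return o.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("First fast throw at iteration: "
                + iterationsUntilFastThrow(2_000_000));
    }
}
```

If the explanation above holds, the default run should print a non-negative iteration count and the restricted runs should print -1. No outputs are listed here because they vary between JVM versions and machines.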
