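Could someone point me in the right direction to understand why the JIT deoptimizes my loop (OSR)? It looks like the loop is compiled once by C1 and then deoptimized many times (I can see dozens or hundreds of log entries starting with <deoptimized...>).
Here is the class that contains the loop in question: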
import java.util.Queue;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;
import org.jctools.queues.SpscUnboundedArrayQueue;

@SynchronizationRequired
public class Worker implements Runnable
{
    private static final byte NOT_RUNNING = 0, RUNNING = 1, SHUTDOWN = 2, FORCE_SHUTDOWN = 3;
    private static final AtomicIntegerFieldUpdater<Worker> isRunningFieldUpdater =
            AtomicIntegerFieldUpdater.newUpdater(Worker.class, "isRunning");
    private volatile int isRunning = NOT_RUNNING;
    private final Queue<FunkovConnection> tasks = new SpscUnboundedArrayQueue<>(512);

    /**
     * Executes tasks from the queue until closed.
     */
    @Override
    public void run()
    {
        // Another thread is already running this worker.
        if (isRunning())
        {
            return;
        }
        // Busy-polls the queue until shutdown() or shutdownForce() is called.
        while (notClosed())
        {
            FunkovConnection connection = tasks.poll();
            if (null != connection)
            {
                connection.run();
            }
        }
        if (forceShutdown())
        {
            setNonRunning();
            return;
        }
        // Graceful shutdown: drain the remaining tasks.
        FunkovConnection connection;
        while ((connection = tasks.poll()) != null)
        {
            connection.run();
        }
        setNonRunning();
    }

    public void submit(FunkovConnection connection)
    {
        tasks.add(connection);
    }

    /**
     * Shuts down the worker after it finishes processing all pending tasks on its queue.
     */
    public void shutdown()
    {
        isRunningFieldUpdater.compareAndSet(this, RUNNING, SHUTDOWN);
    }

    /**
     * Shuts down the worker after it finishes the task it is currently processing. Pending tasks on the queue are not handled.
     */
    public void shutdownForce()
    {
        isRunningFieldUpdater.compareAndSet(this, RUNNING, FORCE_SHUTDOWN);
    }

    private void setNonRunning()
    {
        isRunningFieldUpdater.set(this, NOT_RUNNING);
    }

    private boolean forceShutdown()
    {
        return isRunningFieldUpdater.get(this) == FORCE_SHUTDOWN;
    }

    private boolean isRunning()
    {
        return isRunningFieldUpdater.getAndSet(this, RUNNING) == RUNNING;
    }

    public boolean notClosed()
    {
        return isRunningFieldUpdater.get(this) == RUNNING;
    }
}
JIT log:
1. <task_queued compile_id='535' compile_kind='osr' method='Worker run ()V' bytes='81' count='1' backedge_count='60416' iicount='1' osr_bci='8' level='3' stamp='0,145' comment='tiered' hot_count='60416'/>
2. <nmethod compile_id='535' compile_kind='osr' compiler='c1' level='3' entry='0x00007fabf5514ee0' size='5592' address='0x00007fabf5514c10' relocation_offset='344' insts_offset='720' stub_offset='4432' scopes_data_offset='4704' scopes_pcs_offset='5040' dependencies_offset='5552' nul_chk_table_offset='5560' oops_offset='4624' metadata_offset='4640' method='Worker run ()V' bytes='81' count='1' backedge_count='65742' iicount='1' stamp='0,146'/>
3. <deoptimized thread='132773' reason='constraint' pc='0x00007fabf5515c24' compile_id='535' compile_kind='osr' compiler='c1' level='3'>
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='68801' iicount='1'/>
</deoptimized>
4. <deoptimized thread='132773' reason='constraint' pc='0x00007fabf5515c24' compile_id='535' compile_kind='osr' compiler='c1' level='3'>
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='76993' iicount='1'/>
</deoptimized>
5. <deoptimized thread='132773' reason='constraint' pc='0x00007fabf5515c24' compile_id='535' compile_kind='osr' compiler='c1' level='3'>
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='85185' iicount='1'/>
</deoptimized>
6. <deoptimized thread='132773' reason='constraint' pc='0x00007fabf5515c24' compile_id='535' compile_kind='osr' compiler='c1' level='3'>
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='93377' iicount='1'/>
</deoptimized>
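I'm glad @aran's suggestion helped in your case, but that is just a lucky coincidence. After all, JIT inlining options affect many things, including compilation order, timing, and so on. In fact, the deoptimization has nothing to do with inlining.
I was able to reproduce your problem; here is my analysis.
In the HotSpot sources we can see that the <deoptimized> message is printed by the Deoptimization::deoptimize_single_frame function. Let's use async-profiler to find out where this function is called from. To do so, add the following JVM option: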
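If you want to try this yourself without JCTools, a stand-alone sketch along these lines may be enough: one thread spins on a volatile flag and polls a mostly empty queue, just like Worker.run(). SpinRepro is a hypothetical name, ConcurrentLinkedQueue stands in for SpscUnboundedArrayQueue, and whether the repeated <deoptimized> entries actually show up depends on the JDK version and on compilation timing.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Hypothetical reproduction attempt: a single long-running call with a hot
 * loop, so the loop gets OSR-compiled rather than the method being compiled
 * through its invocation counter.
 */
public final class SpinRepro {
    private volatile boolean running = true;
    // Stand-in for JCTools' SpscUnboundedArrayQueue to avoid external dependencies.
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();

    public void submit(Runnable task) {
        tasks.add(task);
    }

    public void stop() {
        running = false;
    }

    /** Busy-polls the queue until stop() is called; returns the number of loop iterations. */
    public long spin() {
        long iterations = 0;
        while (running) {
            Runnable task = tasks.poll();
            if (task != null) {
                task.run();
            }
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        SpinRepro repro = new SpinRepro();
        long[] result = new long[1];
        Thread worker = new Thread(() -> result[0] = repro.spin(), "worker");
        worker.start();
        Thread.sleep(1_000); // give the loop time to get hot and be OSR-compiled
        repro.stop();
        worker.join();       // join() makes result[0] safely visible here
        System.out.println("iterations: " + result[0]);
    }
}
```

Running it with -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=jit.log and then counting the <deoptimized entries in jit.log should show whether your JDK is affected.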
-agentlib:asyncProfiler=start,event=Deoptimization::deoptimize_single_frame,file=deopt.html
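Here is the relevant part of the output:
So the cause of the deoptimization is the Runtime1::counter_overflow function. A method compiled by C1 at tier 3 counts invocations and backward branches (loop iterations). Every 2^Tier3BackedgeNotifyFreqLog iterations, the method calls Runtime1::counter_overflow to decide whether it should be recompiled at a higher tier.
In your log, backedge_count grows by exactly 8192 (2^13) from one entry to the next, and the goto bytecode at index 37 corresponds to the while (notClosed()) loop: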
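A toy model of that counting scheme, as a sketch: it assumes the default Tier3BackedgeNotifyFreqLog of 13 and replaces HotSpot's real counter (which is kept in the method's counters/profile data and stored shifted) with a plain long. BackedgeNotifyModel is a made-up name; this is not HotSpot code.

```java
/**
 * Toy model (not HotSpot code): tier-3 code bumps a backedge counter on every
 * loop iteration and calls into the runtime only when the low
 * Tier3BackedgeNotifyFreqLog bits of the counter are all zero.
 */
public final class BackedgeNotifyModel {
    // Assumption: the default value of Tier3BackedgeNotifyFreqLog.
    public static final int NOTIFY_FREQ_LOG = 13;
    private static final long MASK = (1L << NOTIFY_FREQ_LOG) - 1;

    /** Returns how many counter-overflow notifications n loop iterations would trigger. */
    public static long notifications(long n) {
        long count = 0;
        for (long backedges = 1; backedges <= n; backedges++) {
            if ((backedges & MASK) == 0) { // low 13 bits are zero: every 8192nd iteration
                count++;                   // stands in for the Runtime1::counter_overflow call
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(notifications(100_000)); // 12, since 100000 / 8192 is about 12.2
    }
}
```

The mask check is why the runtime call is cheap enough to leave in tier-3 code: the common path is one increment, one AND, and one branch.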
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='76993' iicount='1'/>
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='85185' iicount='1'/>
<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='93377' iicount='1'/>
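To double-check the 8192 figure, here is a small sketch that pulls backedge_count out of the <jvms .../> lines from the question's log and prints the differences between consecutive entries. BackedgeDeltas is a hypothetical helper written only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts backedge_count from JIT log lines and computes the deltas between them. */
public final class BackedgeDeltas {
    private static final Pattern BACKEDGE = Pattern.compile("backedge_count='(\\d+)'");

    public static List<Long> deltas(List<String> lines) {
        List<Long> counts = new ArrayList<>();
        for (String line : lines) {
            Matcher m = BACKEDGE.matcher(line);
            if (m.find()) { // lines without a backedge_count attribute are skipped
                counts.add(Long.parseLong(m.group(1)));
            }
        }
        List<Long> result = new ArrayList<>();
        for (int i = 1; i < counts.size(); i++) {
            result.add(counts.get(i) - counts.get(i - 1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='68801' iicount='1'/>",
            "<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='76993' iicount='1'/>",
            "<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='85185' iicount='1'/>",
            "<jvms bci='37' method='Worker run ()V' bytes='81' count='1' backedge_count='93377' iicount='1'/>");
        System.out.println(deltas(log)); // [8192, 8192, 8192]
        System.out.println(1 << 13);     // 8192
    }
}
```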
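When the counter overflows (every 8192 iterations), the JVM checks whether an OSR-compiled version of the method for the given bytecode index is already available (it may not be, because JIT compilation runs in the background). If the JVM finds such a method, it performs the OSR transition: it deoptimizes the current frame and replaces it with the corresponding OSR method.
It turns out that in your example the JVM finds the existing OSR method compiled at tier 3. Basically, it deoptimizes the Worker.run frame compiled at tier 3 and replaces it with exactly the same method! This repeats again and again until C2 finishes its background work. Then Worker.run is switched to the tier 4 compilation, and everything is fine.
Of course, this should not normally happen. It is actually a JVM bug, JDK-8253118. It has been fixed in JDK 16 and will probably be backported to JDK 11u. I have verified that the excessive deoptimization does not occur with a JDK 16 Early-Access build.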
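The following toy simulation illustrates the effect. It is not HotSpot code: it assumes the loop is already running tier-3 OSR code, that a tier-4 OSR method becomes available at an arbitrary iteration (the hypothetical parameter c2ReadyAt), and it compares a "migrate to whatever OSR method is found" rule with a "migrate only to a higher tier" rule. The second rule is my guess at the kind of condition that avoids the problem, not the actual JDK-8253118 patch.

```java
/**
 * Toy model (not HotSpot code) of the OSR lookup performed on every tier-3
 * backedge counter overflow. Assumes Tier3BackedgeNotifyFreqLog = 13.
 */
public final class OsrTransitionModel {
    private static final long NOTIFY_MASK = (1L << 13) - 1; // notify every 8192 iterations

    /**
     * @param iterations total loop iterations to simulate
     * @param c2ReadyAt  iteration from which a tier-4 OSR method is installed (hypothetical)
     * @param buggy      true: migrate to whatever OSR method is found;
     *                   false: migrate only if it is compiled at a higher tier
     * @return number of deoptimize-and-replace transitions
     */
    public static int simulate(long iterations, long c2ReadyAt, boolean buggy) {
        int currentLevel = 3; // the frame runs tier-3 OSR code
        int transitions = 0;
        // Tier-4 code has no tier-3 counters, so the checks stop once we reach it.
        for (long i = 1; i <= iterations && currentLevel < 4; i++) {
            if ((i & NOTIFY_MASK) != 0) {
                continue; // no counter overflow on this iteration
            }
            // The tier-3 OSR method always exists; tier 4 only once C2 is done.
            int availableLevel = i >= c2ReadyAt ? 4 : 3;
            boolean migrate = buggy || availableLevel > currentLevel;
            if (migrate) {
                transitions++; // deoptimize the current frame and enter the OSR method
                currentLevel = availableLevel;
            }
        }
        return transitions;
    }

    public static void main(String[] args) {
        System.out.println(simulate(1_000_000, 81_920, true));  // 10
        System.out.println(simulate(1_000_000, 81_920, false)); // 1
    }
}
```

With C2 finishing at iteration 81920 (the 10th notification), the first rule performs 10 transitions, 9 of them from tier-3 code into the very same tier-3 code, while the second rule performs only the single useful transition to tier 4.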