当 joda-time 达到闰日时,jvm 去优化会占用 CPU

ZHA*_* Bo 6 jit jvm jodatime

2020 年 2 月 28 日之后,我们发现用于处理日志(具有日期字段)的 Elasticsearch 集群性能不佳。

我们一直使用 HotSpot(TM) 64 位服务器 VM 18.9(构建 11.0.3+12-LTS)来运行 Elasticsearch 集群。

jstack & JVM 去​​优化日志导致我们对 joda-time 的闰年检查进行去优化(reason=unstable_if):

堆栈跟踪

jstack 显示 elasticsearch 的 [bulk] 线程可运行,但实际上它们被阻塞(根据 gstack)因为 BasicGJChronology.java:185 和 BasicGJChronology.java:187 处的 JVM 去​​优化?这两行都是闰年检查。

jstack:

"elasticsearch[xxx][bulk][T#19]" #312 daemon prio=5 os_prio=0 cpu=1972224606.34ms elapsed=12348671.96s tid=0x00007f90a0006800 nid=0xdd41 runnable  [0x00007f8f170ee000]
   java.lang.Thread.State: RUNNABLE
    at org.joda.time.chrono.BasicGJChronology.setYear(BasicGJChronology.java:185)
    at org.joda.time.chrono.BasicYearDateTimeField.setExtended(BasicYearDateTimeField.java:92)
    at org.joda.time.format.DateTimeParserBucket$SavedField.set(DateTimeParserBucket.java:568)
    at org.joda.time.format.DateTimeParserBucket.computeMillis(DateTimeParserBucket.java:451)
    at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:182)
    at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
    at org.elasticsearch.index.mapper.DateFieldMapper$DateFieldType.parse(DateFieldMapper.java:248)
    at org.elasticsearch.index.mapper.DateFieldMapper.parseCreateField(DateFieldMapper.java:456)
    at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:297)
--
"elasticsearch[xxx][bulk][T#20]" #315 daemon prio=5 os_prio=0 cpu=1975796442.21ms elapsed=12348662.71s tid=0x00007f92f4048000 nid=0xdd83 runnable  [0x00007f8f16cea000]
   java.lang.Thread.State: RUNNABLE
    at org.joda.time.chrono.BasicGJChronology.setYear(BasicGJChronology.java:187)
    at org.joda.time.chrono.BasicYearDateTimeField.setExtended(BasicYearDateTimeField.java:92)
    at org.joda.time.format.DateTimeParserBucket$SavedField.set(DateTimeParserBucket.java:568)
    at org.joda.time.format.DateTimeParserBucket.computeMillis(DateTimeParserBucket.java:451)
    at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:182)
    at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
    at org.elasticsearch.index.mapper.DateFieldMapper$DateFieldType.parse(DateFieldMapper.java:248)
    at org.elasticsearch.index.mapper.DateFieldMapper.parseCreateField(DateFieldMapper.java:456)
at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:297)
Run Code Online (Sandbox Code Playgroud)

上面的批量线程的gstack:

#0  0x00007f3cf8f566d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3cf7ef8ef3 in os::PlatformEvent::park() () from /usr/java/jdk-11.0.3/lib/server/libjvm.so
#2  0x00007f3cf7e9feb8 in Monitor::lock_without_safepoint_check() () from /usr/java/jdk-11.0.3/lib/server/libjvm.so
#3  0x00007f3cf79977f3 in Events::log(Thread*, char const*, ...) () from /usr/java/jdk-11.0.3/lib/server/libjvm.so
#4  0x00007f3cf799411a in Deoptimization::uncommon_trap_inner(JavaThread*, int) () from /usr/java/jdk-11.0.3/lib/server/libjvm.so
#5  0x00007f3cf79973d2 in Deoptimization::uncommon_trap(JavaThread*, int, int) () from /usr/java/jdk-11.0.3/lib/server/libjvm.so
Run Code Online (Sandbox Code Playgroud)

来自 BasicGJChronology.java 的代码片段:

"elasticsearch[xxx][bulk][T#19]" #312 daemon prio=5 os_prio=0 cpu=1972224606.34ms elapsed=12348671.96s tid=0x00007f90a0006800 nid=0xdd41 runnable  [0x00007f8f170ee000]
   java.lang.Thread.State: RUNNABLE
    at org.joda.time.chrono.BasicGJChronology.setYear(BasicGJChronology.java:185)
    at org.joda.time.chrono.BasicYearDateTimeField.setExtended(BasicYearDateTimeField.java:92)
    at org.joda.time.format.DateTimeParserBucket$SavedField.set(DateTimeParserBucket.java:568)
    at org.joda.time.format.DateTimeParserBucket.computeMillis(DateTimeParserBucket.java:451)
    at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:182)
    at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
    at org.elasticsearch.index.mapper.DateFieldMapper$DateFieldType.parse(DateFieldMapper.java:248)
    at org.elasticsearch.index.mapper.DateFieldMapper.parseCreateField(DateFieldMapper.java:456)
    at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:297)
--
"elasticsearch[xxx][bulk][T#20]" #315 daemon prio=5 os_prio=0 cpu=1975796442.21ms elapsed=12348662.71s tid=0x00007f92f4048000 nid=0xdd83 runnable  [0x00007f8f16cea000]
   java.lang.Thread.State: RUNNABLE
    at org.joda.time.chrono.BasicGJChronology.setYear(BasicGJChronology.java:187)
    at org.joda.time.chrono.BasicYearDateTimeField.setExtended(BasicYearDateTimeField.java:92)
    at org.joda.time.format.DateTimeParserBucket$SavedField.set(DateTimeParserBucket.java:568)
    at org.joda.time.format.DateTimeParserBucket.computeMillis(DateTimeParserBucket.java:451)
    at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:182)
    at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
    at org.elasticsearch.index.mapper.DateFieldMapper$DateFieldType.parse(DateFieldMapper.java:248)
    at org.elasticsearch.index.mapper.DateFieldMapper.parseCreateField(DateFieldMapper.java:456)
at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:297)
Run Code Online (Sandbox Code Playgroud)

火焰图

从图中的左侧峰值,您可能会发现去优化。

在此处输入图片说明

hs_err_pidxxx.log 中的去优化日志

Deoptimization events (10 events):
Event: 12413938.027 Thread 0x00007f6174013000 Uncommon trap: reason=unstable_if action=none pc=0x00007f65f543ff88 method=org.joda.time.chrono.BasicGJChronology.setYear(JI)J @ 36 c2
Event: 12413938.027 Thread 0x00007f6174013000 Uncommon trap: reason=unstable_if action=none pc=0x00007f65f543ff88 method=org.joda.time.chrono.BasicGJChronology.setYear(JI)J @ 36 c2
Run Code Online (Sandbox Code Playgroud)

问题

  1. 我们如何减轻去优化对性能的影响?似乎影响最大的不是重新JIT编译的成本,而是日志记录的pthread_cond_wait?

  2. 这些集群已经运行了一年多。滚动重启会解决unstable_if吗?

  3. 为什么是一个unstable_if?抱歉,我对 HotSpot 基于配置文件的性能策略一无所知。但我的直觉是,闰日后情况应该很稳定(2020 年 3 月 5 日仍然不稳定)。12/31/2020 也让我担心,因为我不知道unstable_if.

ZHA*_* Bo 1

有人建议在另一个论坛中使用 -XX:PerMethodRecompilationCutoff=10000 -XX:PerBytecodeRecompilationCutoff=10000 。

他认为 Action_none (可能是经过一些重新解释之后)而不是 Action_reinterpret 才是问题所在。

他还提到这个错误可能相关,但我还没有调查:https://bugs.java.com/bugdatabase/view_bug.do ?bug_id=8227523

抱歉,我没有粘贴原始答案,因为它不是英文的。