Kin*_*ung 2 java crash java-native-interface garbage-collection jvm
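We have a Java application with a multi-threaded (pthread) JNI layer that calls back into the Java level based on messages received from the underlying network.
We noticed that every crash is triggered by a GC. We can even reproduce the crash by manually triggering a GC with jmap -histo <pid>
while the JNI layer is receiving messages from the network.
Given what we read about JVM behavior during GC in this post, /sf/answers/2758102721/, we still cannot figure out why such a crash is GC-related, since JNI function calls are blocked during a GC.
It would be great if someone could shed some light on this. Thanks in advance.
Below is the stack trace we collected after the application crashed.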
Program terminated with signal 6, Aborted.
#0 0x0000003cdce325e5 in raise () from /lib64/libc.so.6
#1 0x0000003cdce33dc5 in abort () from /lib64/libc.so.6
#2 0x00007fdafe2516b5 in os::abort(bool) () from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#3 0x00007fdafe3efbf3 in VMError::report_and_die() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#4 0x00007fdafde2f3e2 in report_vm_error(char const*, int, char const*, char const*) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#5 0x00007fdafe24c1ff in os::PlatformEvent::park() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#6 0x00007fdafe20c538 in Monitor::ILock(Thread*) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#7 0x00007fdafe20c73f in Monitor::lock_without_safepoint_check() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#8 0x00007fdafe2e7a1f in SafepointSynchronize::block(JavaThread*) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#9 0x00007fdafe39bcdd in JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread*) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#10 0x00007fdafe0123d8 in jni_NewByteArray ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#11 0x00007fdaa447b7d1 in JNIEnv_::NewByteArray (this=0x7fdaf800c9f8, len=7)
at /usr/java/jdk1.8.0_65/include/jni.h:1643
---omitted---
#19 0x0000003cdd20b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#20 0x00007fdafe24c133 in os::PlatformEvent::park() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#21 0x00007fdafe20ce27 in Monitor::IWait(Thread*, long) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#22 0x00007fdafe20d5f0 in Monitor::wait(bool, long, bool) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
---Type <return> to continue, or q <return> to quit---
#23 0x00007fdafe39ed51 in Threads::destroy_vm() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#24 0x00007fdafdfff931 in jni_DestroyJavaVM ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#25 0x00007fdafe91a63d in JavaMain () from /usr/java/jdk1.8.0_65/bin/../lib/amd64/jli/libjli.so
#26 0x0000003cdd207aa1 in start_thread () from /lib64/libpthread.so.0
#27 0x0000003cdcee8aad in clone () from /lib64/libc.so.6
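This is how we obtain the JNIEnv*, for example:
JNIEnv *env = 0;
// Reuse the JNIEnv if this thread is already attached to the JVM.
jint result = jvm->GetEnv((void **) &env, JNI_VERSION_1_8);
if (result != JNI_OK) {
    // Otherwise attach the current native thread.
    result = jvm->AttachCurrentThread((void **) &env, NULL);
}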
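After spending days investigating this JNI issue, we finally found the cause, and I would like to share our experience here in the hope that it helps others.
First, the reason we needed JNI in the first place is that we have to use a third-party network library that is a native Linux library, and unfortunately that library is the cause of our problem.
The library gives us a callback handle, which we implemented to receive incoming network messages from it. We later found out that this callback is simply a signal handler. That means the handler is invoked whenever the signal arrives, even during a GC.
C threads keep running during a safepoint in the JVM. If those C threads are not attached to the JVM, that is fine; otherwise disaster will surely follow.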
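To make the hazard concrete, here is a minimal sketch of one common way to keep JNI calls out of a signal handler, the self-pipe pattern. It is an illustration only, not necessarily the fix used here. SIGUSR1 stands in for whatever signal the native library really uses, and on_network_signal, drain_notifications and run_demo are hypothetical names. The handler only calls write(), which is async-signal-safe, and the real work happens in ordinary code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <signal.h>
#include <unistd.h>

// All names are hypothetical. SIGUSR1 is an assumed stand-in for the
// library's real signal.
static int g_pipe_fds[2] = {-1, -1};  // [0] = read end, [1] = write end

// Signal handler: it may run on any thread that has the signal unblocked,
// possibly a JVM-internal one, so it makes no JNI calls. It only writes one
// byte to the pipe and restores errno.
extern "C" void on_network_signal(int) {
    const int saved_errno = errno;
    const std::uint8_t token = 1;
    ssize_t written = write(g_pipe_fds[1], &token, 1);
    (void) written;
    errno = saved_errno;
}

// Drain loop: meant to run on a dedicated application thread that was
// attached with AttachCurrentThread. JNI calls such as NewByteArray would
// go where the counter is incremented. A 0 byte is the stop token.
int drain_notifications() {
    int handled = 0;
    for (;;) {
        std::uint8_t token = 0;
        const ssize_t n = read(g_pipe_fds[0], &token, 1);
        if (n < 0 && errno == EINTR) continue;  // interrupted, retry
        if (n != 1 || token == 0) break;        // error, EOF, or stop token
        ++handled;                              // JNI work would happen here
    }
    return handled;
}

// Demo driver: installs the handler, raises the signal `count` times, then
// drains inline (a real layer would drain on its own thread).
int run_demo(int count) {
    assert(count >= 0);
    if (pipe(g_pipe_fds) != 0) return -1;
    struct sigaction sa = {};
    struct sigaction old = {};
    sa.sa_handler = on_network_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGUSR1, &sa, &old) != 0) return -1;

    for (int i = 0; i < count; ++i) raise(SIGUSR1);

    const std::uint8_t stop = 0;
    ssize_t written = write(g_pipe_fds[1], &stop, 1);
    (void) written;
    const int handled = drain_notifications();

    sigaction(SIGUSR1, &old, nullptr);  // restore the previous handler
    close(g_pipe_fds[0]);
    close(g_pipe_fds[1]);
    return handled;
}
```

In a real JNI layer, drain_notifications would be the body of a thread the application creates and attaches once, so every JNI call comes from a thread the JVM knows about and can stop at a safepoint. Whether the callback of a third-party library can be restructured this way depends on that library.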
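Here is what we think happened. (Everything below happens in the JNI layer.)
The gdb stack trace we see is basically what happens when a GC thread that is actually doing work on the heap gets interrupted to run application work from our app, which then makes JNI API calls ... BOOM
Solution:
P.S. Some details may not be entirely accurate, so advice from any JVM expert is welcome. I will do my best to correct them as suggested.
Thanks.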
Update 1 (@apangin): Here is another gdb stack trace. We are just wondering whether the GangWorker in frame #18 is a parallel GC thread.
#0 0x00000035b90325e5 in raise () from /lib64/libc.so.6
#1 0x00000035b9033dc5 in abort () from /lib64/libc.so.6
#2 0x00007febd60813b5 in os::abort(bool) () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#3 0x00007febd6223673 in VMError::report_and_die() () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#4 0x00007febd60868bf in JVM_handle_linux_signal () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#5 0x00007febd607ce13 in signalHandler(int, siginfo*, void*) () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#6 <signal handler called>
#7 0x00007feb9fcf551c in JNIEnv_::NewByteArray (this=0x7febd001d9f8, len=8) at /usr/java/jdk1.8.0_131/include/jni.h:1643
*<omitted app specific calls>*
#13 <signal handler called>
#14 0x00000035b980b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#15 0x00007febd607b7e3 in os::PlatformEvent::park() () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#16 0x00007febd603c037 in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#17 0x00007febd603c956 in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#18 0x00007febd6244d6b in GangWorker::loop() () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#19 0x00007febd6082568 in java_start(Thread*) () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#20 0x00000035b9807aa1 in start_thread () from /lib64/libpthread.so.0
#21 0x00000035b90e8aad in clone () from /lib64/libc.so.6