Iri*_*ium 18 .net c# jit garbage-collection
由于.NET运行时出现内部错误(退出代码0x80131506),我一直在追逐.NET服务中间歇性崩溃的原因.有问题的服务不执行任何通常应归咎于此类错误的操作(不安全代码,PInvoke等).我已尝试禁用并发GC,如KB2679415中所述,以及切换到服务器GC,但间歇性崩溃仍然存在.在调试模式下编译时,问题在.NET 4.7.2及更早版本中出现.
该服务广泛使用旧版本的NHibernate(2.0.1),当我在调试器中检查崩溃转储时,在发生错误时,callstack中总会有NHibernate代码,尽管NHibernate本身都是托管代码,所以不应该导致这种崩溃.
我已设法在调试器下重现崩溃并启用GC压力日志和堆验证,虽然它似乎指向JIT/GC中的问题,但我不确定我是否正确解释输出.
查看发生崩溃的线程,此时它发生在clr!JIT_Stelem_Ref
:
clr!JIT_Stelem_Ref+0x18: cmp r9,qword ptr [r8] ds:aaaaaaaa`aaaaaaaa=????????????????
Run Code Online (Sandbox Code Playgroud)
在这种情况下,0xaa
s 字符串似乎是启用HeapVerify的结果,这会导致GC填充收集的内存区域,可能是为了更容易识别,并建议我们仍然有一个对收集/重定位对象的旧位置的引用.
追溯到堆栈中,有很多0xaaaaaaaaaaaaaaaa
条目,但是当最近的GC发生时,这些停止出现在调用堆栈顶部的方法中,在这种情况下,NHibernate.Loader.Loader.GetRow()
根据最近的GC压力日志此线程上的GC:
(注意:我已经从SOS' !dumplog
输出中反转了记录行的顺序,以便于阅读):
2404 12445.672380360 : `GC`GCROOTS` Starting scan of Thread 000000001EF4DED0 ID = 20 {
2404 12445.672380963 : `GCROOTS` Scanning ExplicitFrame 000000001E6ED3B8 AssocMethod = 0000000000000000 frameVTable = 000007FEF365B640 (clr!RedirectedThreadFrame::`vftable')
2404 12445.672386397 : `GCROOTS` Scanning Frameless method 000007FE93F43460 (NHibernate.Loader.Loader.GetRow(System.Data.IDataReader, NHibernate.Persister.Entity.ILoadable[], NHibernate.Engine.EntityKey[], System.Object, NHibernate.Engine.EntityKey, NHibernate.LockMode[], System.Collections.IList, NHibernate.Engine.ISessionImplementor)) ControlPC = 000007FE945E3095
2404 12445.672388208 : `GC`GCROOTS` GC Root 000000001E6ED4C0 RELOCATED 000000003B1A7708 -> 000000003AC89F08 MT = 000007FE93DDF5C8 (...)
2404 12445.672388510 : `GC`GCROOTS` GC Root 000000001E6ED4D8 RELOCATED 000000003B1A73A0 -> 000000003AC89D00 MT = 000007FEF1FD6EA8 (System.Object[])
2404 12445.672388510 : `GC`GCROOTS` GC Root 000000001E6ED4E8 RELOCATED 000000003B1A7358 -> 000000003AC89CB8 MT = 000007FE9491D7C8 (NHibernate.Engine.EntityKey)
2404 12445.672388510 : `GC`GCROOTS` GC Root 000000001E6ED4F8 RELOCATED 000000003B1A73A0 -> 000000003AC89D00 MT = 000007FEF1FD6EA8 (System.Object[])
Run Code Online (Sandbox Code Playgroud)
此方法的堆栈区域如下:
00000000`1e6ed470 000000003b1a7358 ?
00000000`1e6ed478 000000000291e3d0
00000000`1e6ed480 0000000000000000
00000000`1e6ed488 0000000000000000
00000000`1e6ed490 000000000662a900
00000000`1e6ed498 0000000006523c80
00000000`1e6ed4a0 0000000000000000
00000000`1e6ed4a8 0000000000000000
00000000`1e6ed4b0 0000000000000000
00000000`1e6ed4b8 0000000000000000
00000000`1e6ed4c0 000000003ac89f08 ?
00000000`1e6ed4c8 0000000000000000
00000000`1e6ed4d0 0000000006524248
00000000`1e6ed4d8 000000003ac89d00 ?
00000000`1e6ed4e0 0000000000000000
00000000`1e6ed4e8 000000003ac89cb8 ?
00000000`1e6ed4f0 0000000000000000
00000000`1e6ed4f8 000000003ac89d00 ?
00000000`1e6ed500 0000000100000000
00000000`1e6ed508 0000000c0000000b
00000000`1e6ed510 0000000006621660
00000000`1e6ed518 000000001e6ed690
00000000`1e6ed520 000000001e6ed6a0
Run Code Online (Sandbox Code Playgroud)
我已指示为搬迁的GC压力日志中提到的4项,这与他们的新地址正确的更新,但是第一个堆栈条目(000000003b1a7358
-一NHibernate.Engine.EntityKey
),而这是已经被拆迁对象的一个不被更新,新的地址.当然,这将是完全正常的,如果这不再将被使用,但它实际上是关于作为一个参数的传递调用到NHibernate.Loader.Loader.InstanceNotYetLoaded()
.
InstanceNotYetLoaded()
需要9个参数(加号this
),并且我已经在下面的汇编列表中标记了将每个参数加载到堆栈/寄存器中的位置.我还包括了SOS'的相关输出,!gcinfo
因为它与堆栈中的每个参数有关:
Param Address Instruction GC Info
000007fe`945e3071 mov r9,qword ptr [rbp-38h]
P4> 000007fe`945e3075 mov qword ptr [rsp+20h],r9
000007fe`945e307a mov r9d,dword ptr [rbp-18h] +sp+20
000007fe`945e307e mov rcx,qword ptr [rbp+40h]
000007fe`945e3082 cmp r9,qword ptr [rcx+8]
000007fe`945e3086 jb 000007fe`945e308d
000007fe`945e3088 call clr!JIT_RngChkFail
000007fe`945e308d lea rcx,[rcx+r9*8+10h] -sp+20
000007fe`945e3092 mov r9,qword ptr [rcx]
-- GC Occurred Here --
P5> 000007fe`945e3095 mov qword ptr [rsp+28h],r9
000007fe`945e309a mov r9,qword ptr [rbp+38h] +sp+28
P6> 000007fe`945e309e mov qword ptr [rsp+30h],r9
000007fe`945e30a3 mov r9,qword ptr [rbp+30h] +sp+30
P7> 000007fe`945e30a7 mov qword ptr [rsp+38h],r9
000007fe`945e30ac mov r9,qword ptr [rbp+48h] +sp+38
P8> 000007fe`945e30b0 mov qword ptr [rsp+40h],r9
000007fe`945e30b5 mov r9,qword ptr [rbp+50h] +sp+40
P9> 000007fe`945e30b9 mov qword ptr [rsp+48h],r9
000007fe`945e30be mov r9d,dword ptr [rbp-18h] +sp+48
000007fe`945e30c2 mov rcx,qword ptr [rbp+20h]
000007fe`945e30c6 cmp r9,qword ptr [rcx+8]
000007fe`945e30ca jb 000007fe`945e30d1
000007fe`945e30cc call clr!JIT_RngChkFail
000007fe`945e30d1 lea rcx,[rcx+r9*8+10h] -sp+48 -sp+40 -sp+38 -sp+30 -sp+28
P3> 000007fe`945e30d6 mov r9,qword ptr [rcx]
this> 000007fe`945e30d9 mov rcx,qword ptr [rbp+10h]
P1> 000007fe`945e30dd mov rdx,qword ptr [rbp+18h]
P2> 000007fe`945e30e1 mov r8d,dword ptr [rbp-18h]
000007fe`945e30e5 call InstanceNotYetLoaded(...)
Run Code Online (Sandbox Code Playgroud)
发生崩溃之前的GC发生在000007fe945e3095
,即参数4加载到堆栈(at 000007fe945e3075
)之后,但是000007fe945e308d
根据GC Info,此堆栈条目已经死亡(at ),这可以解释为什么GC重新定位阶段没有更新此参考.
它看起来像参数5-9的GC Info也错误地将它们标记为过早死亡,并且可能在两种情况下都表明它们在看起来像数组索引范围检查之后被立即标记为死.
这对我看起来像一个JIT错误,这些堆栈参数的生命周期被错误地跟踪.这种分析是否正确,如果是这样,最好报告是否.如果它不是JIT错误,我错过了什么可以解释纯托管代码上的这些意外崩溃?
我相信以下代码片段将重现此问题,至少就调试模式中生成错误的GC信息而言.
public void Repro(int p1, object p2, object p3, object p4, object[] p5)
{
// Incorrect GC Info generated for this call
ReproHelper(p1, p2, p3, p4, p5[p1]);
}
public void ReproHelper(int p1, object p2, object p3, object p4, object p5)
{
Console.WriteLine(p1);
Console.WriteLine(p2);
Console.WriteLine(p3);
Console.WriteLine(p4);
Console.WriteLine(p5);
}
Run Code Online (Sandbox Code Playgroud)
本质上,必须有一个方法调用方法:
满足这些条件时,第4 个参数将加载到堆栈中以进行调用,并且堆栈条目被正确标记为包含引用.但是,在确定参数5的值时,将进行数组索引范围检查,在此之后,参数4的堆栈条目将标记为dead.
如果出现GC 后的范围内检查,但之前的实际调用发生,以及对象的GC结果,目前正在为参数4传递被搬迁,当方法恢复,呼叫将通过旧的(无效)地址参数4,而不是新的.
虽然它不能解决问题,但我将其视为是错误,因此应将其视为.NET团队应予以解决。
在.NET Framework 4.7.1(clrjit.dll版本4.7.2xxx)上运行代码段时,会生成正确的GCInfo(+sp+20
实际上仅在ReproHelper
调用之前写入了):
00007ffb`99450630 55 push rbp
00007ffb`99450631 4883ec40 sub rsp,40h
00000003 is a safepoint:
00007ffb`99450635 488d6c2440 lea rbp,[rsp+40h]
00007ffb`9945063a 33c0 xor eax,eax
00007ffb`9945063c 488945f8 mov qword ptr [rbp-8],rax
00007ffb`99450640 48894d10 mov qword ptr [rbp+10h],rcx
00007ffb`99450644 895518 mov dword ptr [rbp+18h],edx
00007ffb`99450647 4c894520 mov qword ptr [rbp+20h],r8
00007ffb`9945064b 4c894d28 mov qword ptr [rbp+28h],r9
interruptible
+rbp+28 +rbp+20 +rbp+10 +rbp-8
00007ffb`9945064f 833d3a3fefff00 cmp dword ptr [00007ffb`99344590],0
00007ffb`99450656 7405 je 00007ffb`9945065d
00007ffb`99450658 e863eaab5f call clr!JIT_DbgIsJustMyCode (00007ffb`f8f0f0c0)
00007ffb`9945065d 90 nop
00007ffb`9945065e 8b5518 mov edx,dword ptr [rbp+18h]
00007ffb`99450661 4c8b4538 mov r8,qword ptr [rbp+38h]
+r8
00007ffb`99450665 413b5008 cmp edx,dword ptr [r8+8]
00007ffb`99450669 7205 jb 00007ffb`99450670
-rbp-8
00007ffb`9945066b e8f015ac5f call clr!JIT_RngChkFail (00007ffb`f8f11c60)
-r8
00007ffb`99450670 488b5538 mov rdx,qword ptr [rbp+38h]
+rdx
00007ffb`99450674 448b4518 mov r8d,dword ptr [rbp+18h]
00007ffb`99450678 4d63c0 movsxd r8,r8d
00007ffb`9945067b 4a8b54c210 mov rdx,qword ptr [rdx+r8*8+10h]
00007ffb`99450680 488955f8 mov qword ptr [rbp-8],rdx
+rbp-8
00007ffb`99450684 488b55f8 mov rdx,qword ptr [rbp-8]
00007ffb`99450688 4889542428 mov qword ptr [rsp+28h],rdx
+sp+28
00007ffb`9945068d 8b5518 mov edx,dword ptr [rbp+18h]
-rdx
00007ffb`99450690 4c8b4520 mov r8,qword ptr [rbp+20h]
+r8
00007ffb`99450694 4c8b4d28 mov r9,qword ptr [rbp+28h]
+r9
00007ffb`99450698 488b4d30 mov rcx,qword ptr [rbp+30h]
+rcx
00007ffb`9945069c 48894c2420 mov qword ptr [rsp+20h],rcx
+sp+20
00007ffb`994506a1 488b4d10 mov rcx,qword ptr [rbp+10h]
-rbp-8
Run Code Online (Sandbox Code Playgroud)
但是,升级到.NET Framework 4.7.2(clrjit.dll版本4.7.3062)后,它不再正确(+sp+20
在对数组索引范围进行检查之前写入,已正确设置,但随后突然取消设置,而仍在ReproHelper
呼叫):
00007ffe`62290630 55 push rbp
00007ffe`62290631 4883ec30 sub rsp,30h
00007ffe`62290635 488d6c2430 lea rbp,[rsp+30h]
00000007 is a safepoint:
00007ffe`6229063a 48894d10 mov qword ptr [rbp+10h],rcx
00007ffe`6229063e 895518 mov dword ptr [rbp+18h],edx
00007ffe`62290641 4c894520 mov qword ptr [rbp+20h],r8
00007ffe`62290645 4c894d28 mov qword ptr [rbp+28h],r9
interruptible
+rbp+28 +rbp+20 +rbp+10
00007ffe`62290649 833d483fefff00 cmp dword ptr [00007ffe`62184598],0
00007ffe`62290650 7405 je 00007ffe`62290657
00007ffe`62290652 e869f7aa5f call clr!TranslateSecurityAttributes+0x857b0 (00007ffe`c1d3fdc0) (JitHelp: CORINFO_HELP_DBG_IS_JUST_MY_CODE)
00007ffe`62290657 90 nop
00007ffe`62290658 488b4d30 mov rcx,qword ptr [rbp+30h]
+rcx
00007ffe`6229065c 48894c2420 mov qword ptr [rsp+20h],rcx
+sp+20
00007ffe`62290661 8b4d18 mov ecx,dword ptr [rbp+18h]
-rcx
00007ffe`62290664 488b5538 mov rdx,qword ptr [rbp+38h]
+rdx
00007ffe`62290668 483b4a08 cmp rcx,qword ptr [rdx+8]
00007ffe`6229066c 7205 jb 00007ffe`62290673
00007ffe`6229066e e8ed22ab5f call clr!TranslateSecurityAttributes+0x88350 (00007ffe`c1d42960) (JitHelp: CORINFO_HELP_RNGCHKFAIL)
-sp+20
00007ffe`62290673 488d54ca10 lea rdx,[rdx+rcx*8+10h]
-rdx +rdx(interior)
00007ffe`62290678 488b0a mov rcx,qword ptr [rdx]
+rcx
00007ffe`6229067b 48894c2428 mov qword ptr [rsp+28h],rcx
+sp+28
00007ffe`62290680 488b4d10 mov rcx,qword ptr [rbp+10h]
00007ffe`62290684 8b5518 mov edx,dword ptr [rbp+18h]
-rdx(interior)
00007ffe`62290687 4c8b4520 mov r8,qword ptr [rbp+20h]
+r8
00007ffe`6229068b 4c8b4d28 mov r9,qword ptr [rbp+28h]
+r9
00007ffe`6229068f e804faffff call 00007ffe`62290098 (GCInfoBug.Bug.ReproHelper(Int32, System.Object, System.Object, System.Object, System.Object), mdToken: 0000000006000004)
-sp+28 -r9 -r8 -rcx
00007ffe`62290694 90 nop
00007ffe`62290695 90 nop
not interruptible
-rbp+28 -rbp+20 -rbp+10
00007ffe`62290696 488d6500 lea rsp,[rbp]
00007ffe`6229069a 5d pop rbp
00007ffe`6229069b c3 ret
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
394 次 |
最近记录: |