"DebugInfo for CritSec does not point back to the critical section" when analyzing a deadlock

Mik*_*lor 5 delphi debugging multithreading windbg critical-section

I am using WinDbg to analyze a deadlock that occurs in a DataSnap application server written in Delphi.

When I run

!analyze -hang -v

I get

:000:x86> !analyze -hang -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

GetPageUrlData failed, server returned HTTP status 404
URL requested: http://watson.microsoft.com/00000000.htm?Retriage=1

FAULTING_IP: 
+6ced240
00000000 ??              ???

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 0000000000000000
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 0

FAULTING_THREAD:  0000000000000000

BUGCHECK_STR:  HANG

DEFAULT_BUCKET_ID:  APPLICATION_HANG

PROCESS_NAME:  ********.exe

ERROR_CODE: (NTSTATUS) 0xcfffffff - 

EXCEPTION_CODE: (NTSTATUS) 0xcfffffff - 

MOD_LIST: 

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

DERIVED_WAIT_CHAIN:  

Dl Eid Cid     WaitType
-- --- ------- --------------------------
   0   c7c.2634 Critical Section       

WAIT_CHAIN_COMMAND:  ~0s;k;;

BLOCKING_THREAD:  0000000000002634

PRIMARY_PROBLEM_CLASS:  APPLICATION_HANG

LAST_CONTROL_TRANSFER:  from 0000000077138df4 to 000000007711f8b1

STACK_TEXT:  
0018fc50 77138df4 00000c6c 00000000 00000000 ntdll_77100000!NtWaitForSingleObject+0x15
0018fcb4 77138cd8 00000000 00000000 03fe0940 ntdll_77100000!RtlpWaitOnCriticalSection+0x13e
0018fcdc 7369324f 736a3134 00000000 03fe0940 ntdll_77100000!RtlEnterCriticalSection+0x150
WARNING: Stack unwind information not available. Following frames may be wrong.
0018fcec 7369af5f 00000388 00000000 003d1e00 mswsock!GetLspGuid+0x19af
0018fd08 76366958 00000388 0018fd84 0018fd9c mswsock!GetLspGuid+0x96bf
0018fd38 0018fd58 763668cd 00000388 0018fd84 ws2_32!WSAAccept+0x84
00000000 00000000 00000000 00000000 00000000 0x18fd58


FOLLOWUP_IP: 
mswsock!GetLspGuid+19af
7369324f 33db            xor     ebx,ebx

SYMBOL_STACK_INDEX:  3

SYMBOL_NAME:  mswsock!GetLspGuid+19af

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: C:\Windows\System32\mswsock

IMAGE_NAME:  lld

DEBUG_FLR_IMAGE_TIMESTAMP:  4ce7c83d

STACK_COMMAND:  ~0s ; kb

FAILURE_BUCKET_ID:  APPLICATION_HANG_cfffffff_lld!Unloaded

BUCKET_ID:  X64_HANG_mswsock!GetLspGuid+19af

WATSON_STAGEONE_URL:  http://watson.microsoft.com/00000000.htm?Retriage=1

Followup: MachineOwner
---------

Then I ran

!locks -V

to see which lock it was waiting on, and to my surprise it returned this:

0:000:x86> !locks -V

CritSec ntdll!RtlCriticalSectionLock+0 at 0000000077057060
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!LdrpLoaderLock+0 at 0000000077057490
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!RtlpDynamicFunctionTableLock+0 at 0000000077057468
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!FastPebLock+0 at 000000007705a900
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!RtlpProcessHeapsListLock+0 at 000000007705a240
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec +270208 at 0000000000270208
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    1

CritSec ntdll!EtwProvCritSect+0 at 000000007705a120
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!EtwPrivSessionCritSect+0 at 000000007705a1e0
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec +10208 at 0000000000010208
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec +276f40 at 0000000000276f40
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

Scanned 10 critical sections

Looking at the call stack

STACK_TEXT:  
0018fc50 77138df4 00000c6c 00000000 00000000 ntdll_77100000!NtWaitForSingleObject+0x15
0018fcb4 77138cd8 00000000 00000000 03fe0940 ntdll_77100000!RtlpWaitOnCriticalSection+0x13e
0018fcdc 7369324f 736a3134 00000000 03fe0940 ntdll_77100000!RtlEnterCriticalSection+0x150
WARNING: Stack unwind information not available. Following frames may be wrong.
0018fcec 7369af5f 00000388 00000000 003d1e00 mswsock!GetLspGuid+0x19af
0018fd08 76366958 00000388 0018fd84 0018fd9c mswsock!GetLspGuid+0x96bf
0018fd38 0018fd58 763668cd 00000388 0018fd84 ws2_32!WSAAccept+0x84
00000000 00000000 00000000 00000000 00000000 0x18fd58

I determined that it is waiting on a critical section at address 0x736a3134 (the first argument passed to RtlEnterCriticalSection), so I ran this

!critsec 736a3134

which gave me this output

0:000:x86> !critsec 736a3134

DebugInfo for CritSec at 00000000736a3134 does not point back to the critical section
NOT an initialized critical section.

CritSec mswsock!WSPStartup+6f64 at 00000000736a3134
WaiterWoken        Yes
LockCount          -1
RecursionCount     11028
OwningThread       c6c
EntryCount         1f49dad6
ContentionCount    88000000
*** Locked

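Now the penny has dropped: the pointer to the critical section has been corrupted, probably by concurrent thread access and a lack of synchronization elsewhere in the code.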
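For reference, the "does not point back" message corresponds to a simple consistency check: the first field of `RTL_CRITICAL_SECTION` is a `DebugInfo` pointer, and the `RTL_CRITICAL_SECTION_DEBUG` structure it points to holds a `CriticalSection` pointer that should equal the original address. The following is a minimal Python sketch of that check, not the debugger's actual implementation. The `make_reader` helper, the addresses 0x1000 and 0x2000, and the byte contents are made up for illustration; the field order is the one declared in winnt.h.

```python
import struct

def make_reader(regions):
    """Hypothetical stand-in for a dump reader: {base_address: bytes} -> read_memory(address, size)."""
    def read_memory(address, size):
        for base, data in regions.items():
            offset = address - base
            if 0 <= offset and offset + size <= len(data):
                return data[offset:offset + size]
        raise ValueError(f"unreadable address {address:#x}")
    return read_memory

def debug_info_points_back(read_memory, critsec_address, pointer_size=4):
    """Return True if critsec.DebugInfo->CriticalSection == critsec_address."""
    fmt = "<I" if pointer_size == 4 else "<Q"
    try:
        # RTL_CRITICAL_SECTION.DebugInfo is the first field of the structure.
        (debug_info,) = struct.unpack(fmt, read_memory(critsec_address, pointer_size))
        # RTL_CRITICAL_SECTION_DEBUG starts with two WORDs (padded to pointer
        # alignment), so its CriticalSection field sits at offset pointer_size.
        (back_pointer,) = struct.unpack(
            fmt, read_memory(debug_info + pointer_size, pointer_size)
        )
    except ValueError:
        return False
    return back_pointer == critsec_address

# Made-up 32-bit memory image: an intact critical section at 0x1000
# whose debug record lives at 0x2000 and points back to 0x1000.
critsec = struct.pack("<IiiIII", 0x2000, -1, 0, 0, 0, 0)  # DebugInfo, LockCount, RecursionCount, OwningThread, LockSemaphore, SpinCount
debug_record = struct.pack("<HHI", 0, 0, 0x1000) + bytes(16)  # Type, CreatorBackTraceIndex, CriticalSection, rest zeroed
read_memory = make_reader({0x1000: critsec, 0x2000: debug_record})

print(debug_info_points_back(read_memory, 0x1000, pointer_size=4))  # True
print(debug_info_points_back(read_memory, 0x1000, pointer_size=8))  # False
```

The second call shows that the check is sensitive to pointer size: the same intact 32-bit structure fails when read with 8-byte pointers, because `DebugInfo` and `LockCount` are then read together as a single pointer. Since the `0:000:x86` prompt suggests a 32-bit process viewed through a 64-bit debugger session, this may be worth ruling out before concluding that memory is corrupted.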

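My question is: how do I track down what is causing this, or find out whether it is a different problem altogether?

PS: This error only occurs when the application is under heavy load, with perhaps 700 clients connected.

(One thread is used per connection. I know that a 32-bit application is limited to approximately 2000 threads at the default thread stack size, and that this is not the best approach.)

PPS: I have multiple crash dumps in which the application hangs waiting on different critical sections, and in each case the critical section's pointer does not appear to point to a critical section.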
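The figure of roughly 2000 threads follows from simple address-space arithmetic. The sketch below assumes a 2 GiB user-mode address space (a 32-bit process that is not large-address-aware) and a 1 MiB stack reservation per thread; both values are assumptions for illustration, and the real ceiling is lower because code, heaps, and DLLs share the same address space.

```python
USER_ADDRESS_SPACE = 2 * 1024**3    # assumed: 2 GiB of user-mode address space
DEFAULT_STACK_RESERVE = 1024**2     # assumed: 1 MiB reserved per thread stack

def max_threads(address_space, stack_reserve):
    """Upper bound on thread count if stacks were the only consumer of address space."""
    return address_space // stack_reserve

def reserved_for_stacks(thread_count, stack_reserve):
    """Address space reserved for stacks by thread_count threads, in bytes."""
    return thread_count * stack_reserve

print(max_threads(USER_ADDRESS_SPACE, DEFAULT_STACK_RESERVE))            # 2048
print(reserved_for_stacks(700, DEFAULT_STACK_RESERVE) // 1024**2, "MiB")  # 700 MiB
```

Under these assumptions, 700 connected clients reserve about a third of the address space for thread stacks alone.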

Mik*_*lor 1

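Just to let you know, we gave up trying to find out what was causing this. It only happened when the program was close to its maximum virtual memory space (2.1 GB for a 32-bit application), which it reached because we were using a thread-per-connection approach.

In the end, we redesigned the clients so that they no longer use this server application and use a SOAP server instead.

The SOAP server seems to scale much better than the DataSnap/Midas setup we were using, although we are still testing it at the customer site where the problem originally appeared.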