What is causing AppCrash and BSOD events, general instability?

mHu*_*ley 5 crash bsod windows-10

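SOLUTION: It was the RAM settings all along :-| It never occurred to me that the stock settings on a stock board with stock RAM could be so far off that they would cause system instability. I have never overclocked, so I never looked closely at those settings. Once I selected the DOCP profile that matched my RAM, everything cleared up, and it is even faster. Thanks to Twisty Impersonator for the process guide and to magicandre1981 for the suggestions that prompted me to check the settings. Hopefully this saves someone else 2 years of frustration.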

EDIT: Well, I think the cause has become clear. After replacing ALL the hardware, and STILL seeing a problem, I decided to go back to the hardware idea. In short: if I run with two sticks of RAM, everything is fine. It doesn't matter which two sticks. If I put in all four, I start having problems. This seems like a pretty clear indication of a bad motherboard.

The Symptoms:

For the last several years my machine has been generally unstable, off and on. Typically manifests as BSODs with varying stop codes.

  • Upgrading the RAM improved the stability for a while.
  • Upgrading the motherboard improved the stability for a while.
  • Replacing the C: drive improved the stability for a while.
  • Refreshing or reinstalling the OS has occasionally been necessary, and usually improves stability for a while.

I have replaced literally every functional component in the system, except the CPU and Blu-ray drive. I have not ruled out the CPU, but there is still a vast swath of software-"things" that might also be at fault.

Each time, the problem has returned after a few months.


Most recently, the symptoms have changed slightly. I am open to the possibility that this is a completely unrelated problem, but it seems too similar to the problems I have been battling the whole time, to be mere coincidence.

A few weeks ago I rebooted my computer to update, and it would not POST. I fussed with it for a while (checking connections, MemOK! button, disconnect power, TPU on/off, EPU on/off, etc.) and got it to POST, but the OS would not load. I forget the exact presentation of symptoms, but IIRC it would just sit and spin.

Reinstalled the OS and things were quiet for a week or so, until apps began crashing. At first, it seemed like all the apps that were crashing were installed on the same SSD. Without room to move things around and test, I upgraded to the new Samsung drives. But apps are still crashing.

  • Flashed latest BIOS update. No change.
    • Turns out, you have to reset the CMOS when you flash the BIOS. Potential symptoms are much like mine. I reset the CMOS. No change.
  • It was generally high-demand applications that would crash (Dishonored 2, Diablo III, ESO, etc). But crashes are happening between 35°C-45°C for CPU and GPU - So probably not temperature.
  • It is not running out of RAM.
  • MemTest has never shown any problems. I have run it dozens of times.
  • No CPU test has ever shown any issues, except at high temperatures.
  • No GPU test has ever shown any issues, except at high temperatures.
  • I've reinstalled my video drivers a few dozen times.
  • I had Task Manager crash while I was watching yesterday.
  • Tried to install a Windows Store App. Some background process crashed. Had to try again. Worked fine.
  • Event Viewer has just AppCrash events

AppCrash events are being produced by a wide range of applications. Varying sizes, locations, demands, etc. It is typically once a day, maybe less. But high-resource applications crash pretty reliably within 30 minutes or so.

I should clarify that these are not "Windows is looking for a solution" AppHang events. The application just vanishes, as if I had closed it, and Windows has nothing to say about it except the AppCrash event in the Event Viewer. Less often, there is a BSOD. Lately, I have seen IRQL_NOT_LESS_OR_EQUAL, and others that I cannot remember... (I don't have any memory dumps anymore? That's weird...).

System specs:

  • OS: Windows 10 Pro (upgraded from Win7 during free upgrade period)
  • CPU: AMD Phenom II 1090 (no overclocking)
  • Cooling: CoolerMaster 150mm CPU fans, several case fans
  • Mainboard: ASUS M4A99X EVO R2.0
  • RAM: G.Skill 16GB(4x4) DDR3-1333
  • GPU: MSI GTX 970 (no overclocking)
  • PSU: Corsair CX750M
  • System drive: Samsung 850 EVO 500GB
  • Other drives: Samsung 850 EVO 500GB, other conventional drives, optical drive
  • A/V: Windows Defender, no other AV

Crash dump:

Prompted by this post: https://superuser.com/questions/1281659/possible-to-determine-which-core-a-faulting-application-was-on-when-it-crashed

Hit a new BSOD while it was idling last night. Details from WhoCrashed below:

Crash dump directory: C:\WINDOWS\Minidump
Crash dumps are enabled on your computer.

On Wed 1/3/2018 9:00:13 AM GMT your computer crashed
crash dump file: C:\WINDOWS\Minidump\010318-12546-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x1640E0)
Bugcheck code: 0x1E (0xFFFFFFFFC0000005, 0xFFFFF8019CED183E, 0xFFFF968442FBEB68, 0xFFFF968442FBE3B0)
Error: KMODE_EXCEPTION_NOT_HANDLED
file path: C:\WINDOWS\system32\ntoskrnl.exe
product: Microsoft® Windows®
Operating System company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that a kernel-mode program generated an exception
which the error handler did not catch. This appears to be a typical software driver bug
and is not likely to be caused by a hardware problem.  The crash took place in the Windows
kernel. Possibly this problem is caused by another driver that cannot be identified at this time. 

On Wed 1/3/2018 9:00:13 AM GMT your computer crashed
crash dump file: C:\WINDOWS\memory.dmp
This was probably caused by the following module: ntdll.sys (ntdll!ZwFlushBuffersFile+0x14)
Bugcheck code: 0x1E (0xFFFFFFFFC0000005, 0xFFFFF8019CED183E, 0xFFFF968442FBEB68, 0xFFFF968442FBE3B0)
Error: KMODE_EXCEPTION_NOT_HANDLED
Bug check description: This indicates that a kernel-mode program generated an exception
which the error handler did not catch. This appears to be a typical software driver bug
and is not likely to be caused by a hardware problem.  A third party driver was identified
as the probable root cause of this system error. It is suggested you look for an update for
the following driver: ntdll.sys.
Google query: ntdll.sys KMODE_EXCEPTION_NOT_HANDLED

Memory dumps (full and mini) will be here, as they are available: https://1drv.ms/f/s!AhSzRvnavkrXhPpNy8Qjhaj6LbbTwQ
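
For anyone decoding the WhoCrashed output above: for bug check 0x1E, the first parameter is the exception code that was not handled (stored sign-extended to 64 bits) and the second is the address where the exception occurred. Below is a minimal Python sketch, assuming the "Bugcheck code:" line format shown above. The function names and the `NTSTATUS_NAMES` table are illustrative and intentionally incomplete.

```python
import re

# Small, deliberately incomplete lookup table (illustrative only).
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0x80000003: "STATUS_BREAKPOINT",
}


def parse_bugcheck(line):
    """Split a 'Bugcheck code: 0x.. (p1, p2, p3, p4)' line into code and parameters."""
    match = re.search(r"Bugcheck code:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)", line)
    if match is None:
        raise ValueError("not a bugcheck line")
    code = int(match.group(1), 16)
    params = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, params


def describe_0x1e(params):
    """Interpret the first two parameters of bug check 0x1E."""
    # Parameter 1 is a 32-bit NTSTATUS sign-extended to 64 bits; keep the low 32 bits.
    status = params[0] & 0xFFFFFFFF
    name = NTSTATUS_NAMES.get(status, "unknown")
    # Parameter 2 is the address at which the exception occurred.
    return status, name, params[1]


line = ("Bugcheck code: 0x1E (0xFFFFFFFFC0000005, 0xFFFFF8019CED183E, "
        "0xFFFF968442FBEB68, 0xFFFF968442FBE3B0)")
code, params = parse_bugcheck(line)
status, name, address = describe_0x1e(params)
print(f"bug check 0x{code:X}: exception 0x{status:08X} ({name}) at 0x{address:016X}")
```

For the line above this reports exception 0xC0000005, an access violation. That can come from a driver bug or from faulty memory, so on its own it does not settle the hardware-versus-software question.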


@magicandre1981 recommended chkdsk /f based on the results of my memory dump. C: is the only drive for which a pagefile is enabled (it's system managed), so that's the one I ran it on. Here are the results:

Checking file system on C: The type of the file system is NTFS.

A disk check has been scheduled.
Windows will now check the disk.                         

Stage 1: Examining basic file system structure ...
  605184 file records processed.
File verification completed.
Deleting orphan file record segment 699DD.
  10717 large file records processed.
  0 bad file records processed.
Stage 2: Examining file name linkage ...
  14846 reparse records processed.
  704776 index entries processed.
Index verification completed.
  0 unindexed files scanned.
  0 unindexed files recovered to lost and found.
  14846 reparse records processed.
Stage 3: Examining security descriptors ...
Cleaning up 1426 unused index entries from index $SII of file 0x9.
Cleaning up 1426 unused index entries from index $SDH of file 0x9.
Cleaning up 1426 unused security descriptors.
Security descriptor verification completed.
  49797 data files processed.
CHKDSK is verifying Usn Journal...
  37651904 USN bytes processed.
Usn Journal verification completed.
CHKDSK discovered free space marked as allocated in the
master file table (MFT) bitmap.
CHKDSK discovered free space marked as allocated in the volume bitmap.

Windows has made corrections to the file system.
No further action is required.

 487284001 KB total disk space.
 209659436 KB in 259738 files.
    162276 KB in 49798 indexes.
         0 KB in bad sectors.
    729085 KB in use by the system.
     65536 KB occupied by the log file.
 276733204 KB available on disk.

      4096 bytes in each allocation unit.
 121821000 total allocation units on disk.
  69183301 allocation units available on disk.

Internal Info:
00 3c 09 00 f0 b8 04 00 7e 93 08 00 00 00 00 00  .<......~.......
98 05 00 00 66 34 00 00 00 00 00 00 00 00 00 00  ....f4..........

Windows has finished checking your disk.
Please wait while your computer restarts.

No luck. Even after chkdsk fixed these issues, I'm still having the same crashes, though no new BSODs yet.


Another BSOD as I was opening the browser to update this question. Memdumps available once they finish uploading.

But the original reason I came to update is that I found a whole crapton (51 to be precise) of events that look exactly the same. It looks like they happened about every half-hour, starting right after I left for work (7:30am) until about 8:30pm. They might still be happening. They all look like exactly this:

Fault bucket 0x1E_c0000005_fltmgr!FltpPreFsFilterOperation, type 0
Event Name: BlueScreen
Response: Not available
Cab Id: 0

Problem signature:
P1: 1e
P2: ffffffffc0000005
P3: fffff8019ced183e
P4: ffff968442fbeb68
P5: ffff968442fbe3b0
P6: 10_0_16299
P7: 0_0
P8: 256_1
P9: 
P10: 

Attached files:
\\?\C:\WINDOWS\Minidump\010318-12546-01.dmp
\\?\C:\WINDOWS\TEMP\WER-18531-0.sysdata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER5795.tmp.WERInternalMetadata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER57A5.tmp.csv
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER57B6.tmp.txt
\\?\C:\Windows\Temp\WER8F12.tmp.WERDataCollectionStatus.txt

These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue\Kernel_1e_b49232881f44bde28acca17f0ad8bac3b4fbb67_00000000_cab_031c57c4

Analysis symbol: 
Rechecking for solution: 0
Report Id: 3c2abe43-d7d6-4561-9b0d-2adf1f40c745
Report Status: 388
Hashed bucket: 
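
One thing worth checking with events like these is whether they describe new crashes or the same one over and over. In the event above, P1 through P5 are the bug check code and its four parameters, and the attached minidump is the same 010318-12546-01.dmp that WhoCrashed analysed. Below is a rough Python sketch of that comparison. The function names are my own, and the assumption that every one of the 51 events has the same "P<n>: value" layout is mine as well.

```python
def parse_problem_signature(text):
    """Collect the 'P<n>: value' lines of an event into a dict keyed by n."""
    fields = {}
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        key = key.strip()
        if sep and key.startswith("P") and key[1:].isdigit():
            fields[int(key[1:])] = value.strip()
    return fields


def same_crash(signature, bugcheck_code, bugcheck_params):
    """True if P1 equals the bug check code and P2..P5 equal its parameters."""
    try:
        reported = [int(signature[i], 16) for i in range(1, 6)]
    except (KeyError, ValueError):
        return False
    return reported == [bugcheck_code, *bugcheck_params]


event_text = """\
Problem signature:
P1: 1e
P2: ffffffffc0000005
P3: fffff8019ced183e
P4: ffff968442fbeb68
P5: ffff968442fbe3b0
P6: 10_0_16299
"""

signature = parse_problem_signature(event_text)
matches = same_crash(
    signature,
    0x1E,
    [0xFFFFFFFFC0000005, 0xFFFFF8019CED183E, 0xFFFF968442FBEB68, 0xFFFF968442FBE3B0],
)
print("same crash as the 1/3/2018 minidump:", matches)
```

If all 51 events match like this one does, that would be consistent with one crash being reported repeatedly rather than 51 separate blue screens. That is my inference from the matching values, not something the log itself states.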

I have a hard time believing that the CPU would have this issue for so long, and the computer still be functional. I haven't had much success exploring software/configuration issues.

Any ideas?


Almost 3 weeks later... After MUCH shenanigans, I finally acquired a new CPU (upgraded from Phenom II to FX-8350). Replacement was easy enough. I then probed the common problem areas, and apps are still crashing.

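As soon as I posted the "sad face", Windows told me something about a "device health report". It reported a problem with a driver. Unfortunately, but not unexpectedly, the troubleshooter could not detect any kind of problem. I uninstalled two "USB Root Hub" devices that were in an error state from Device Manager.

It rhymes with pool

Does this provide any additional clues? I'm really at a loss now...


Here is the list of driver information...? https://docs.google.com/spreadsheets/d/1xAliAOt1s8rQ_ePX5OwTRVFPB3kFYgc3-1HRUznMpR0/edit?usp=sharing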

I s*_*ica 2

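Divide and conquer

First, you have to try to determine whether this is a hardware or a software problem. Sometimes both are involved, but initially it's best to assume they are not.

In my experience, the most effective way to determine which camp the problem is in is to boot into a second, completely different operating system (without changing any hardware, mind you) and try to reproduce the problem. Preferably use an OS that doesn't share code with the suspect OS. For example, if your suspect system runs Windows, you might use Ubuntu as the test OS. Live CDs are helpful for this.

For problems that show up intermittently this can be challenging, but however you go about it, you need to know whether:

  • Both operating systems are affected, which means you have a hardware problem, or
  • Only your suspect OS is affected, which means you probably have either:

    • A software problem, or
    • An incompatibility between a hardware component and a specific piece of software (almost always a third-party driver).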
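
The decision above can be written down as a tiny truth table. This is only a sketch in Python. The `Suspect` enum and the `classify` function are names invented for illustration, and the "inconclusive" case is my addition for runs where the fault never showed up on the suspect OS either.

```python
from enum import Enum


class Suspect(Enum):
    HARDWARE = "hardware"
    SOFTWARE_OR_DRIVER = "software, or a hardware/driver incompatibility"
    INCONCLUSIVE = "inconclusive"


def classify(fails_on_suspect_os, fails_on_test_os):
    """Map the two observations onto the camp that most likely holds the fault."""
    if fails_on_suspect_os and fails_on_test_os:
        # Two operating systems with no shared code both fail: suspect the hardware.
        return Suspect.HARDWARE
    if fails_on_suspect_os:
        # Only the suspect OS fails: software, or a driver that dislikes the hardware.
        return Suspect.SOFTWARE_OR_DRIVER
    # The fault was not reproduced on the suspect OS, so the comparison says nothing.
    return Suspect.INCONCLUSIVE


print(classify(fails_on_suspect_os=True, fails_on_test_os=True).value)
print(classify(fails_on_suspect_os=True, fails_on_test_os=False).value)
```

With an intermittent fault, "did not fail on the test OS" only means "did not fail yet", so treat the second result as provisional.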

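If you think it's hardware

You've already tested and replaced a lot of components. If the bad behavior shows up in your test OS, you have solid evidence that something you haven't yet replaced is at fault. For components that don't lend themselves to thorough testing (such as the motherboard), you might try replacing other, less expensive components first, but eventually you may have no choice but to replace the more expensive one.

If you think it's software

If the test OS doesn't trigger the fault, you can be more confident that the problem lies with software in the target OS. However, if the fault has historically been impossible to produce on demand or occurs only intermittently, it's still possible that it's a hardware problem that simply wasn't triggered in the test OS. Don't dwell on this; just keep it in mind as you test your tentative solutions.

When it comes to identifying the offending code, you obviously want to chase down specific error messages, such as Windows bug check codes and errors recorded in the event log or in application-specific logs. I'm going to skip those steps on the assumption that you have already exhausted those leads and need a more general approach.

When it's unclear which software is at fault, your weapon of choice is to remove the software from the equation and run the system long enough to give the problem a chance to occur (if it's going to). You can do this by:

  1. Uninstalling the software.
  2. Disabling it with a tool such as Microsoft AutoRuns.
  3. Disabling it by booting into Safe Mode.
  4. Creating a second Windows installation that doesn't include the software in question (useful if you really need the software for day-to-day use and want to be able to switch easily between "test" and "production" modes).

When doing this, I like to categorize the system's software as follows and troubleshoot accordingly:

  1. Windows' own code and built-in drivers. Least likely to be at fault. Easily confirmed by testing the system with a pristine installation (one without any third-party code).
  2. Third-party drivers. Perennial troublemakers. They usually crash in a non-random way, so a pattern emerges. Test by using a different driver version or by swapping the hardware component.
  3. Third-party system-level software (e.g. security software). Troublesome. These are rarely required for normal system operation and can be uninstalled entirely to test their effect.
  4. User applications. Highly variable crash behavior. On modern versions of Windows, these rarely crash or lock up the whole system. Faults occur only while the application is running, so it's easy to track failures and correlate them with what was running at the time. Watch out for user applications with always-on components (e.g. startup items or system services).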
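
If you have many candidates, the remove-and-retest approach described above can be organized as a bisection so that each test run rules out half of them. Here is a minimal Python sketch under two loud assumptions: exactly one item is responsible, and each test run gives a trustworthy answer. The `crashes_with` callback is hypothetical. In real life it is you, running the system for long enough with only those items enabled. The item names are made up.

```python
def find_culprit(items, crashes_with):
    """Bisect `items` down to the one whose presence triggers the fault.

    crashes_with(enabled) is a hypothetical callback returning True if the
    fault occurs with only `enabled` active. Assumes a single culprit.
    """
    candidates = list(items)
    if not crashes_with(candidates):
        return None  # Fault does not reproduce even with everything enabled.
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if crashes_with(half):
            candidates = half  # Culprit is in the first half.
        else:
            candidates = candidates[len(half):]  # Otherwise it is in the rest.
    return candidates[0]


# Simulated run: pretend the (made-up) "antivirus" entry is the bad one.
startup_items = ["gpu driver", "audio driver", "antivirus", "rgb utility", "vpn client"]
print(find_culprit(startup_items, lambda enabled: "antivirus" in enabled))
```

With an intermittent fault, a "no crash" result is the weak link, so give each run more time than the longest gap you have seen between failures before trusting it.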

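Keep a semi-detailed work log

A final thought here. Keep a record of the problems you encounter and the troubleshooting steps you take. With a difficult, drawn-out problem like this one, it's easy to forget details. Being able to look back over the record as you work may help you rule out causes or make connections between facts that would otherwise get lost in the struggle.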
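
A text file or a paper notebook is plenty for this. If you would rather have something you can sort and filter later, here is a small Python sketch that appends timestamped rows to a CSV file. The column names and the `log_entry` helper are just one possible layout, not a recommendation of any particular tool.

```python
import csv
from datetime import datetime
from pathlib import Path


def log_entry(path, symptom, action, result=""):
    """Append one timestamped row to the log, writing a header if the file is new."""
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["timestamp", "symptom", "action", "result"])
        writer.writerow(
            [datetime.now().isoformat(timespec="seconds"), symptom, action, result]
        )
```

A call might look like `log_entry("troubleshooting_log.csv", "game vanished after ~30 min", "reset CMOS after BIOS flash", "no change")`. Filling in the result later, once you know it, is where most of the value comes from.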


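Anecdote

I once worked on a system that reminds me of your situation. It was a laptop (which limited my hardware-swapping options) that would lock up at random. It would do so within 10 seconds of powering on, then not for several days, then again after being on for a few hours. I updated everything, tested and replaced every hardware component I could, and reinstalled Windows (at least once, if not twice).

It ended up being the motherboard. After it was replaced, the laptop ran for years without any further problems.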


Viewed: 1724 times

Last active: 6 years, 1 month ago