When I run my TensorFlow application, it only prints "Killed". How can I debug this?
root@8e4a3a65184e:~/tensorflow# python sample_cnn.py
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_tf_random_seed': 1, '_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_steps': None, '_model_dir': 'data/convnet_model', '_save_summary_steps': 100}
INFO:tensorflow:Create CheckpointSaverHook.
2017-08-17 12:56:53.160481: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-17 12:56:53.160536: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-17 12:56:53.160545: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-17 12:56:53.160550: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-17 12:56:53.160555: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Killed
amo*_*ej1 13
When I run your code I get the same behavior. If you then run dmesg, you will see a trace like the one below, which confirms what gdelab hinted at:
[38607.234089] python3 invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=0, order=0, oom_score_adj=0
[38607.234090] python3 cpuset=/ mems_allowed=0
[38607.234094] CPU: 3 PID: 1420 Comm: python3 Tainted: G O 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
[38607.234094] Hardware name: Dell Inc. XPS 15 9560/05FFDN, BIOS 1.2.4 03/29/2017
[38607.234096] 0000000000000000 ffffffffa9f28414 ffffa50090317cf8 ffff940effa5f040
[38607.234097] ffffffffa9dfe050 0000000000000000 0000000000000000 0101ffffa9d82dd0
[38607.234098] e09c7db7f06d0ac2 00000000ffffffff 0000000000000000 0000000000000000
[38607.234100] Call Trace:
[38607.234104] [<ffffffffa9f28414>] ? dump_stack+0x5c/0x78
[38607.234106] [<ffffffffa9dfe050>] ? dump_header+0x78/0x1fd
[38607.234108] [<ffffffffa9d8047a>] ? oom_kill_process+0x21a/0x3e0
[38607.234109] [<ffffffffa9d800fd>] ? oom_badness+0xed/0x170
[38607.234110] [<ffffffffa9d80911>] ? out_of_memory+0x111/0x470
[38607.234111] [<ffffffffa9d85b4f>] ? __alloc_pages_slowpath+0xb7f/0xbc0
[38607.234112] [<ffffffffa9d85d8e>] ? __alloc_pages_nodemask+0x1fe/0x260
[38607.234113] [<ffffffffa9dd7c3e>] ? alloc_pages_vma+0xae/0x260
[38607.234115] [<ffffffffa9db39ba>] ? handle_mm_fault+0x111a/0x1350
[38607.234117] [<ffffffffa9c5fd84>] ? __do_page_fault+0x2a4/0x510
[38607.234118] [<ffffffffaa207658>] ? page_fault+0x28/0x30
...
[38607.234158] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
...
[38607.234332] [ 1396] 1000 1396 4810969 3464995 6959 21 0 0 python3
[38607.234332] Out of memory: Kill process 1396 (python3) score 568 or sacrifice child
[38607.234357] Killed process 1396 (python3) total-vm:19243876kB, anon-rss:13859980kB, file-rss:0kB, shmem-rss:0kB
[38607.720757] oom_reaper: reaped process 1396 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
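This basically means that python started consuming too much memory and the kernel decided to kill the process. If you add a few print statements to your code, you will see that mnist_classifier.train() is the call that is running when this happens. However, a few naive experiments (removing the logging and lowering the number of steps) did not seem to help.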
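One way to make those prints more informative is to log the process's peak memory while the suspect call runs. The following is only a minimal sketch, assuming a Linux system, where `ru_maxrss` is reported in kilobytes. The helper names `peak_rss_mib` and `start_memory_logger` are made up for this illustration and are not part of TensorFlow.

```python
import resource
import threading


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Linux reports kilobytes)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def start_memory_logger(interval_s=1.0):
    """Print the peak RSS every interval_s seconds until the returned event is set."""
    stop = threading.Event()

    def loop():
        # Event.wait() returns False on timeout, True once stop.set() is called.
        while not stop.wait(interval_s):
            print(f"peak RSS so far: {peak_rss_mib():.0f} MiB", flush=True)

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Usage would look like `stop = start_memory_logger()` before the call to `mnist_classifier.train()` and `stop.set()` after it. If the kernel kills the process part-way through, the last printed value gives a rough idea of how far memory had grown.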
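As other commenters have said, your operating system is killing your process because it runs out of memory. You are trying to build a huge network. Look at your last dense layer: it has 65536 inputs and 65536 units. Every unit has a weight for every input, which makes 65536*65536 = 4294967296 weights. The weight dtype follows your input dtype, which I think is float64, so multiply by 64 bits and you get 32GB of weights (65536*65536*64/1024/1024/1024/8 = 32). All of these weights form a single tensor that has to be operated on as a whole, so it must fit entirely in RAM. Does your system have 32GB of RAM?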
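The same arithmetic can be written as a short script. This is a sketch rather than anything TensorFlow provides: `dense_kernel_gib` is a hypothetical helper, it counts only the weight matrix (no biases, gradients, or optimizer state), and float64 is the answerer's guess about the dtype, not a confirmed fact.

```python
# Bytes per element for a few common floating-point dtypes.
BYTES_PER_ELEMENT = {"float16": 2, "float32": 4, "float64": 8}


def dense_kernel_gib(n_inputs, n_units, dtype="float64"):
    """Size in GiB of a dense layer's weight matrix (biases ignored)."""
    n_weights = n_inputs * n_units  # one weight per (input, unit) pair
    return n_weights * BYTES_PER_ELEMENT[dtype] / 1024 ** 3


if __name__ == "__main__":
    for dtype in ("float64", "float32"):
        print(dtype, dense_kernel_gib(65536, 65536, dtype), "GiB")
```

With float32 instead of float64 the same layer would still need 16 GiB for its weights alone, so shrinking the layer is likely to matter more than changing the dtype.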
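Your program was killed by the operating system, and TensorFlow does not know why, which is why it prints nothing. The likely cause is an out-of-memory condition.
Check whether your syslog contains a line like this: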
<date> <computer> kernel: [...] Out of memory: Kill process <id> (python) score <...> or sacrifice child
If it does, you need to increase the memory available to python and/or reduce the memory your program uses.
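Scanning kernel log text for that message can be scripted. The sketch below assumes the message wording quoted in this thread ("Out of memory: Kill process <id> (<name>)"). Other kernel versions may phrase it differently, so the pattern may need adjusting. `find_oom_kills` is a hypothetical helper name.

```python
import re

# Matches the kernel message quoted above, e.g.
# "Out of memory: Kill process 1396 (python3) score 568 or sacrifice child"
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return (pid, process_name) for every OOM-kill message found in lines."""
    hits = []
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


if __name__ == "__main__":
    sample = [
        "[38607.234100] Call Trace:",
        "[38607.234332] Out of memory: Kill process 1396 (python3) score 568 or sacrifice child",
    ]
    print(find_oom_kills(sample))
```

In practice the lines would come from the saved output of dmesg or from the syslog file, read with ordinary file I/O.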