Win*_*nix 4 command-line text-processing
I am trying to catch processes that start within a limited time period.
I created a script (ps-suspects.sh) that is run from the terminal.
I have a piece of code that needs fine-tuning:
$ sort -k15 ~/pid.log | uniq -f14 -c
This is what it produces:
$ head ~/pid.tmp
1 /mnt/e/bin/ps-suspects.sh Possible suspects causing problems
63 1 S root 127 2 0 60 -20 - 0 - Sep08 ? 00:00:00 [acpi_thermal_pm]
63 1 S root 75 2 0 60 -20 - 0 - Sep08 ? 00:00:00 [ata_sff]
63 1 S root 447 2 0 60 -20 - 0 - Sep08 ? 00:00:00 [ath10k_aux_wq]
63 1 S root 446 2 0 60 -20 - 0 - Sep08 ? 00:00:00 [ath10k_wq]
63 1 S avahi 922 910 0 80 0 - 11195 - Sep08 ? 00:00:00 avahi-daemon: chroot helper
63 4 S avahi 910 1 0 80 0 - 11228 - Sep08 ? 00:00:00 avahi-daemon: running [alien.local]
126 0 S rick 2902 2867 0 80 0 - 7409 wait_w Sep08 pts/18 00:00:00 bash
63 0 S rick 25894 5775 0 80 0 - 4908 wait 10:43 pts/2 00:00:00 /bin/bash /mnt/e/bin/ps-suspects.sh
63 0 S root 980 976 0 80 0 - 4921 - Sep08 ? 00:00:01 /bin/bash /usr/local/bin/display-auto-brightness
I want to eliminate all lines whose count in the first column is 63 or higher. This is the output I am after:
$ ps-suspects.sh
20 times / second ps -elf is captured to /home/rick/pid.log
Type Ctrl+C when done capturing
~/pid.log is sorted and uniq counted on column 15
which is full path and program name.
Then all matches with same unique count (the headings)
are stripped and only new processes started are printed.
This function can help you trace down what processes are
causing you grief for lid close events, hot plugging, etc.
^C
wc of ~/pid.log : 17288 343162 2717102 /home/rick/pid.log
HighCnt: 63
1 /mnt/e/bin/ps-suspects.sh Possible suspects causing problems
26 0 R rick 25976 2051 0 80 0 - 120676 - 10:43 ? 00:00:00 gnome-calculator
62 0 S root 22561 980 0 80 0 - 3589 - 10:42 ? 00:00:00 sleep 60
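In this example, 63 appears in column 1 on 90%-99% of the lines, and those lines need to be removed. All lines with 126 can be removed as well. In other words, any line whose count equals the most frequently occurring count, or is greater than it, can be removed.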
Can someone come up with the missing awk and/or uniq and/or grep to finish the job?
Python to the rescue:
python3 -c 'import sys,collections;l=[(int(L.split(None,1)[0]),L)for L in sys.stdin.readlines()];m=collections.Counter(x[0]for x in l).most_common(1)[0][0];print(*[x[1]for x in l if x[0]<m],sep="",end="")'
An alternative, uncompressed version for use as a script file:
#!/usr/bin/env python3
import sys
import collections
# read lines from stdin (with trailing \n) and extract the number in their first column
items = [(int(line.split(None, 1)[0]), line) for line in sys.stdin]
# find the most common number from the first column
most_common = collections.Counter(item[0] for item in items).most_common()[0][0]
# print input lines in order, but only those with their number lower than the most common
print(*[item[1] for item in items if item[0] < most_common], sep="", end="")
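The only assumption this script makes about its input (which is expected to be piped to stdin) is that every line has a valid integer in its first whitespace-separated column. The lines do not need to be sorted in any way.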
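The same logic can be wrapped in a function so it can be tried without a pipe. This is only a minimal sketch, not part of the script above: the name `filter_below_most_common` is made up for illustration, and it adds one behavior the script does not have, namely skipping lines whose first column is not an integer (blank lines, for example) instead of raising an error.

```python
import collections


def filter_below_most_common(lines):
    """Return the lines whose first-column number is below the most common one.

    Hypothetical helper: lines without an integer in the first
    whitespace-separated column are skipped (an assumption, not in the script).
    """
    items = []
    for line in lines:
        fields = line.split(None, 1)
        try:
            items.append((int(fields[0]), line))
        except (IndexError, ValueError):
            continue  # blank line or non-numeric first column
    if not items:
        return []
    # most frequent number in the first column
    most_common = collections.Counter(n for n, _ in items).most_common(1)[0][0]
    # keep input order, drop everything at or above the most common number
    return [line for n, line in items if n < most_common]


# Illustrative input only
sample = ["1 foo\n", "3 bar\n", "2 baz\n", "3 apple\n",
          "3 banana\n", "2 cherry\n", "4 beep\n"]
result = filter_below_most_common(sample)
print("".join(result), end="")
```

Keeping the lines paired with their parsed number means the input is read only once and the original order is preserved.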
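Note: if several different numbers in the first column are tied for the highest count, which of them gets picked is arbitrary, though it should be constant for the same input. If that is not what you want, replace the line that finds the most common value with something like this, which picks the highest of the tied most common values: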
most_common = sorted(collections.Counter(item[0] for item in items).most_common(),
key=lambda x:x[::-1])[-1][0]
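To see what the tie-breaking changes, here is a small sketch on made-up numbers where 2 and 5 both occur twice. The variable names are illustrative, and the `max()` form is just an alternative way to express the same selection, not something from the script above.

```python
import collections

# Hypothetical first-column values: 2 and 5 are tied with two occurrences each
counts = [2, 5, 2, 5, 1]
counter = collections.Counter(counts)

# Plain most_common(): among tied values, the one seen first comes first
# (documented behavior since Python 3.7)
first_seen = counter.most_common(1)[0][0]

# Sort the (value, count) pairs by (count, value) and take the last one:
# highest count, and among ties the highest value
highest = sorted(counter.most_common(), key=lambda x: x[::-1])[-1][0]

# Same selection expressed with max() instead of a full sort
highest_max = max(counter.items(), key=lambda kv: (kv[1], kv[0]))[0]

print(first_seen, highest, highest_max)
```

With the highest tied value as the threshold, fewer lines are filtered out than with a lower tied value.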
Example input:
1 foo
3 bar
2 baz
3 apple
3 banana
2 cherry
4 beep
Example output:
1 foo
2 baz
2 cherry