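I'm trying to write a script that looks at part of each line, does the equivalent of sort -u on that part (that is, finds the unique occurrences), and then displays the output in the ORIGINAL line order. In other words, only the first occurrence of each key should be shown.
I managed this with cut, but my output shows only the cut portion of the data. How can I do this so that I get the whole line?
Here's what I have so far: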
cut -d, -f6 infile.txt | cut -c4-11 | grep -n . | sort -t: -k2,2 -u | sort -t: -k1n,1 | cut -d: -f2-
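I know the data has no extra : or , in a place that would break this script. But it outputs only the unique data. How can I get the whole line? I'd rather stay away from perl, but awk is fine (although I don't know it well).
If the input file looks like this (note that the ABCDEFGH aren't really there; I just put them in to illustrate what I mean):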
A....,....,...........,.....,....,...20130718......,.........,...........,......
B....,....,...........,.....,....,...20130714......,.........,...........,......
C....,....,...........,.....,....,...20130718......,.........,...........,......
D....,....,...........,.....,....,...20130719......,.........,...........,......
E....,....,...........,.....,....,...20130713......,.........,...........,......
F....,....,...........,.....,....,...20130714......,.........,...........,......
G....,....,...........,.....,....,...20130630......,.........,...........,......
H....,....,...........,.....,....,...20130718......,.........,...........,......
My program outputs:
20130718
20130714
20130719
20130713
20130630
I want to see:
A....,....,...........,.....,....,...20130718......,.........,...........,......
B....,....,...........,.....,....,...20130714......,.........,...........,......
D....,....,...........,.....,....,...20130719......,.........,...........,......
E....,....,...........,.....,....,...20130713......,.........,...........,......
G....,....,...........,.....,....,...20130630......,.........,...........,......
Yes, awk is your best bet. Here's a cryptic one-liner:
awk -F, '!seen[substr($6,4,8)]++' infile.txt
Explanation:
options:
    -F,               set the field separator to ,
condition:
    substr($6,4,8)    up to 8 characters starting at the fourth character
                      of the sixth field
    seen[...]++       seen is an associative array (dictionary). Increment the
                      value associated with ..., and return the old value
    !seen[...]++      if there was no old value, perform the action
action:
    There is no action, only a condition, so the default action is
    performed if the test succeeds. The default action is to print
    the line. So the line will be printed if the relevant characters of
    the sixth field haven't yet been seen.
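To see the post-increment behaviour described above in isolation, here is a small sketch that needs no input file. The function name demo_seen and the keys "x" and "y" are made up for illustration.

```shell
# Hypothetical demo: shows what seen[key]++ and !seen[key]++ evaluate to
# on the first and second use of the same key.
demo_seen() {
    awk 'BEGIN {
        print seen["x"]++        # first use: old value is empty, prints 0
        print seen["x"]++        # second use: old value is 1
        print (!seen["y"]++)     # first use: !0 is true, prints 1
        print (!seen["y"]++)     # second use: !1 is false, prints 0
    }'
}

demo_seen
```

The third and fourth lines of output are the values the one-liner uses as its condition: true for the first occurrence of a key, false for every later one.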
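If the one-liner is too terse, the same logic can be spelled out with an explicit action. This is only a sketch that should behave the same way as the command above; the function name first_per_date and the variable key are illustrative and not part of the original answer.

```shell
# Hypothetical wrapper: reads comma-separated lines from the files given as
# arguments, or from standard input when no file is given.
first_per_date() {
    awk -F, '
    {
        key = substr($6, 4, 8)     # characters 4 to 11 of the sixth field
        if (!(key in seen)) {      # this key has not appeared before
            seen[key] = 1          # remember it
            print                  # print the whole original line
        }
    }' "$@"
}
```

It could be called as first_per_date infile.txt. Lines come out in their original order because awk handles the input one line at a time and never sorts it.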
Test:
$ awk -F, '!seen[substr($6,4,8)]++' <<EOF
> A....,....,...........,.....,....,...20130718......,.........,...........,......
> B....,....,...........,.....,....,...20130714......,.........,...........,......
> C....,....,...........,.....,....,...20130718......,.........,...........,......
> D....,....,...........,.....,....,...20130719......,.........,...........,......
> E....,....,...........,.....,....,...20130713......,.........,...........,......
> F....,....,...........,.....,....,...20130714......,.........,...........,......
> G....,....,...........,.....,....,...20130630......,.........,...........,......
> H....,....,...........,.....,....,...20130718......,.........,...........,......
> EOF
A....,....,...........,.....,....,...20130718......,.........,...........,......
B....,....,...........,.....,....,...20130714......,.........,...........,......
D....,....,...........,.....,....,...20130719......,.........,...........,......
E....,....,...........,.....,....,...20130713......,.........,...........,......
G....,....,...........,.....,....,...20130630......,.........,...........,......
$
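If the key sits in a different field or at a different offset, the numbers need not be hard-coded. The sketch below passes them in as awk variables with -v; the helper name first_by_key and the variable names f, s and n are assumptions made for this example.

```shell
# Hypothetical helper: first_by_key FIELD START LENGTH [FILE...]
# Prints the first line seen for each distinct substring of the chosen field.
first_by_key() {
    field=$1 start=$2 len=$3
    shift 3
    awk -F, -v f="$field" -v s="$start" -v n="$len" \
        '!seen[substr($f, s, n)]++' "$@"
}
```

With this helper, first_by_key 6 4 8 infile.txt should give the same result as the test above.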