Men*_*del 4 command-line bash perl sed text-processing
I have multiple files with content like the following:
File 1
NC_12548 og789 |nd784 -2 -54 -6
NC_12548 og789 |nd784 -2 -54 -6
NC_12548 og789 |nd784 -2 -54 -6
File 2
NC_54456 og789 |nd784 -5 -56 -6
NC_98123 og859 |nd784 -5 -84 -5
NC_689.1 og456 |nd784 -5 -54 +8
File 3
NC_54456 og789 |nd784 -5 -56 -6
NC_98123 og859 |nd784 -5 -84 -5
NC_689.1 og456 |nd784 -5 -54 +8
I want to keep only the first two columns (NC_12345 og855) and discard the rest. How can I do that?
With awk you can use | as the field separator and print the first field:
awk -F '|' '{print $1}' file1.txt file2.txt file3.txt
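To see exactly what `$1` contains when the separator is `|`, here is a minimal Python sketch of the same split (the function name `first_field` is hypothetical, chosen for illustration). It shows that the first field keeps the space that sits just before the `|`:

```python
def first_field(line: str) -> str:
    """Mimic awk -F '|' '{print $1}': return the text before the first '|'."""
    # str.split('|') splits on every literal '|'; element 0 corresponds to awk's $1.
    # The newline is stripped first so a line without '|' is returned cleanly.
    return line.rstrip("\n").split("|")[0]


print(repr(first_field("NC_12548 og789 |nd784 -2 -54 -6")))  # 'NC_12548 og789 '
```

So each output line ends in a trailing space; that is harmless for most uses but worth knowing if you compare the results byte for byte.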
The output of all the files will be concatenated on standard output. If you need the output saved in separate files, consider running awk inside a shell for loop:
# assuming they're all in the same directory, hence `*`
for fname in ./file*.txt ; do
    # name the output file after the current one ("$fname") plus a .new extension;
    # > does the actual redirection
    awk -F '|' '{print $1}' "$fname" > "$fname".new
done
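Writing to a separate .new file leaves the original untouched as a backup. Alternatively, you can use sed -i to edit the files in place. Test the command without -i first: without it, sed prints the result to standard output and leaves the files unchanged.
# use file*.txt if they're all in the current directory
# the two commands are alternatives, run only one of them;
# this is GNU sed syntax, BSD/macOS sed needs -i '' instead of -i
sed -i 's/|.*$//' file1.txt file2.txt file3.txt
sed -i 's/^\([^|]*\)|.*/\1/' file1.txt file2.txt file3.txt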
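`sed -i` has a rough counterpart in Python's standard-library `fileinput` module. The sketch below (the function name `strip_after_pipe_inplace` is hypothetical) applies the same `|.*$` deletion in place and, like `sed -i.bak` in GNU sed, keeps each original as `<name>.bak`:

```python
import fileinput
import re

# Same pattern as the sed command: the first '|' and everything after it.
PIPE_TAIL = re.compile(r"\|.*$")


def strip_after_pipe_inplace(paths, backup=".bak"):
    """Rewrite each file in place, saving the original as <name><backup>."""
    with fileinput.input(files=paths, inplace=True, backup=backup) as f:
        for line in f:
            # With inplace=True, stdout is redirected into the file being
            # rewritten, so print() writes the edited line there.
            # Note: print() always ends the line with '\n'.
            print(PIPE_TAIL.sub("", line.rstrip("\n")))
```

Usage: `strip_after_pipe_inplace(["file1.txt", "file2.txt", "file3.txt"])`. Keeping the `.bak` copies plays the same role as testing sed without `-i` first: the original data survives if the pattern turns out to be wrong.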
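Another option is Python:
#!/usr/bin/env python3
import sys

# sys.argv[0] is the script itself, so process only the file arguments
for fname in sys.argv[1:]:
    with open(fname) as fd_read, open(fname + '.new', 'w') as fd_write:
        for line in fd_read:
            # keep everything before the first '|'; strip the newline first so
            # a line without '|' does not end up with a doubled newline
            fd_write.write(line.rstrip('\n').split('|')[0] + '\n')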
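This script is meant to be run as ./script.py file1.txt file2.txt file3.txt (after making it executable with chmod +x script.py) and writes the output to new files with a .new extension.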
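All of the approaches above keep everything before the `|`, including the trailing space. If you want exactly the first two whitespace-separated columns, as the question asks, `awk '{print $1, $2}'` does that in awk. Here is a Python sketch of the same idea, assuming the columns are separated by whitespace as in the sample data (the helper `keep_two_columns` is hypothetical):

```python
import sys


def keep_two_columns(line: str) -> str:
    """Return the first two whitespace-separated columns joined by one space."""
    # str.split() with no argument splits on runs of whitespace and
    # discards the trailing newline, so no stray space survives.
    return " ".join(line.split()[:2])


if __name__ == "__main__":
    # Same calling convention as the script above: file names as arguments,
    # output written next to each input with a .new extension.
    for fname in sys.argv[1:]:
        with open(fname) as fd_read, open(fname + ".new", "w") as fd_write:
            for line in fd_read:
                fd_write.write(keep_two_columns(line) + "\n")
```

Splitting on whitespace instead of `|` also means the result no longer depends on the `|` being present at all.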