Select one column from each file and paste them into a new file

Question


Jun*_*eng 7 text-processing columns paste

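I have 20 tab-separated files, all with the same number of rows. I want to take the 4th column of each file and paste them into a new file. The new file should end up with 20 columns, one from each of the 20 files.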

How can I do this with Unix/Linux commands?

Input: 20 files, all in the same format. The 4th column of file 1 is shown here as A1:

chr1    1734966 1735009 A1       0       0       0       0       0       1       0
chr1    2074087 2083457 A1       0       1       0       0       0       0       0
chr1    2788495 2788535 A1       0       0       0       0       0       0       0
chr1    2821745 2822495 A1       0       0       0       0       0       1       0
chr1    2821939 2822679 A1       1       0       0       0       0       0       0
...


Output file, with 20 columns, each one being the 4th column of one of the 20 files:

A1       A2       A3       ...       A20
A1       A2       A3       ...       A20
A1       A2       A3       ...       A20
A1       A2       A3       ...       A20
A1       A2       A3       ...       A20
...


Answer 1

Ant*_*hon 5

Under bash, you can do this with paste and process substitution:

paste <(cut -f 4 1.txt) <(cut -f 4 2.txt) .... <(cut -f 4 20.txt)
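The paste/cut pipeline reads all files in lockstep, takes field 4 (cut counts fields from 1) from each line, and joins the results with tabs. A minimal Python sketch of the same idea follows. The function name paste_columns is hypothetical, and it assumes tab-separated input where every line has at least col fields. Like paste, it keeps going until the longest file is exhausted and emits an empty field for any file that has already run out.

```python
import sys
from contextlib import ExitStack
from itertools import zip_longest


def paste_columns(col, file_names, out=sys.stdout):
    """Write field `col` (1-based, like cut -f) of every file side by side.

    Hypothetical helper: assumes tab-separated input and that every line
    has at least `col` fields.
    """
    with ExitStack() as stack:
        # Open all files at once; ExitStack closes them all on exit
        files = [stack.enter_context(open(name)) for name in file_names]
        # Walk the files line by line in lockstep; a file that runs out
        # early contributes '' (paste also emits an empty field then)
        for lines in zip_longest(*files, fillvalue=''):
            fields = [
                line.rstrip('\r\n').split('\t')[col - 1] if line else ''
                for line in lines
            ]
            out.write('\t'.join(fields) + '\n')


# Only act as a CLI when a column number and at least one file are given
if __name__ == '__main__' and len(sys.argv) > 2:
    paste_columns(int(sys.argv[1]), sys.argv[2:])
```

Invoked as python paste_columns.py 4 1.txt 2.txt ... 20.txt, it prints the same 20-column layout as the paste command, without holding whole files in memory.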


With a Python script, for any number of files (python scriptname.py column_nr file1 file2 ... filen):

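#!/usr/bin/env python3

# Invoke with the (1-based) number of the column to extract as the first
# parameter, followed by the filenames. The files should all have the
# same number of rows.

import sys

col = int(sys.argv[1])
res = {}

for file_name in sys.argv[2:]:
    with open(file_name) as f:
        for line_nr, line in enumerate(f):
            # Strip only the line ending, so empty leading/trailing
            # fields keep their tabs and the column count stays correct
            res.setdefault(line_nr, []).append(line.rstrip('\r\n').split('\t')[col - 1])

# Print one tab-separated output row per input line, in the original order
for line_nr in sorted(res):
    print('\t'.join(res[line_nr]))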


Archived:	11 years, 1 month ago
Viewed:	17502 times
Last activity:	5 years, 1 month ago