Find lines common to multiple files

Question

I have nearly 200 files and I want to find the lines that are common to all 200 of them. The lines look like this:

HISEQ1:105:C0A57ACXX:2:1101:10000:105587/1
HISEQ1:105:C0A57ACXX:2:1101:10000:105587/2
HISEQ1:105:C0A57ACXX:2:1101:10000:121322/1
HISEQ1:105:C0A57ACXX:2:1101:10000:121322/2
HISEQ1:105:C0A57ACXX:2:1101:10000:12798/1
HISEQ1:105:C0A57ACXX:2:1101:10000:12798/2

Is there a way to do this in batch?

Answer 1

hek*_*mgl 6

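I don't think there is a single Unix command that does this on its own. But you can build a small shell script around the comm and grep commands, as in the following example: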

#!/bin/bash

# Prepare 200 (small) test files
rm -f data-*.txt
for i in {1..200} ; do
    echo "${i}" >> "data-${i}.txt"
    # common line
    echo "foo common line" >> "data-${i}.txt"
done

# Get the common lines between file1 and file2.
# file1 and file2 can be any two files out of the set,
# ideally the smallest ones. comm expects sorted input,
# so sort both files first.
comm -12 <(sort data-1.txt) <(sort data-2.txt) > common_lines

# Now grep through the remaining files for those lines
for file in data-{3..200}.txt ; do
    # For each remaining file reduce the common_lines to those
    # which are found in that file
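    # No "&&" here: when grep finds nothing, common_lines must
    # become empty instead of keeping its previous content
    grep -Fxf common_lines "${file}" > tmp_common_lines
    mv tmp_common_lines common_lines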
done

# Print the common lines
cat common_lines

The same approach can be used for larger files. It will take longer, but memory consumption stays linear.
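For an arbitrary list of files, the same idea can be wrapped in a function. The following is only a sketch, not part of the original answer: the name common_to_all is made up for illustration, and it swaps the grep step for repeated sort and comm -12 passes, so the input files do not need to be sorted beforehand. It assumes bash plus the standard sort, comm and mktemp utilities.

```shell
#!/bin/bash
# Hypothetical helper (name chosen for illustration): print the lines
# that occur in every file passed as an argument.
common_to_all() {
    [ "$#" -gt 0 ] || return 1

    local result tmp file
    result=$(mktemp) || return 1
    tmp=$(mktemp) || return 1

    # Start with the unique, sorted lines of the first file.
    # LC_ALL=C makes sort and comm agree on the ordering.
    LC_ALL=C sort -u "$1" > "$result"
    shift

    for file in "$@"; do
        # comm -12 keeps only the lines present in both sorted inputs;
        # "-" tells comm to read the second input from stdin.
        LC_ALL=C sort -u "$file" | LC_ALL=C comm -12 "$result" - > "$tmp"
        mv "$tmp" "$result"
        # Stop early once no common line is left.
        [ -s "$result" ] || break
    done

    cat "$result"
    rm -f "$result" "$tmp"
}
```

It could then be called as common_to_all data-*.txt > common_lines. Each pass sorts one input file, so this trades extra sorting time for not having to load the current set of common lines as grep patterns.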
