How do I combine data from two CSV files in Bash?

Vil*_*age 4 ruby csv bash perl python-2.7

I have two CSV files that use @ to separate the columns. The first file (file1.csv) has two columns:

cat @ eats fish
spider @ eats insects

The second file (file2.csv) has four columns:

info @ cat @ info @ info
info @ spider @ info @ info
info @ rabbit @ info @ info

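I need to add the information from the second column of the first file to a new column in the second file wherever the first column of the first file matches the second column of the second file. For example, the files above would produce this: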

info @ cat @ info @ info @ eats fish
info @ spider @ info @ info @ eats insects
info @ rabbit @ info @ info @

As shown above, since the first file contains no information about rabbits, a new empty column is added to the last line of the second file.

Here is what I know how to do so far:

while read line can be used to loop over the lines in the second file, e.g.:

while read line
do
    (commands)
done < file2.csv

Data from a specific column can be accessed with awk -F "@*" '{print $n}', where n is the column number.

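while IFS= read -r row2
do
    # ' *@ *' splits on @ and drops the spaces around it, so the fields compare equal.
    columntwo=$(printf '%s\n' "$row2" | awk -F ' *@ *' '{print $2}')
    while IFS= read -r row1
    do
        columnone=$(printf '%s\n' "$row1" | awk -F ' *@ *' '{print $1}')
        if [ "$columnone" = "$columntwo" ]
        then
            (commands)
        fi
    done < file1.csv
done < file2.csv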
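The effect of the field separator on the surrounding whitespace can be checked on a single line. This is only a small sketch using a sample line in the format of file2.csv; the variable names are made up for illustration.

```shell
# Sample line in the same format as file2.csv.
line='info @ spider @ info @ info'

# Splitting on a bare "@" keeps the surrounding spaces in the field.
raw=$(printf '%s\n' "$line" | awk -F '@' '{print $2}')

# Splitting on " *@ *" (optional spaces, "@", optional spaces) trims them.
trimmed=$(printf '%s\n' "$line" | awk -F ' *@ *' '{print $2}')

# Brackets make the leading and trailing spaces visible.
printf '[%s]\n[%s]\n' "$raw" "$trimmed"
```

The first field printed still carries a space on each side, so a string comparison against a trimmed value would fail; the second one compares cleanly.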

My approach seems inefficient, and I don't know how to add the data from the second column of file1.csv to a new column in file2.csv.

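  • The items in column 1 of file1.csv and column 2 of file2.csv are unique within those files. There are no duplicate entries in these files.
  • The resulting file should have exactly 5 columns in every line, even if some columns are empty.
  • The files contain a large number of characters from various languages, encoded in UTF-8.
  • There is whitespace around the @, but I can remove it if it causes problems for the script.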
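The uniqueness assumption in the first bullet can be verified before joining. The following is a rough sketch, not part of the original question: duplicate_keys is a hypothetical helper name, and the sample data is written to a scratch directory so the snippet runs on its own.

```shell
#!/bin/bash
# Hypothetical helper: print every key that occurs more than once in a column.
# $1 = file, $2 = column number (1-based).
duplicate_keys() {
    awk -F ' *@ *' -v col="$2" '{print $col}' "$1" | sort | uniq -d
}

# Sample data from the question, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'cat @ eats fish' 'spider @ eats insects' > "$workdir/file1.csv"
printf '%s\n' 'info @ cat @ info @ info' 'info @ spider @ info @ info' \
              'info @ rabbit @ info @ info' > "$workdir/file2.csv"

dups1=$(duplicate_keys "$workdir/file1.csv" 1)
dups2=$(duplicate_keys "$workdir/file2.csv" 2)
rm -r "$workdir"

if [ -z "$dups1$dups2" ]; then
    echo "keys are unique"
else
    echo "duplicate keys: $dups1 $dups2"
fi
```

If either column did contain duplicates, a join would emit one output line per matching pair, so the result could have more rows than file2.csv.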

How can I add the data from the first file to the data in the second file?

Kev*_*vin 5

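jowdder's answer is almost there, but it is incomplete because of the problems I mentioned in the comments: the fields would contain unwanted whitespace, and the files are not sorted, which join requires.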

join -t@ -11 -22 -a2 -o2.1,0,2.3,2.4,1.2 <(sed 's/ *@ */@/g' file1.csv | sort -t@ -k1,1) <(sed 's/ *@ */@/g' file2.csv | sort -t@ -k2,2) | sed 's/@/ @ /g' > output-file

This can also be written as a bash script, in which I'll explain each step:

#!/bin/bash -e

# Remove whitespace around the `@`s, then sort using `@` to separate fields (-t@).
# -k1,1 and -k2,2 restrict the sort key to the join field of each file.
sed 's/ *@ */@/g' file1.csv | sort -t@ -k1,1 >temp-left
sed 's/ *@ */@/g' file2.csv | sort -t@ -k2,2 >temp-right

# Join the files. -t@ means break fields at @,
# -11 says use the first field in the first file, -22 is the second field in the second file.
# -a2 also prints lines of the second file that have no match, leaving the new field empty.
# -o... controls the output format, 2.1=second file, first field; 0 is the join field.
join -t@ -11 -22 -a2 -o2.1,0,2.3,2.4,1.2 temp-left temp-right > temp-joined

# Add whitespace back in around the @s so it looks better.
sed 's/@/ @ /g' temp-joined >output-file

# Clean up temporary files
rm temp-{left,right,joined}
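The script above writes its temporary files under fixed names in the current directory. As a possible variation, assuming mktemp is available, the scratch files can live in a private directory that a trap removes on exit. The sketch below uses the sample data from the question in place of the real files, sorts each file on its join field only, and passes -a 2 so that rows of the second file without a match are kept.

```shell
#!/bin/bash
set -e

# Scratch directory that is removed when the script exits, even on error.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Sample input from the question; replace these with the real files.
printf '%s\n' 'cat @ eats fish' 'spider @ eats insects' > "$tmp/file1.csv"
printf '%s\n' 'info @ cat @ info @ info' 'info @ spider @ info @ info' \
              'info @ rabbit @ info @ info' > "$tmp/file2.csv"

# Strip whitespace around "@", then sort each file on its join field.
sed 's/ *@ */@/g' "$tmp/file1.csv" | sort -t@ -k1,1 > "$tmp/left"
sed 's/ *@ */@/g' "$tmp/file2.csv" | sort -t@ -k2,2 > "$tmp/right"

# -a 2 keeps unmatched rows of the second file; their 1.2 field comes out empty.
joined=$(join -t@ -1 1 -2 2 -a 2 -o 2.1,0,2.3,2.4,1.2 "$tmp/left" "$tmp/right" |
         sed 's/@/ @ /g')
printf '%s\n' "$joined"
```

The output is ordered by the join field rather than by the original order of file2.csv, and the unmatched rabbit row ends with an empty fifth field.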


Kev*_*vin 5

There's also a nice, clean awk solution:

awk -F" *@ *" 'NR==FNR{lines[$2]=$0} NR!=FNR{if($1 in lines)lines[$1]=lines[$1] " @ " $2} END{for(line in lines)print lines[line]}' file2.csv file1.csv

A nice one-liner. Not very short, but not the longest I've seen. Note that file2 and file1 are swapped. Again, as a script with explanations:

#!/usr/bin/awk -f

# Split fields on @ and the whitespace on either side.
BEGIN { FS = " *@ *" }

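# First file on the command line (file2.csv)
NR == FNR {
    # Store the line, keyed by the animal in column 2
    lines[$2] = $0
}

# Second file on the command line (file1.csv)
NR != FNR {
    # If the appropriate animal was in the first file, append its eating habits.
    # If not, it's discarded; if you want something else, let me know.
    # `in` tests membership without creating an empty entry, unlike lines[$1].
    if ($1 in lines) lines[$1] = lines[$1] " @ " $2
}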

# After both files have been processed
END {
    # Loop over all lines in the first file and print them, possibly updated with eating habits.
    # No guarantees on order.
    for(line in lines) print lines[line]
}

Call it as awk -f join.awk file2.csv file1.csv, or make it executable and run ./join.awk file2.csv file1.csv
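If the original order of file2.csv matters, or every line must end up with five columns, one possible variation is to read file1.csv first and stream file2.csv afterwards. This is a sketch rather than the script above: the array name habits is made up, the sample data stands in for the real files, and the NR == FNR test assumes file1.csv is not empty.

```shell
#!/bin/bash
# Sample data from the question, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'cat @ eats fish' 'spider @ eats insects' > "$workdir/file1.csv"
printf '%s\n' 'info @ cat @ info @ info' 'info @ spider @ info @ info' \
              'info @ rabbit @ info @ info' > "$workdir/file2.csv"

# file1.csv is read first (NR == FNR) to build a lookup table keyed by animal;
# file2.csv is then streamed and printed in its original order.
# A missing key yields an empty string, so every line gets a fifth column.
result=$(awk -F ' *@ *' '
    NR == FNR { habits[$1] = $2; next }
    { print $0 " @ " habits[$2] }
' "$workdir/file1.csv" "$workdir/file2.csv")
rm -r "$workdir"

printf '%s\n' "$result"
```

Here the file arguments are in their natural order (file1.csv, then file2.csv), and only the lookup table is held in memory instead of all lines of file2.csv.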