Add missing columns in CSV by header

Mar*_*Mar 0 bash awk sed

I have two files, file1 and file2:

adjective,adverb,participle,verb 
0,2,3,5, 
1,2,5,6

adjective,adjunct,adverbial,participle,verb
0,2,3,5,4
1,2,5,6,5
1,2,5,6,5

I want to get output like this:

adjective,adjunct,adverb,adverbial,participle,verb
0,2,0,3,5,4
1,2,0,5,6,5
1,2,0,5,6,5

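That is, the columns should be merged by header and sorted alphabetically. I don't care about keeping the original numbers in the added columns. They can be filled with 0. What matters is adding the missing columns and sorting them alphabetically. join doesn't help, because it only joins on one column. Any ideas?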

gle*_*man 5

I don't see why join isn't an option:

join -t, -a 1 -o 0,2.2,1.2,2.3,1.3,2.5 file1 file2 
adjective,adjunct,adverb,adverbial,participle,verb
0,2,2,3,3,4
1,2,2,5,5,5
1,2,2,5,5,5

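-t, sets the field separator to a comma. -a 1 also prints lines from file1 that have no match in file2. -o specifies the output format, meaning which field comes from which file. Field 0 is the join field, which defaults to field 1 in both files. The adverb column (1.2) takes file1's values rather than zeros, which the question says is acceptable.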
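To try the join answer end to end, the sketch below recreates the question's two files verbatim in a temporary directory and runs the same command. join pairs lines whose first field is equal. The header lines therefore pair on adjective, and the data rows pair on their leading 0 or 1. Both 1 rows of file2 pair with file1's single 1 row. join normally expects sorted input, and these files are not sorted. It works here only because both files list their keys in the same order (adjective, 0, 1).

```shell
# Recreate the question's sample files (trailing blanks included)
# in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'adjective,adverb,participle,verb \n0,2,3,5, \n1,2,5,6\n' > file1
printf 'adjective,adjunct,adverbial,participle,verb\n0,2,3,5,4\n1,2,5,6,5\n1,2,5,6,5\n' > file2

# -t,     fields are comma-separated
# -a 1    also print lines of file1 that have no partner in file2
# -o ...  output field list: 0 is the join field (field 1 of both files),
#         N.M is field M of file N
join -t, -a 1 -o 0,2.2,1.2,2.3,1.3,2.5 file1 file2
```

GNU join also has a --header option, which treats the first line of each file as a header and prints it without trying to pair it. With that option, the header keys in the two files no longer have to match.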


I may come back to this later. In the meantime, you can extract the merged column headers like this:

paste -d , file1 file2 | sed 1q | tr , '\n' | sed 's/  *$//' | sort -u | paste -d, -s 
adjective,adjunct,adverb,adverbial,participle,verb
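Here is the same pipeline split into stages with a comment on each, run against a scratch copy of the question's files. The only change is the trailing - on the final paste, which names standard input explicitly. GNU paste doesn't need it, but some other paste implementations may refuse to run without a file operand.

```shell
# Recreate the question's sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'adjective,adverb,participle,verb \n0,2,3,5, \n1,2,5,6\n' > file1
printf 'adjective,adjunct,adverbial,participle,verb\n0,2,3,5,4\n1,2,5,6,5\n1,2,5,6,5\n' > file2

paste -d , file1 file2 |  # glue line N of file1 to line N of file2
  sed 1q |                # keep only the combined header line
  tr , '\n' |             # one column name per line
  sed 's/  *$//' |        # strip trailing spaces (file1's header ends in one)
  sort -u |               # sort the names and drop duplicates
  paste -d, -s -          # rejoin them into a single comma-separated line
```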

OK, a GNU-awk-only answer:

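  • This reads the headers of file1 and file2 to build a set of unique column names.
  • It uses gawk's PROCINFO["sorted_in"] feature to traverse the associative array in lexically sorted index order.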
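gawk -F, '
    # strip trailing blanks/CRs, so that "verb " and "verb" count as one column
    { sub(/[ \t\r]+$/, "") }
    # file1 header: collect its column names
    NR == 1 {
        n = split($0, f1cols, /,/)
        for (i=1; i<=n; i++)
            allcols[f1cols[i]] = 1
    }
    NR == FNR {next} # skip file1 data rows: you do not care about the values
    # file2 header: add its column names, remember each one's position,
    # then print the merged header in sorted order
    FNR == 1 {
        n = split($0, f2cols, /,/)
        for (i=1; i<=n; i++) {
            allcols[f2cols[i]] = 1
            f2colidx[f2cols[i]] = i
        }
        PROCINFO["sorted_in"] = "@ind_str_asc"
        sep = ""
        for (head in allcols) {
            printf "%s%s", sep, head
            sep = FS
        }
        print ""
        next
    }
    # file2 data rows: print each merged column's value, or 0 if file2 lacks it
    {
        sep = ""
        for (col in allcols) {
            val = (col in f2colidx) ? $(f2colidx[col]) : 0
            printf "%s%s", sep, val
            sep = FS
        }
        print ""
    }
' file1 file2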
adjective,adjunct,adverb,adverbial,participle,verb
0,2,0,3,5,4
1,2,0,5,6,5
1,2,0,5,6,5
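PROCINFO["sorted_in"] is gawk-specific, so the program above won't sort its output under mawk or BSD awk. The sketch below is a possible portable alternative for any POSIX awk. It keeps the column names in an indexed array and sorts them itself with a small insertion sort. It also strips trailing blanks and carriage returns before splitting. That step is an assumption, added because file1's header as shown in the question ends in a space. The name merged.csv is hypothetical and only keeps a copy of the result.

```shell
# Recreate the question's sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'adjective,adverb,participle,verb \n0,2,3,5, \n1,2,5,6\n' > file1
printf 'adjective,adjunct,adverbial,participle,verb\n0,2,3,5,4\n1,2,5,6,5\n1,2,5,6,5\n' > file2

awk -F, '
    # Drop trailing blanks/CRs so "verb " and "verb" are the same column.
    { sub(/[ \t\r]+$/, "") }

    # Header of either file: record each column name once, in an indexed array.
    FNR == 1 {
        for (i = 1; i <= NF; i++)
            if (!($i in seen)) { seen[$i] = 1; names[++ncols] = $i "" }
    }

    # Skip the rest of file1: its values are not needed.
    NR == FNR { next }

    # Header of file2: remember column positions, sort the names, print them.
    FNR == 1 {
        for (i = 1; i <= NF; i++) idx[$i] = i
        # Insertion sort of names[1..ncols] by string comparison.
        for (i = 2; i <= ncols; i++) {
            v = names[i]
            for (j = i - 1; j >= 1 && names[j] > v; j--) names[j + 1] = names[j]
            names[j + 1] = v
        }
        line = names[1]
        for (i = 2; i <= ncols; i++) line = line FS names[i]
        print line
        next
    }

    # Data rows of file2: values in sorted-header order, 0 for missing columns.
    {
        line = ""
        for (i = 1; i <= ncols; i++) {
            val = (names[i] in idx) ? $(idx[names[i]]) : 0
            line = line (i > 1 ? FS : "") val
        }
        print line
    }
' file1 file2 | tee merged.csv   # also keep a copy in merged.csv
```

The insertion sort is quadratic, but header rows are short, so this simple approach is adequate here.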