How to merge duplicate rows into a single row using a primary key and multiple columns of information

Question

Here is my data:

NAME1,NAME1_001,NULL,LIC100_1,NULL,LIC300-3,LIC300-6
NAME1,NAME1_003,LIC000_1,NULL,NULL,NULL,NULL
NAME2,NAME2_001,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_001,NULL,LIC400_2,NULL,NULL,LIC500_1
NAME3,NAME3_005,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_006,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME4,NAME4_002,NULL,LIC100_1,NULL,LIC300-3,LIC300-6

Expected result:

NAME1|NAME1_001|NULL|LIC100_1|NULL|LIC300-3|LIC300-6|NAME1_003|LIC000_1|NULL|NULL|NULL|NULL
NAME2|NAME2_001|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME3|NAME3_001|NULL|LIC400_2|NULL|NULL|LIC500_1|NAME3_005|LIC000_1|NULL|LIC400_2|NULL|NULL|NAME3_006|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME4|NAME4_002|NULL|LIC100_1|NULL|LIC300-3|LIC300-6

I tried the following command, but I don't know how to add the detail columns ($3 through $7):

awk '
    BEGIN{FS=","; OFS="|"}; 
    { arr[$1] = arr[$1] == ""? $2 : arr[$1] "|" $2 }   
    END {for (i in arr) print i, arr[i] }' file.csv

Any suggestions? Thanks!

Answer 1

Rav*_*h13 6

Could you please try the following. Written and tested with the shown samples in GNU awk.

awk '
BEGIN{
  FS=","
  OFS="|"
}
FNR==NR{
  first=$1
  $1=""                     ##Rebuilds $0 with OFS, so it now starts with "|".
  arr[first]=arr[first] $0
  next
}
($1 in arr){
  print $1 arr[$1]
  delete arr[$1]
}
' Input_file  Input_file

Explanation: Adding a detailed explanation of the above.

awk '                       ##Starting awk program from here.
BEGIN{                      ##Starting BEGIN section of this program from here.
  FS=","                    ##Setting FS as comma here.
  OFS="|"                   ##Setting OFS as | here.
}
FNR==NR{                    ##FNR==NR is TRUE only while Input_file is being read the first time.
  first=$1                  ##Saving the 1st field (the key) in first.
  $1=""                     ##Emptying the 1st field; awk rebuilds $0 with OFS, so it now starts with "|".
  arr[first]=arr[first] $0  ##Appending the rest of the line (with its leading "|") to arr under the key.
  next                      ##next will skip all further statements from here.
}
($1 in arr){                ##Second read: if the 1st field is still present in arr then do following.
  print $1 arr[$1]          ##Printing the key followed by everything collected for it.
  delete arr[$1]            ##Deleting the arr item so each key is printed only once.
}
' Input_file  Input_file    ##Passing Input_file twice so it is read two times.
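
The same grouping can probably be done in a single read of the file by remembering the order in which keys first appear. The following is only a sketch under that assumption, not part of the original answer; the `order` array, the counter `n`, and the temporary file are names introduced here for illustration, and the sample data is copied from the question.

```shell
# Write the sample rows from the question to a temporary file (illustrative path).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
NAME1,NAME1_001,NULL,LIC100_1,NULL,LIC300-3,LIC300-6
NAME1,NAME1_003,LIC000_1,NULL,NULL,NULL,NULL
NAME2,NAME2_001,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_001,NULL,LIC400_2,NULL,NULL,LIC500_1
NAME3,NAME3_005,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_006,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME4,NAME4_002,NULL,LIC100_1,NULL,LIC300-3,LIC300-6
EOF

merged=$(awk '
BEGIN { FS=","; OFS="|" }
{
  key = $1
  $1 = ""                                # rebuilds $0 with OFS; it now starts with "|"
  if (!(key in arr)) order[++n] = key    # hypothetical helper: remember first-seen order
  arr[key] = arr[key] $0                 # append the remaining fields under the key
}
END {
  for (i = 1; i <= n; i++)               # walk keys in the order they first appeared
    print order[i] arr[order[i]]
}' "$tmp")

printf '%s\n' "$merged"
rm -f "$tmp"
```

Compared with reading `Input_file` twice, this keeps one extra array in memory but also works when the input comes from a pipe.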

Answer 2

Jam*_*own 5

Another awk:

$ awk '
BEGIN {               # set the field separators
    FS=","
    OFS="|"
}
{
    if($1 in a) {     # if $1 already has an entry in the hash a
        t=$1          # store the key temporarily
        $1=a[$1]      # replace $1 with the stored record; $0 is rebuilt with OFS
        a[t]=$0       # and hash the combined record
    } else {          # if $1 is seen for the first time
        $1=$1         # rebuild the record to change the separators
        a[$1]=$0      # and hash the record
    }
}
END {                 # afterwards
    for(i in a)       # iterate a (the order of for-in is unspecified)
        print a[i]    # and output
}' file
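
Because `for(i in a)` visits keys in an unspecified order, the merged lines may not come out in input order. One portable way to get a predictable order is to pipe the result through `sort` on the first `|`-separated field. The following is a sketch under that assumption, not part of the original answer; the temporary file and the `sorted` variable are names introduced here for illustration.

```shell
# Write the sample rows from the question to a temporary file (illustrative path).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
NAME1,NAME1_001,NULL,LIC100_1,NULL,LIC300-3,LIC300-6
NAME1,NAME1_003,LIC000_1,NULL,NULL,NULL,NULL
NAME2,NAME2_001,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_001,NULL,LIC400_2,NULL,NULL,LIC500_1
NAME3,NAME3_005,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_006,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME4,NAME4_002,NULL,LIC100_1,NULL,LIC300-3,LIC300-6
EOF

# Same program as above; the output is then sorted by the key in field 1.
sorted=$(awk '
BEGIN { FS=","; OFS="|" }
{
    if ($1 in a) { t=$1; $1=a[$1]; a[t]=$0 }   # append this row to the stored record
    else         { $1=$1; a[$1]=$0 }           # first row for this key
}
END { for (i in a) print a[i] }' "$tmp" | LC_ALL=C sort -t'|' -k1,1)

printf '%s\n' "$sorted"
rm -f "$tmp"
```

Sorting orders the keys alphabetically, which matches the input order for this sample only because the sample is already sorted by name.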

Asked:	4 years, 9 months ago
Viewed:	79 times
Last active:	4 years, 9 months ago