我有一个11行的csv文件,如下所示:
Order Date,Username,Order Number,No Resi,Quantity,Title,Update Date,Status,Price Per Item,Status Tracking,Alamat
05 Jun 2018,Mildred@email.com,205583995140400,,2,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Syahrul Address
05 Jun 2018,Mildred@email.com,205583995140400,,1,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Syahrul Address
05 Jun 2018,Martha@email.com,205486016644400,,2,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Faishal Address
05 Jun 2018,Martha@email.com,205486016644400,,2,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Faishal Address
05 Jun 2018,Misty@email.com,205588935534900,,2,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Rutwan Address
05 Jun 2018,Misty@email.com,205588935534900,,1,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Rutwan Address
Run Code Online (Sandbox Code Playgroud)
我想删除该文件中的重复项并对Quantity行中的值求和.我希望结果是这样的:
Order Date,Username,Order Number,No Resi,Quantity,Title,Update Date,Status,Price Per Item,Status Tracking,Alamat
05 Jun 2018,Mildred@email.com,205583995140400,,3,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Syahrul Address
05 Jun 2018,Martha@email.com,205486016644400,,4,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Faishal Address
05 Jun 2018,Misty@email.com,205588935534900,,3,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Rutwan Address
Run Code Online (Sandbox Code Playgroud)
我只想对Quantity行中的值求和,而其余部分保持不变.我已经尝试过这个问题的解决方案但是只有文件只有2行才有效,但我有11个,所以它不起作用.我怎么用awk做到这一点?
awk 救援!
$ awk 'BEGIN{FS=OFS=","}
NR==1{print; next}
{q=$5; $5="~"; a[$0]+=q}
END {for(k in a) {sub("~",a[k],k); print k}}' file
Order Date,Username,Order Number,No Resi,Quantity,Title,Update Date,Status,Price Per Item,Status Tracking,Alamat
05 Jun 2018,Misty@email.com,205588935534900,,3,Gold,05 Jun 2018 - 10:01,In Process,Rp3.000.000,Done,Rutwan Address
05 Jun 2018,Martha@email.com,205486016644400,,4,Gold,05 Jun 2018 - 10:01,In Process,Rp3.000.000,Done,Faishal Address
05 Jun 2018,Mildred@email.com,205583995140400,,3,Gold,05 Jun 2018 - 10:01,In Process,Rp3.000.000,Done,Syahrul Address
Run Code Online (Sandbox Code Playgroud)
请注意,记录的顺序不能保证,但也不要求它们最初排序.为了保留订单,有多种解决方案......
另外,我~用作占位符.如果您的数据包含此字符,则可以替换为未使用的字符.
UPDATE
保留订单(基于行的首次出现)
$ awk 'BEGIN{FS=OFS=","}
NR==1{print; next}
{q=$5;$5="~"; if(!($0 in a)) b[++c]=$0; a[$0]+=q}
END {for(k=1;k<=c;k++) {sub("~",a[b[k]],b[k]); print b[k]}}' file
Run Code Online (Sandbox Code Playgroud)
保持一个单独的结构来标记行的顺序并迭代该数据结构......
直接采用 Karafka 的解决方案,并在其中添加一些代码,以使行按照 OP 的请求按正确的顺序排列(其中它们出现在 Input_file 中)。
awk -F, '
FNR==1{
print;
next}
{
val=$5;
$5="~";
a[$0]+=val
}
!b[$0]++{
c[++count]=$0}
END{
for(i=1;i<=count;i++){
sub("~",a[c[i]],c[i]);
print c[i]}
}' OFS=, Input_file
Run Code Online (Sandbox Code Playgroud)
说明:现在也为上面的代码添加说明。
awk -F, ' ##Setting field separator as comma here.
FNR==1{ ##Checking condition if line number is 1 then do following.
print; ##Print the current line.
next} ##next will skip all further statements from here.
{
val=$5; ##Creating a variable named val whose value is 5th field of current line.
$5="~"; ##Setting value of 5th field as ~ here to keep all lines same(to create index for array a).
a[$0]+=val ##Creating an array named a whose index is current line and its value is variable val value.
}
!b[$0]++{ ##Checking if array b whose index is current line its value is NULL then do following.
c[++count]=$0} ##Creating an array named c whose index is variable count increasing value with 1 and value is current line.
END{ ##Starting END block of awk code here.
for(i=1;i<=count;i++){ ##Starting a for loop whose value starts from 1 to till value of count variable.
sub("~",a[c[i]],c[i]); ##Substituting ~ in value of array c(which is actually lines value) with value of SUMMED $5.
print c[i]} ##Printing newly value of array c where $5 is now replaced with its actual value.
}' OFS=, Input_file ##Setting OFS as comma here and mentioning Input_file name here too.
Run Code Online (Sandbox Code Playgroud)