How to compare two CSV files in UNIX and create a delta (modified/new records)

use*_*120 3 unix awk

I have two CSV files, old.csv and new.csv. I need only the new or updated records to remain in new.csv: if a record also exists in old.csv, it should be removed from new.csv.

old.csv

"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"

new.csv

"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"

Expected output in new.csv

"R","abc","london","5678"
"V","Bell","tokyo","2222"

Note: if every record in new.csv is identical to a record in old.csv, new.csv should end up empty.

Jam*_*own 5

For example, using grep:

$ grep -v -f old.csv new.csv # > the_new_new.csv 
"R","abc","london","5678"
"V","Bell","tokyo","2222"

And:

$ grep -v -f old.csv old.csv
$                            # see, no differences between 2 identical files

man grep

  -f FILE, --file=FILE
          Obtain  patterns  from  FILE,  one  per  line.   The  empty file
          contains zero patterns, and therefore matches nothing.   (-f  is
          specified by POSIX.)

  -v, --invert-match
          Invert the sense of matching, to select non-matching lines.  (-v
          is specified by POSIX.)
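One caveat with the plain -f form: each line of old.csv is read as a regular expression and may match only part of a line. The sketch below is a hedged illustration, not part of the original answer; it uses made-up two-field sample data in a scratch directory to show how adding the POSIX options -F (fixed strings) and -x (whole-line match) changes the result.

```shell
# Work in a scratch directory so the sample files do not clobber real data.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample data: read as a regex, the old record "a.c" also matches "abc".
printf '%s\n' '"R","a.c"' > old.csv
printf '%s\n' '"R","abc"' '"R","a.c"' > new.csv

# Plain -f: patterns are regular expressions, so the "abc" record is wrongly dropped.
# (|| true: grep exits non-zero when it selects no lines.)
loose=$(grep -v -f old.csv new.csv || true)

# -F: treat patterns as fixed strings; -x: a pattern must match the whole line.
strict=$(grep -v -x -F -f old.csv new.csv || true)

printf 'loose:  %s\nstrict: %s\n' "$loose" "$strict"
```

With this sample, the loose form prints nothing, while the strict form keeps the "abc" record, which is genuinely new.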

Then again, you could use awk:

$ awk 'NR==FNR{a[$0];next} !($0 in a)' old.csv new.csv
"R","abc","london","5678"
"V","Bell","tokyo","2222"
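The question asks for the result to end up in new.csv itself. Redirecting the output straight to new.csv would truncate the file before awk reads it, so one possible approach is to write to a temporary file and rename it afterwards. The sketch below assumes the sample data from the question; the name new.csv.tmp is a placeholder chosen for this illustration.

```shell
# Work in a scratch directory so the sample files do not clobber real data.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

cat > old.csv <<'EOF'
"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"
EOF

cat > new.csv <<'EOF'
"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"
EOF

# Write the delta to a temporary file (hypothetical name), then replace
# new.csv only if awk succeeded.
awk 'NR==FNR{a[$0];next} !($0 in a)' old.csv new.csv > new.csv.tmp &&
    mv new.csv.tmp new.csv

cat new.csv
```

Be aware that the NR==FNR idiom relies on old.csv containing at least one line: with an empty old.csv, the lines of new.csv are treated as the first file and nothing is printed.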

Explained:

awk '
NR==FNR{            # the records in the first file are hashed to memory
    a[$0]
    next
} 
!($0 in a)          # the records which are not found in the hash are printed
' old.csv new.csv   # > the_new_new.csv 
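The script above compares whole lines, so it cannot tell a modified record from a brand-new one. If that distinction matters, a variant could hash the old records by key instead. The following is only a sketch under two assumptions that the question does not confirm: the first field is a unique key, and no field contains an embedded comma. The NEW and CHANGED prefixes are invented for this illustration.

```shell
# Work in a scratch directory so the sample files do not clobber real data.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

cat > old.csv <<'EOF'
"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"
EOF

cat > new.csv <<'EOF'
"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"
EOF

# Assumption: field 1 is the record key and fields never contain commas.
delta=$(awk -F, '
NR==FNR       { old[$1] = $0; next }        # remember each old record under its key
!($1 in old)  { print "NEW," $0; next }     # key not present in old.csv
old[$1] != $0 { print "CHANGED," $0 }       # key present, but the record differs
' old.csv new.csv)

printf '%s\n' "$delta"
```

For the sample data this reports the "R" record as CHANGED and the "V" record as NEW; unchanged records produce no output.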


Wal*_*r A 5

When the files are sorted:

comm -13 old.csv new.csv

When they are not sorted, and sorting is allowed:

comm -13 <(sort old.csv) <(sort new.csv)
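The <( ) process substitution is a bash/ksh/zsh feature. For a plain POSIX sh, the same idea can be sketched with temporary files. The example below uses the sample data from the question; the file names old.sorted and new.sorted are placeholders, and forcing the C locale is an added precaution so that sort and comm agree on the collation order.

```shell
# Work in a scratch directory so the sample files do not clobber real data.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

cat > old.csv <<'EOF'
"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"
EOF

cat > new.csv <<'EOF'
"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"
EOF

# comm expects both inputs sorted in the same collation order.
LC_ALL=C sort old.csv > old.sorted
LC_ALL=C sort new.csv > new.sorted

# -1 suppresses lines found only in old.csv, -3 suppresses lines common to both,
# leaving the lines that appear only in new.csv.
only_new=$(LC_ALL=C comm -13 old.sorted new.sorted)

printf '%s\n' "$only_new"
```

Note that the result comes out in sorted order rather than in the original order of new.csv.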