I have two CSV files, old.csv and new.csv. I only want the new or updated records in new.csv: if a record already exists in old.csv, it should be removed from new.csv.
old.csv
"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"
new.csv
"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"
Expected output in new.csv:
"R","abc","london","5678"
"V","Bell","tokyo","2222"
Note: if every record in new.csv also appears in old.csv, new.csv should end up empty.
For example, using grep:
$ grep -v -f old.csv new.csv # > the_new_new.csv
"R","abc","london","5678"
"V","Bell","tokyo","2222"
And:
$ grep -v -f old.csv old.csv
$ # see, no differences between 2 identical files
man grep:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing. (-f is
specified by POSIX.)
-v, --invert-match
Invert the sense of matching, to select non-matching lines. (-v
is specified by POSIX.)
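One caveat with grep -v -f: each line of old.csv is read as a basic regular expression that can match anywhere in a line, so a . or * in the data acts as a metacharacter, and a short line in old.csv would also hide any longer line of new.csv that merely contains it. A minimal sketch, assuming a grep that supports the POSIX -F (fixed strings) and -x (whole-line match) options, so that only exact duplicate lines are removed. It rebuilds the sample files from the question; the_new_new.csv is the output name from the comment in the grep command above:

```shell
#!/bin/sh
# Recreate the sample files from the question.
cat > old.csv <<'EOF'
"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"
EOF
cat > new.csv <<'EOF'
"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"
EOF

# -v: keep lines that match no pattern
# -x: a pattern must match the whole line
# -F: patterns are fixed strings, not regular expressions
# -f old.csv: read one pattern per line from old.csv
grep -vxFf old.csv new.csv > the_new_new.csv
status=$?

# grep exits 0 if it printed lines, 1 if it printed none (every record
# already exists in old.csv, so the empty output is correct), and 2 on
# a real error such as a missing file.
if [ "$status" -gt 1 ]; then
    echo "grep failed with status $status" >&2
    exit "$status"
fi

cat the_new_new.csv
```

With these sample files it prints the same two lines as the original command; the difference only shows up when old.csv contains regex metacharacters or lines that are substrings of other lines. Note that the exit status 1 for "nothing printed" matters if the command is chained with && to replace new.csv.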
Then again, you could use awk:
$ awk 'NR==FNR{a[$0];next} !($0 in a)' old.csv new.csv
"R","abc","london","5678"
"V","Bell","tokyo","2222"
Run Code Online (Sandbox Code Playgroud)
Explanation:
awk '
NR==FNR{ # the records in the first file are hashed to memory
a[$0]
next
}
!($0 in a) # the records which are not found in the hash are printed
' old.csv new.csv # > the_new_new.csv
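The awk version writes its result to standard output. The question asks for new.csv itself to end up holding only the new or updated records, and redirecting straight into new.csv would truncate it before awk reads it, so a minimal sketch (assuming a POSIX shell and that overwriting new.csv in place is acceptable) writes to a temporary file and then moves it over new.csv; new.csv.tmp is a hypothetical name. It also swaps the NR==FNR test for FILENAME == ARGV[1], because NR==FNR is true for every line of new.csv when old.csv is empty, which would wrongly print nothing. Unlike grep, awk exits with status 0 even when it prints nothing, so when every record already exists in old.csv, new.csv is correctly left empty:

```shell
#!/bin/sh
# Sample data from the question.
printf '%s\n' '"R","abc","london","1234567"' '"S","def","london","1234567"' \
    '"T","kevin","boston","9876"' '"U","krish","canada","1234567"' > old.csv
printf '%s\n' '"R","abc","london","5678"' '"S","def","london","1234567"' \
    '"T","kevin","boston","9876"' '"V","Bell","tokyo","2222"' > new.csv

tmp=new.csv.tmp  # hypothetical temporary file name

# Lines of the first file argument (old.csv): store each whole line as a key.
# Lines of new.csv: print those that are not keys, keeping new.csv's order.
# Replace new.csv only if awk succeeded; otherwise discard the partial output.
if awk 'FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)' old.csv new.csv > "$tmp"; then
    mv "$tmp" new.csv
else
    rm -f "$tmp"
    exit 1
fi

cat new.csv
```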
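When the files are sorted, comm -13 suppresses column 1 (lines only in old.csv) and column 3 (lines in both files), leaving only the lines unique to new.csv: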
comm -13 old.csv new.csv
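When they are not sorted, and sorting them is acceptable (the <(...) process substitution needs bash or a similar shell):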
comm -13 <(sort old.csv) <(sort new.csv)
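comm only gives correct results when both inputs are sorted in the collating order comm itself uses, and its output comes out in that sorted order rather than in new.csv's original order. A minimal sketch, assuming byte-wise ordering is acceptable: it sets LC_ALL=C for both sort and comm so they agree on the order, and writes the sorted copies to hypothetical files old.sorted.csv and new.sorted.csv instead of using process substitution, so it also runs under a plain POSIX sh:

```shell
#!/bin/sh
# Sample data from the question.
printf '%s\n' '"R","abc","london","1234567"' '"S","def","london","1234567"' \
    '"T","kevin","boston","9876"' '"U","krish","canada","1234567"' > old.csv
printf '%s\n' '"R","abc","london","5678"' '"S","def","london","1234567"' \
    '"T","kevin","boston","9876"' '"V","Bell","tokyo","2222"' > new.csv

# Use the same byte-wise collation for sorting and comparing.
LC_ALL=C
export LC_ALL

# Hypothetical intermediate file names for the sorted copies.
sort old.csv > old.sorted.csv
sort new.csv > new.sorted.csv

# -1 hides lines only in old.csv, -3 hides lines present in both files,
# so only column 2 (lines only in new.csv) is printed.
comm -13 old.sorted.csv new.sorted.csv > the_new_new.csv

cat the_new_new.csv
```

Here the result happens to match new.csv's order only because the sample records are already sorted; if the original order matters, the grep or awk approach keeps it.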