Mar*_*ern 4 python shell compare list
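Disclaimer: I'm new to programming and scripting, so please excuse the lack of technical terminology.
I have two data sets, each a text file containing a list of names: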
First File | Second File
bob | bob
mark | mark
larry | bruce
tom | tom
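I'd like to run a script (preferably Python) that writes the lines the two files have in common to one text file and the lines that differ to another, e.g.: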
matches.txt:
bob
mark
tom
differences.txt:
bruce
How can I do this in Python? Or with the Unix command line, if that is simple enough?
dst*_*erg 16
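sort | uniq is fine, but comm may be even better. Note that comm expects both input files to be sorted. See "man comm" for more information.
From the man page: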
EXAMPLES
comm -12 file1 file2
Print only lines present in both file1 and file2.
comm -3 file1 file2
Print lines in file1 not in file2, and vice versa.
You could also use the Python set type, but comm is easier.
Unix shell solution:
# duplicate lines
sort text1.txt text2.txt | uniq -d
# unique lines
sort text1.txt text2.txt | uniq -u
# Read each file into a set of whitespace-separated words
with open("some1.txt") as f1, open("some2.txt") as f2:
    words1 = set(f1.read().split())
    words2 = set(f2.read().split())
duplicates = words1 & words2  # words present in both files
uniques = words1 ^ words2     # words present in exactly one file
print("Duplicates(%d):%s" % (len(duplicates), duplicates))
print("\nUniques(%d):%s" % (len(uniques), uniques))
Something like that, at least.
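To get the matches.txt and differences.txt files the question asks for, the set idea can be extended roughly as below. This is only a sketch under some assumptions: the helper names (read_names, split_matches, write_lines) are made up for illustration, names are compared one per line instead of per word, and "differences" is taken to mean lines of the second file that are missing from the first, since the question's example lists only bruce. If both directions are wanted, a symmetric difference as in the snippet above would be the thing to use instead.

```python
from pathlib import Path


def read_names(path):
    """Return the non-empty lines of a text file, stripped, in file order."""
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_matches(first_path, second_path):
    """Compare two name lists line by line (hypothetical helper).

    Returns (matches, differences): lines of the second file that also
    appear in the first file, and lines of the second file that do not.
    The order of the second file is preserved.
    """
    first = set(read_names(first_path))  # set gives fast membership tests
    matches, differences = [], []
    for name in read_names(second_path):
        if name in first:
            matches.append(name)
        else:
            differences.append(name)
    return matches, differences


def write_lines(path, lines):
    """Write one entry per line; an empty list produces an empty file."""
    Path(path).write_text("".join(line + "\n" for line in lines))
```

Assuming the two inputs are saved as first.txt and second.txt (placeholder names), calling split_matches("first.txt", "second.txt") and passing the two results to write_lines("matches.txt", ...) and write_lines("differences.txt", ...) should produce the files from the question. Unlike comm, this does not need the inputs to be sorted.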