Tua*_*inh 4 python grouping fuzzy-search string-matching
So I have a list of strings as follows:
list = ["I love cat", "I love dog", "I love fish", "I hate banana", "I hate apple", "I hate orange"]
How can I loop through the list and group the partially matching strings without being given any keywords? The result should look like this:
list 1 = [["I love cat","I love dog","I love fish"],["I hate banana","I hate apple","I hate orange"]]
Thanks a lot.
SequenceMatcher from difflib will do the job for you. Tune the ratio threshold (0.5 below) to get better results.
Try this:
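from difflib import SequenceMatcher

sentence_list = ["I love cat", "I love dog", "I love fish", "I hate banana", "I hate apple", "I hate orange"]
result = []
for sentence in sentence_list:
    if len(result) == 0:
        # The first sentence starts the first group.
        result.append([sentence])
    else:
        for i in range(len(result)):
            # Compare against the first sentence of each existing group.
            score = SequenceMatcher(None, sentence, result[i][0]).ratio()
            if score < 0.5:
                # Too dissimilar; open a new group once the last group has been checked.
                if i == len(result) - 1:
                    result.append([sentence])
            else:
                # Similar enough; skip exact duplicates of the group's first sentence.
                if score != 1:
                    result[i].append(sentence)
print(result)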
Output:
[['I love cat', 'I love dog', 'I love fish'], ['I hate banana', 'I hate apple', 'I hate orange']]
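Since the advice above is to tune the ratio, it may help to have the threshold as a parameter. The following is only a sketch of a variant, not the answer's exact logic: `group_similar` and `threshold` are names made up here, a `break` is added so that each sentence lands in at most one group, and exact duplicates are kept rather than skipped.

```python
from difflib import SequenceMatcher


def group_similar(sentences, threshold=0.5):
    """Greedily group sentences by similarity to each group's first member."""
    groups = []
    for sentence in sentences:
        for group in groups:
            # ratio() returns a float in [0, 1]; 1.0 means identical strings.
            if SequenceMatcher(None, sentence, group[0]).ratio() >= threshold:
                group.append(sentence)
                break  # stop at the first group that is similar enough
        else:
            # The inner loop finished without a break: no group matched.
            groups.append([sentence])
    return groups


sentences = ["I love cat", "I love dog", "I love fish",
             "I hate banana", "I hate apple", "I hate orange"]
groups = group_similar(sentences)                 # default threshold of 0.5
strict = group_similar(sentences, threshold=0.9)  # much stricter matching
print(groups)
print(len(strict))
```

A higher threshold splits the list into more, smaller groups, and a lower one merges more aggressively. The result also depends on the order of the input, because each sentence is only compared with the first member of every group.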
Avoid using names such as "list" for variables, since that shadows the built-in type. Also, "list 1" is not a valid Python variable name.
Try this:
from itertools import groupby

# Assuming you group by the first two words in each string, e.g. 'I love', 'I hate'.
L = ["I love cat", "I love dog", "I love fish", "I hate banana", "I hate apple", "I hate orange"]
# groupby only merges adjacent items, so sort the list first.
L = sorted(L)
result = []
for key, group in groupby(L, lambda x: ' '.join(x.split(' ')[:2])):
    result.append(list(group))
print(result)
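If the number of leading words to group on is not always two, the same idea can be wrapped in a small helper. This is a sketch under that assumption: `group_by_prefix` and `n_words` are hypothetical names, and it sorts by the grouping key instead of by the whole string, so the original order is preserved inside each group.

```python
from itertools import groupby


def group_by_prefix(strings, n_words=2):
    """Group strings that share their first n_words whitespace-separated words."""
    def prefix(s):
        return tuple(s.split()[:n_words])

    # groupby only merges adjacent items, so sort by the same key first.
    # sorted() is stable, so items keep their input order within a group.
    ordered = sorted(strings, key=prefix)
    return [list(group) for _, group in groupby(ordered, key=prefix)]


L = ["I love cat", "I love dog", "I love fish",
     "I hate banana", "I hate apple", "I hate orange"]
by_two = group_by_prefix(L)             # groups on 'I hate' / 'I love'
by_one = group_by_prefix(L, n_words=1)  # everything starts with 'I'
print(by_two)
print(by_one)
```

Unlike the SequenceMatcher approach, this only works when the shared part is a known number of leading words.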