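I want to efficiently find and compare the string elements in a list, then remove those that are part of another string element in the list (starting from the same point).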
list1 = [ 'a boy ran' , 'green apples are worse' , 'a boy ran towards the mill' , ' this is another sentence ' , 'a boy ran towards the mill and fell',.....]
I intend to get a list that looks like this:
list2 = [ 'green apples are worse' , ' this is another sentence ' , 'a boy ran towards the mill and fell',.....]
In other words, among the elements that start with the same initial characters, I want to keep only the longest string.
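As John Coleman suggested in the comments, you can first sort the sentences and then compare consecutive sentences. If a sentence is a prefix of another, it sorts before that sentence, and anything that sorts between the two must share the same prefix, so it is enough to compare each sentence with its immediate successor. To preserve the original order, you can use a set for fast lookup of the filtered elements.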
list1 = ['a boy ran', 'green apples are worse',
         'a boy ran towards the mill', ' this is another sentence ',
         'a boy ran towards the mill and fell']

# After sorting, a prefix sits directly before a sentence that extends it
srtd = sorted(list1)
filtered = set(list1)
for a, b in zip(srtd, srtd[1:]):
    if b.startswith(a):
        # discard (not remove) so repeated entries cannot raise KeyError
        filtered.discard(a)

# Rebuild the result in the original order
list2 = [x for x in list1 if x in filtered]

After that, list2 contains the following:
['green apples are worse',
 ' this is another sentence ',
 'a boy ran towards the mill and fell']

At O(n log n), this is much faster than comparing all pairs of sentences in O(n²), but if the list is not too long, Vicrobot's simpler solution will also work.
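For comparison, here is a minimal sketch of the O(n²) pairwise check mentioned above. It is only an illustration of the brute-force idea and is not necessarily Vicrobot's code; the function name drop_prefixes_naive is made up for this example.

```python
def drop_prefixes_naive(sentences):
    """Keep only sentences that are not a proper prefix of another one.

    Hypothetical helper: compares every sentence against every other,
    so it needs O(n^2) startswith checks.
    """
    return [
        s for s in sentences
        # skip equal strings so a sentence never eliminates itself
        if not any(other != s and other.startswith(s) for other in sentences)
    ]


list1 = ['a boy ran', 'green apples are worse',
         'a boy ran towards the mill', ' this is another sentence ',
         'a boy ran towards the mill and fell']

print(drop_prefixes_naive(list1))
```

For a few hundred short sentences the difference is unlikely to matter; the sorted approach should pay off once the list gets large. Note that this sketch keeps exact duplicates, because equal strings are skipped in the comparison.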