If I have this list of strings:
['fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5',
'fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3']
(The real list is much larger.)
How can I remove all words that appear in fewer than 1% or more than 60% of the strings?
You can use collections.Counter:
from collections import Counter
# treats each element of mylist as a single word
counts = Counter(mylist)
Then:
# keep elements whose share of the list is between 1% and 60% (inclusive)
newlist = [s for s in mylist if 0.01 <= counts[s]/len(mylist) <= 0.60]
(In Python 2.x, use float(counts[s])/len(mylist) instead, because / between two integers performs integer division there.)
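For reuse, the whole-element approach can be wrapped in a small function. This is a minimal sketch, assuming each list element is a single word; the name filter_by_frequency, its low/high parameters, and the sample data are hypothetical. Dividing by float(len(items)) gives true division on both Python 2 and Python 3, which sidesteps the integer-division issue mentioned above.

```python
from collections import Counter


def filter_by_frequency(items, low=0.01, high=0.60):
    """Keep items whose share of the whole list lies in [low, high].

    Hypothetical helper: each element of `items` counts as one word,
    as in the Counter(mylist) approach.
    """
    if not items:
        return []
    counts = Counter(items)
    total = float(len(items))  # float() forces true division on Python 2 too
    return [s for s in items if low <= counts[s] / total <= high]


# Hypothetical data (200 items): "a" is 65%, "b" is 34.5%, "c" is 0.5%.
data = ["a"] * 130 + ["b"] * 69 + ["c"]
kept = filter_by_frequency(data)
print(Counter(kept))  # only "b" survives: "a" is above 60%, "c" below 1%
```

On Python 2 you could alternatively put `from __future__ import division` at the top of the module so that / always returns a float.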
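If you mean the comma-separated words inside each string, a similar approach works. Since the thresholds refer to the share of strings that contain a word, count each word at most once per string with set(l); otherwise a word repeated within one string (like fsuy3 in the second string) is over-counted:
from collections import Counter
words = [l.split(',') for l in mylist]
# counts[w] = number of strings that contain w
counts = Counter(word for l in words for word in set(l))
newlist = [[s for s in l if 0.01 <= counts[s]/len(mylist) <= 0.60] for l in words]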
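A fuller sketch for the comma-separated case, assuming words are separated by a bare comma with no surrounding spaces. It computes each word's document frequency (the share of strings containing it), drops words outside [low, high], and joins the remaining words back into strings. The function name filter_words_by_doc_freq and its parameters are illustrative, not part of the original answer; the two sample strings from the question are used as mylist.

```python
from collections import Counter


def filter_words_by_doc_freq(strings, low=0.01, high=0.60, sep=","):
    """Drop words found in fewer than `low` or more than `high` of the
    strings (document frequency), then re-join each string with `sep`."""
    if not strings:
        return []
    tokenized = [s.split(sep) for s in strings]
    # set() counts a word at most once per string, so doc_freq[w] is the
    # number of strings containing w, not its total number of occurrences.
    doc_freq = Counter(w for words in tokenized for w in set(words))
    n = float(len(strings))
    keep = {w for w, c in doc_freq.items() if low <= c / n <= high}
    return [sep.join(w for w in words if w in keep) for words in tokenized]


mylist = ['fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5',
          'fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3']
# fsuy3 occurs in 2 of 2 strings (100% > 60%), so every copy of it is removed
print(filter_words_by_doc_freq(mylist))
```

A string whose words are all removed comes back as an empty string. If you already use scikit-learn, the min_df and max_df parameters of CountVectorizer apply the same kind of document-frequency cutoff when building a vocabulary.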