If I have this list of strings:
['fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5',
'fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3']
(a big list)
How can I remove all words that appear in fewer than 1% or in more than 60% of the strings?
You can use collections.Counter:
from collections import Counter
counts = Counter(mylist)
Then:
newlist = [s for s in mylist if 0.01 < counts[s]/len(mylist) < 0.60]
(In Python 2.x, use float(counts[s])/len(mylist) to avoid integer division.)
If you are talking about the comma-separated words, you can use a similar approach:
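As a small self-contained sketch of the filter above (Python 3): the helper name `filter_by_share` and the sample list are made up for illustration. Note that the comparison is strict, so an item whose share is exactly 1% or exactly 60% is dropped as well.

```python
from collections import Counter

def filter_by_share(items, low=0.01, high=0.60):
    """Keep items whose share of the list is strictly between low and high.

    Hypothetical helper, not part of any library.
    """
    counts = Counter(items)
    total = len(items)
    return [s for s in items if low < counts[s] / total < high]

# Made-up sample: 'a' is 7 of 10 items (70%), so it is filtered out;
# 'b' (20%) and 'c' (10%) fall inside the range and are kept.
sample = ['a'] * 7 + ['b', 'b', 'c']
print(filter_by_share(sample))
```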
words = [l.split(',') for l in mylist]
counts = Counter(word for l in words for word in l)
newlist = [[s for s in l if 0.01 < counts[s]/len(mylist) < 0.60] for l in words]
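One caveat about the snippet above: it counts every occurrence of a word but divides by the number of strings, so a word repeated inside a single string (such as fsuy3 in the second string) is counted more than once. If the goal is "the fraction of strings that contain the word", one possible variant counts each word at most once per string. This is only a sketch under that reading of the question, and `filter_words_by_doc_share` is a hypothetical name:

```python
from collections import Counter

def filter_words_by_doc_share(lines, low=0.01, high=0.60):
    """Drop words found in too few or too many of the strings.

    Hypothetical helper; assumes words are separated by plain commas.
    """
    words = [line.split(',') for line in lines]
    # set(ws) ensures a word is counted at most once per string
    doc_counts = Counter(word for ws in words for word in set(ws))
    n = len(lines)
    return [[w for w in ws if low < doc_counts[w] / n < high] for ws in words]

mylist = ['fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5',
          'fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3']
result = filter_words_by_doc_share(mylist)
# fsuy3 occurs in both strings (100%), so it is removed from each list;
# every other word occurs in one of the two strings (50%) and is kept.
```

With only two strings the percentages are coarse; on a large list the 1% lower bound starts to matter.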