Gus*_*sto · score 4 · tags: python, sorting, find
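How do you find collocations in a text? A collocation is a sequence of words that occurs together unusually often. Python (through NLTK) has a function, bigrams, that returns the pairs of adjacent words: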
>>> from nltk import bigrams
>>> list(bigrams(['more', 'is', 'said', 'than', 'done']))
[('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]
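To make clear what `bigrams` produces, here is a plain-Python sketch that pairs each word with its successor using `zip`. The name `simple_bigrams` is hypothetical and not part of NLTK. In NLTK 3, `nltk.bigrams` returns a generator rather than a list, so this sketch materializes the pairs as a list.

```python
from typing import Iterable, List, Tuple


def simple_bigrams(words: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair every word with the word that follows it (hypothetical helper)."""
    words = list(words)  # accept any iterable, including generators
    # zip stops at the shorter input, so n words give n - 1 pairs
    return list(zip(words, words[1:]))


print(simple_bigrams(['more', 'is', 'said', 'than', 'done']))
# [('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]
```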
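What remains is to find the bigrams that occur more often than the frequencies of their individual words would predict. Any ideas how to put this into code?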
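Comparing a pair's frequency against the frequencies of its individual words is what pointwise mutual information (PMI) measures: log2(P(w1 w2) / (P(w1) P(w2))), estimated here from raw counts as log2(count(w1 w2) · N / (count(w1) · count(w2))), with N the number of tokens. Below is a minimal pure-Python sketch under that assumption. The function `pmi_bigrams` and its `min_count` cutoff are hypothetical, and PMI is only one of several association measures (NLTK's `BigramAssocMeasures` also offers others, such as a likelihood-ratio score).

```python
import math
from collections import Counter
from typing import List, Tuple

Bigram = Tuple[str, str]


def pmi_bigrams(words: List[str], min_count: int = 2) -> List[Tuple[Bigram, float]]:
    """Rank adjacent word pairs by PMI, highest first (hypothetical helper)."""
    unigram_counts = Counter(words)
    bigram_counts = Counter(zip(words, words[1:]))
    total = len(words)
    scored = []
    for (w1, w2), count in bigram_counts.items():
        if count < min_count:  # PMI overrates pairs seen only once, so skip them
            continue
        # log2( P(w1 w2) / (P(w1) P(w2)) ) with every probability estimated as count / total
        score = math.log2(count * total / (unigram_counts[w1] * unigram_counts[w2]))
        scored.append(((w1, w2), score))
    # Highest score first; ties broken alphabetically for deterministic output
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


text = "new york is big . i love new york . new york never sleeps . i love pizza ."
for pair, score in pmi_bigrams(text.split()):
    print(pair, round(score, 2))
# ('i', 'love') 3.25
# ('new', 'york') 2.66
# ('.', 'i') 2.25
```

The punctuation pair `('.', 'i')` survives only because nothing filters it; practical collocation finders usually also drop punctuation and stopwords before ranking.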
Try NLTK. You will mostly be interested in nltk.collocations.BigramCollocationFinder, but here is a quick demo to show you how to get started:
>>> import nltk
>>> def tokenize(sentences):
...     for sent in nltk.sent_tokenize(sentences.lower()):
...         for word in nltk.word_tokenize(sent):
...             yield word
...
>>> nltk.Text(tkn for tkn in tokenize('mary had a little lamb.'))
<Text: mary had a little lamb ....>
>>> text = nltk.Text(tkn for tkn in tokenize('mary had a little lamb.'))
This tiny snippet contains no collocations, but this is how you would list them:
>>> text.collocations(num=20)
Building collocations list
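Why the one-sentence example yields nothing: in the NLTK source I have seen, `Text.collocations()` builds a `BigramCollocationFinder`, discards bigrams seen fewer than twice (`apply_freq_filter(2)`), drops any pair containing a stopword or a word shorter than three characters, and ranks the rest with a likelihood-ratio score. In a five-word sentence every bigram occurs once, so nothing survives. Below is a dependency-free sketch of just that filtering step. The function `candidate_collocations` and the tiny `STOPWORDS` set are hypothetical stand-ins (NLTK uses its English stopword list), and ranking by raw count replaces NLTK's likelihood-ratio score.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical, deliberately tiny stopword set; NLTK uses stopwords.words('english')
STOPWORDS: Set[str] = {"a", "an", "the", "had", "to", "was"}


def candidate_collocations(tokens: List[str], num: int = 20, min_freq: int = 2,
                           stopwords: Set[str] = STOPWORDS) -> List[Tuple[str, str]]:
    """Return up to `num` bigrams that pass a frequency and word filter."""
    counts = Counter(zip(tokens, tokens[1:]))

    def keep(word: str) -> bool:
        # Mirrors the idea of dropping short words and stopwords
        return len(word) >= 3 and word.lower() not in stopwords

    survivors = [(pair, count) for pair, count in counts.items()
                 if count >= min_freq and keep(pair[0]) and keep(pair[1])]
    # Most frequent first; ties broken alphabetically
    survivors.sort(key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in survivors[:num]]


print(candidate_collocations("mary had a little lamb .".split()))  # []
longer = ("mary had a little lamb . little lamb was white . "
          "mary took the little lamb to school .").split()
print(candidate_collocations(longer))  # [('little', 'lamb')]
```

With a longer text, repeated pairs such as "little lamb" start to clear the frequency threshold, which is why `collocations()` is normally run on a whole corpus rather than a single sentence.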