How to find collocations in text, python

Gus*_*sto 4 python sorting find

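How do you find collocations in text? A collocation is a sequence of words that occur together unusually often. NLTK provides a function, bigrams, that returns pairs of adjacent words.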

>>> bigrams(['more', 'is', 'said', 'than', 'done'])
[('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]
>>>

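What remains is to find the bigrams that occur more often than the frequencies of their individual words would suggest. Any ideas on how to put this into code?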

Tim*_*ara 8

Try NLTK. You will mostly be interested in nltk.collocations.BigramCollocationFinder, but here is a quick demonstration to show you how to get started:

>>> import nltk
>>> def tokenize(sentences):
...     for sent in nltk.sent_tokenize(sentences.lower()):
...         for word in nltk.word_tokenize(sent):
...             yield word
... 

>>> nltk.Text(tkn for tkn in tokenize('mary had a little lamb.'))
<Text: mary had a little lamb ....>
>>> text = nltk.Text(tkn for tkn in tokenize('mary had a little lamb.'))

This tiny sample contains no collocations, but here is the call:

>>> text.collocations(num=20)
Building collocations list
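To make the idea in the question concrete (bigrams that are frequent relative to the frequencies of their individual words), here is a minimal pure-Python sketch that ranks adjacent word pairs by pointwise mutual information (PMI). It is not NLTK's implementation; the function name find_collocations, its parameters min_count and top_n, and the sample sentence are all assumptions made up for illustration.

```python
import math
from collections import Counter


def find_collocations(words, min_count=2, top_n=10):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; not part of NLTK.
    """
    unigram_counts = Counter(words)
    bigram_counts = Counter(zip(words, words[1:]))
    total_words = len(words)
    total_bigrams = max(total_words - 1, 1)

    scored = []
    for (w1, w2), pair_count in bigram_counts.items():
        if pair_count < min_count:
            continue  # PMI is unreliable for pairs seen only once
        p_pair = pair_count / total_bigrams
        p_w1 = unigram_counts[w1] / total_words
        p_w2 = unigram_counts[w2] / total_words
        # PMI > 0 means the pair occurs more often than chance would predict
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        scored.append(((w1, w2), pmi))

    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Made-up sample text, only to exercise the function
words = ("the new york subway is old and the new york skyline "
         "is tall and the river is wide").split()
ranked = find_collocations(words, min_count=2, top_n=3)
for pair, score in ranked:
    print(pair, round(score, 2))
# The top-ranked pair for this sample is ('new', 'york')
```

The min_count filter matters in practice: without it, pairs of rare words that happen to appear together once get the highest PMI scores. On real text you would normally tokenize first (for example with the tokenize generator above) and pass in the resulting list of words.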