bry*_*bee 4 python nlp nltk python-itertools
Given a string like this:
'velvet evening purse bags'
How can I get all pairs of words? In other words, all 2-word combinations:
'velvet evening'
'velvet purse'
'velvet bags'
'evening purse'
'evening bags'
'purse bags'
I know that Python's nltk package can produce bigrams, but I'm looking for something beyond that functionality. Or do I have to write my own custom function in Python?
You can use itertools.combinations for this:
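from itertools import combinations
from nltk import word_tokenize

s = 'velvet evening purse bags'
# Split the string into word tokens (requires NLTK's tokenizer data to be downloaded)
words = word_tokenize(s)
# combinations(words, 2) yields every unordered pair, keeping the original word order
pairs = [' '.join(comb) for comb in combinations(words, 2)]
print(pairs)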
Output:
['velvet evening', 'velvet purse', 'velvet bags', 'evening purse', 'evening bags', 'purse bags']
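If NLTK is not needed for anything else, the same idea can be sketched with the standard library alone. The version below assumes the words are separated by plain whitespace, so `str.split()` stands in for `word_tokenize`; the function names `word_pairs` and `adjacent_pairs` are made up for this illustration. `adjacent_pairs` is included only to show how neighbouring-word bigrams differ from all 2-word combinations.

```python
from itertools import combinations


def word_pairs(text):
    """Return every 2-word combination of text, in original word order."""
    words = text.split()  # assumption: whitespace-separated words, no punctuation handling
    return [' '.join(pair) for pair in combinations(words, 2)]


def adjacent_pairs(text):
    """Return only neighbouring word pairs (bigrams), for comparison."""
    words = text.split()
    return [' '.join(pair) for pair in zip(words, words[1:])]


s = 'velvet evening purse bags'
print(word_pairs(s))      # all 6 combinations
print(adjacent_pairs(s))  # ['velvet evening', 'evening purse', 'purse bags']
```

For a string of n words, `combinations(words, 2)` produces n * (n - 1) / 2 pairs, so 4 words give 6 pairs, while the adjacent version gives only n - 1.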