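In Python, I'm writing a natural-language-processing module and can't work out how to write a function that does the following. Input: a list of parts of speech (POS) derived from a sentence given as a short string. Some items in the list are themselves lists, because that part of the program can't tell which of two or more possible parts of speech to pick. For example, one particular six-word sentence yields ["DET", "NOUN", ["VERB", "NOUN"], "CONJ", ["ADJ", "ADV", "NOUN"], "ADV"]. That is, the first word is definitely a DET, the second is definitely a NOUN, the third can be a VERB or a NOUN, the fourth is definitely a CONJ, the fifth can be an ADJ, ADV, or NOUN, and the sixth is definitely an ADV.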
所以INPUT = ["DET", "NOUN", ["VERB", "NOUN"], "CONJ", ["ADJ", "ADV", "NOUN"], "ADV"]
I need the function to return every possible combination as a list of lists, so for the input above the return value should be:
[["DET", "NOUN", "NOUN", "CONJ", "NOUN", "ADV"],
["DET", "NOUN", "NOUN", "CONJ", "ADV", "ADV"],
["DET", "NOUN", "NOUN", "CONJ", "ADJ", "ADV"],
["DET", "NOUN", "VERB", "CONJ", "NOUN", "ADV"],
["DET", "NOUN", "VERB", "CONJ", "ADV", "ADV"],
["DET", "NOUN", "VERB", "CONJ", "ADJ", "ADV"]]
A sentence can be anywhere from 1 to n words long, and each word may come back with one or more possible parts of speech.
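Before reaching for a library, the requirement can be sketched as a short recursive function: expand the first position into each of its options and prepend it to every expansion of the remaining positions. This is a minimal sketch, not the poster's code; `expand_pos` is a hypothetical name, and it assumes ambiguous positions are Python lists while certain positions are plain strings.

```python
def expand_pos(tags):
    """Return every POS assignment for `tags` as a list of lists (hypothetical helper)."""
    if not tags:
        # Exactly one way to tag an empty sentence: the empty assignment.
        return [[]]
    first, rest = tags[0], tags[1:]
    # A bare string is a single certain tag; a list holds the candidate tags.
    options = first if isinstance(first, list) else [first]
    tails = expand_pos(rest)
    # Pair each option for this word with every assignment of the remaining words.
    return [[option] + tail for option in options for tail in tails]


INPUT = ["DET", "NOUN", ["VERB", "NOUN"], "CONJ", ["ADJ", "ADV", "NOUN"], "ADV"]
for assignment in expand_pos(INPUT):
    print(assignment)
```

The recursion depth equals the sentence length, which is harmless for sentence-sized inputs.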
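You should look at the itertools module and its recipes. It looks like you want the Cartesian product of all the possible POS assignments. That's easy to do, although it's more convenient to make every element of INPUT a list, even if it holds only one item. Anyway: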
>>> import itertools
>>>
>>> INPUT = ["DET", "NOUN", ["VERB", "NOUN"], "CONJ", ["ADJ", "ADV", "NOUN"], "ADV"]
>>>
>>> I = [kind if isinstance(kind, list) else [kind] for kind in INPUT]
>>> I
[['DET'], ['NOUN'], ['VERB', 'NOUN'], ['CONJ'], ['ADJ', 'ADV', 'NOUN'], ['ADV']]
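The wrapping step matters because a Python string is itself iterable: handed a bare "DET", itertools.product would iterate over its characters. A minimal sketch of the pitfall and of the normalization; `as_options` is a hypothetical helper name, and accepting tuples as well as lists is an added assumption, not something the question requires.

```python
import itertools

# Pitfall: a bare string is treated as a sequence of characters.
pitfall = list(itertools.product("DET", ["VERB", "NOUN"]))
print(pitfall[:2])  # [('D', 'VERB'), ('D', 'NOUN')]


def as_options(tag):
    """Hypothetical helper: wrap a certain tag in a one-item list.

    Assumption: ambiguous positions may be lists or tuples.
    """
    return list(tag) if isinstance(tag, (list, tuple)) else [tag]


INPUT = ["DET", "NOUN", ["VERB", "NOUN"], "CONJ", ["ADJ", "ADV", "NOUN"], "ADV"]
I = [as_options(kind) for kind in INPUT]
print(I)
```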
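So these are the possibilities we want to choose from, and picking one item from each list in every possible way is exactly what itertools.product does: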
>>> possible_assignments = list(itertools.product(*I))
>>> possible_assignments
[('DET', 'NOUN', 'VERB', 'CONJ', 'ADJ', 'ADV'), ('DET', 'NOUN', 'VERB', 'CONJ', 'ADV', 'ADV'), ('DET', 'NOUN', 'VERB', 'CONJ', 'NOUN', 'ADV'), ('DET', 'NOUN', 'NOUN', 'CONJ', 'ADJ', 'ADV'), ('DET', 'NOUN', 'NOUN', 'CONJ', 'ADV', 'ADV'), ('DET', 'NOUN', 'NOUN', 'CONJ', 'NOUN', 'ADV')]
If I've understood you correctly, that's what you want. Well, they're tuples rather than lists, but that shouldn't matter.
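If a list of lists is required, as the question asks, the steps above can be wrapped in one function that converts each tuple. Because the size of a Cartesian product is the product of the option counts, the number of combinations can also be computed without building them, which is useful since it grows multiplicatively with sentence length. A minimal sketch, assuming Python 3.8+ for `math.prod`; `pos_combinations` and `count_combinations` are hypothetical names.

```python
import itertools
import math


def pos_combinations(tags):
    """Return every POS assignment for `tags` as a list of lists (hypothetical helper)."""
    options = [kind if isinstance(kind, list) else [kind] for kind in tags]
    # itertools.product yields tuples; convert each one to a list.
    return [list(combo) for combo in itertools.product(*options)]


def count_combinations(tags):
    """Count assignments without enumerating them (hypothetical helper)."""
    # A certain tag contributes a factor of 1; an ambiguous one, its option count.
    return math.prod(len(kind) if isinstance(kind, list) else 1 for kind in tags)


INPUT = ["DET", "NOUN", ["VERB", "NOUN"], "CONJ", ["ADJ", "ADV", "NOUN"], "ADV"]
combos = pos_combinations(INPUT)
print(len(combos), count_combinations(INPUT))  # 6 6
print(combos[0])  # ['DET', 'NOUN', 'VERB', 'CONJ', 'ADJ', 'ADV']
```

For very ambiguous sentences, iterating over itertools.product(*options) directly instead of materializing the full list keeps memory use flat.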