Get phonemes from any word in Python NLTK or other modules?

Question

NLTK's cmudict corpus returns the phonemes of words it recognizes, e.g. 'see' -> [u'S', u'IY1'], but raises an error for words it does not recognize, e.g. 'seasee' -> error.

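import nltk

# Python 2 code: the print statements and the u'' prefixes in the output depend on it.
arpabet = nltk.corpus.cmudict.dict()

for word in ('s', 'see', 'sea', 'compute', 'comput', 'seesea'):
    try:
        print arpabet[word][0]   # first listed pronunciation
    except KeyError as e:
        print e                  # word is not in cmudict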

#Output
[u'EH1', u'S']
[u'S', u'IY1']
[u'S', u'IY1']
[u'K', u'AH0', u'M', u'P', u'Y', u'UW1', u'T']
'comput'
'seesea'

Is there a module that does not have this limitation and can find or guess the phonemes of any word, real or made up?

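If there is none, is there a way I can program it myself? I am thinking of looping over increasingly long parts of the word. For example, with 'seasee' the first iteration takes 's', the next takes 'se', the third takes 'sea', and so on, running the lookup each time. The problem is that I don't know how to tell which result is the right phoneme to keep. For example, both 's' and 'sea' in 'seasee' return valid phonemes.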

Work in progress:

import nltk

# Python 2 code: the print statements and the u'' prefixes in the output depend on it.
arpabet = nltk.corpus.cmudict.dict()

for word in ('s', 'see', 'sea', 'compute', 'comput', 'seesea', 'darfasasawwa'):
    try:
        phone = arpabet[word][0]
    except KeyError:
        # Unknown word: look up each prefix of increasing length instead.
        for end in range(1, len(word) + 1):
            substring = word[:end]
            try:
                print substring, arpabet[substring][0]
            except KeyError as e:
                print e

#Output
c [u'S', u'IY1']
co [u'K', u'OW1']
com [u'K', u'AA1', u'M']
comp [u'K', u'AA1', u'M', u'P']
compu [u'K', u'AA1', u'M', u'P', u'Y', u'UW0']
comput 'comput'
s [u'EH1', u'S']
se [u'S', u'AW2', u'TH', u'IY1', u'S', u'T']
see [u'S', u'IY1']
sees [u'S', u'IY1', u'Z']
seese [u'S', u'IY1', u'Z']
seesea 'seesea'
d [u'D', u'IY1']
da [u'D', u'AA1']
dar [u'D', u'AA1', u'R']
darf 'darf'
darfa 'darfa'
darfas 'darfas'
darfasa 'darfasa'
darfasas 'darfasas'
darfasasa 'darfasasa'
darfasasaw 'darfasasaw'
darfasasaww 'darfasasaww'
darfasasawwa 'darfasasawwa'

Answer 1

Uri*_*ren 6

I ran into the same problem and solved it by partitioning unknown words recursively (see wordbreak):

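import nltk
from functools import lru_cache
from itertools import product as iterprod

# Load the CMU Pronouncing Dictionary, downloading it on first use.
try:
    arpabet = nltk.corpus.cmudict.dict()
except LookupError:
    nltk.download('cmudict')
    arpabet = nltk.corpus.cmudict.dict()

@lru_cache()
def wordbreak(s):
    # Returns a list of candidate pronunciations, or None if s cannot be
    # split into pieces that are all in the dictionary.
    s = s.lower()
    if s in arpabet:
        return arpabet[s]
    middle = len(s)/2
    # Try split points nearest the middle of the word first.
    partition = sorted(range(len(s)), key=lambda x: (x-middle)**2-x)
    for i in partition:
        pre, suf = (s[:i], s[i:])
        if pre in arpabet and wordbreak(suf) is not None:
            # Pair every pronunciation of the prefix with every one of the suffix.
            return [x+y for x,y in iterprod(arpabet[pre], wordbreak(suf))]
    return None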
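
To see how the recursive partition behaves without downloading cmudict, here is a minimal sketch of the same idea run against a tiny hand-made dictionary. TOY_ARPABET and toy_wordbreak are names made up for this illustration, and the pronunciations are copied from the output shown in the question; treat it as a sketch, not as the code above.

```python
from functools import lru_cache
from itertools import product

# Hypothetical stand-in for nltk.corpus.cmudict.dict(); entries are
# copied from the output printed in the question.
TOY_ARPABET = {
    "s": [["EH1", "S"]],
    "see": [["S", "IY1"]],
    "sea": [["S", "IY1"]],
    "compute": [["K", "AH0", "M", "P", "Y", "UW1", "T"]],
}

@lru_cache(maxsize=None)
def toy_wordbreak(s):
    """Return candidate pronunciations for s, or None if no split works."""
    s = s.lower()
    if s in TOY_ARPABET:
        return TOY_ARPABET[s]
    middle = len(s) / 2
    # Try split points nearest the middle first (same sort key as above).
    for i in sorted(range(1, len(s)), key=lambda x: (x - middle) ** 2 - x):
        pre, suf = s[:i], s[i:]
        if pre in TOY_ARPABET:
            rest = toy_wordbreak(suf)
            if rest is not None:
                # Pair every prefix pronunciation with every suffix one.
                return [x + y for x, y in product(TOY_ARPABET[pre], rest)]
    return None

print(toy_wordbreak("seesea"))  # [['S', 'IY1', 'S', 'IY1']]
print(toy_wordbreak("comput"))  # None: no split into known pieces
print(toy_wordbreak("sees"))    # [['S', 'IY1', 'EH1', 'S']]
```

The last call shows a limitation of this approach: a leftover single letter is looked up as a word, so the trailing 's' comes out as the letter name (EH1 S) rather than as a plural ending. A word such as 'comput', which has no split into dictionary entries, yields None.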

Answer 2

dim*_*mid 2

You can use the LOGIOS lexicon tool. Here is its output for your examples:

S        EH S
SEE      S IY
SEA      S IY
COMPUTE  K AH M P Y UW T
COMPUT   K AH M P UH T
SEESEA   S IY S IY

I don't know of any Python implementation. You could try implementing it yourself, or call the Perl code with subprocess.call.
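
A rough sketch of the subprocess route might look like the following. Everything about the external command is an assumption: the command line is a placeholder supplied by the caller, and feeding one word per line on stdin is a guess, so check the tool's own documentation for how it actually takes input. It uses subprocess.run rather than subprocess.call so the output can be captured. Only the parser is tied to something in this thread, namely the "WORD PH1 PH2 ..." lines shown above.

```python
import subprocess

def parse_lexicon_output(text):
    """Parse lines like 'SEE S IY' into {'see': ['S', 'IY']}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:  # skip blank or word-only lines
            result[fields[0].lower()] = fields[1:]
    return result

def phonemes_via_external_tool(command, words):
    """Run an external lexicon tool and parse what it prints.

    `command` is a hypothetical argument list chosen by the caller.
    Passing the words on stdin, one per line, is an assumption about
    the tool, not documented behavior.
    """
    completed = subprocess.run(
        command,
        input="\n".join(words) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_lexicon_output(completed.stdout)

sample = "S   EH S\nSEE S IY\nCOMPUT  K AH M P UH T\n"
print(parse_lexicon_output(sample))
```

A call would then look something like phonemes_via_external_tool(["perl", "path/to/lexicon_script.pl"], ["comput", "seesea"]), where the script path is a placeholder for wherever the tool is installed.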
