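NLTK's cmudict lookup in Python returns the phonemes of words it recognizes, for example 'see' -> [u'S', u'IY1'], but it raises an error for words it does not recognize, for example 'seasee' -> error.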
import nltk
arpabet = nltk.corpus.cmudict.dict()
for word in ('s', 'see', 'sea', 'compute', 'comput', 'seesea'):
    try:
        print arpabet[word][0]
    except Exception as e:
        print e
#Output
[u'EH1', u'S']
[u'S', u'IY1']
[u'S', u'IY1']
[u'K', u'AH0', u'M', u'P', u'Y', u'UW1', u'T']
'comput'
'seesea'
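Is there a module without that limitation, one that can find or guess the phonemes of any word, real or made up?
If not, is there a way I could program it myself? I am thinking of looping over increasingly long prefixes of the word. For "seasee", the first iteration would use "s", the next "se", the third "sea", and so on, running the lookup on each. The problem is that I don't know how to tell which prefix gives the right phonemes to keep. For example, both "s" and "sea" in "seasee" return valid phonemes.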
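One way to decide between competing prefixes such as "s" and "sea" is to keep a prefix only if the rest of the word can also be split into known words. The following is a minimal sketch of that backtracking idea in Python 3. `TOY_LEXICON` and `segmentations` are hypothetical names introduced for illustration, and the three-word lexicon merely stands in for the keys of the real cmudict dictionary.

```python
def segmentations(word, lexicon):
    """Yield every way to split `word` into pieces that are all in `lexicon`."""
    if not word:
        # Nothing left to split: one valid (empty) segmentation.
        yield []
        return
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in lexicon:
            # Keep this prefix only if the remainder can be segmented too.
            for rest in segmentations(word[i:], lexicon):
                yield [prefix] + rest


# Toy stand-in for the words known to cmudict (assumption for this sketch).
TOY_LEXICON = {"s", "see", "sea"}

print(list(segmentations("seasee", TOY_LEXICON)))  # [['sea', 'see']]
```

With this toy lexicon, the prefix "s" is rejected because "easee" cannot be split any further, so only "sea" + "see" survives. With the full dictionary there may be several valid splits, and some rule for choosing among them would still be needed.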
Work in progress:
import nltk
arpabet = nltk.corpus.cmudict.dict()
for word in ('s', 'see', 'sea', 'compute', 'comput', 'seesea', 'darfasasawwa'):
    try:
        phone = arpabet[word][0]
    except:
        try:
            counter = 0
            for i in word:
                substring = word[0:1+counter]
                counter += 1
                try:
                    print substring, arpabet[substring][0]
                except Exception as e:
                    print e
        except Exception as e:
            print e
#Output
c [u'S', u'IY1']
co [u'K', u'OW1']
com [u'K', u'AA1', u'M']
comp [u'K', u'AA1', u'M', u'P']
compu [u'K', u'AA1', u'M', u'P', u'Y', u'UW0']
comput 'comput'
s [u'EH1', u'S']
se [u'S', u'AW2', u'TH', u'IY1', u'S', u'T']
see [u'S', u'IY1']
sees [u'S', u'IY1', u'Z']
seese [u'S', u'IY1', u'Z']
seesea 'seesea'
d [u'D', u'IY1']
da [u'D', u'AA1']
dar [u'D', u'AA1', u'R']
darf 'darf'
darfa 'darfa'
darfas 'darfas'
darfasa 'darfasa'
darfasas 'darfasas'
darfasasa 'darfasasa'
darfasasaw 'darfasasaw'
darfasasaww 'darfasasaww'
darfasasawwa 'darfasasawwa'
I ran into the same problem and solved it by partitioning the unknown word recursively (see wordbreak):
import nltk
from functools import lru_cache
from itertools import product as iterprod

try:
    arpabet = nltk.corpus.cmudict.dict()
except LookupError:
    nltk.download('cmudict')
    arpabet = nltk.corpus.cmudict.dict()

@lru_cache()
def wordbreak(s):
    s = s.lower()
    if s in arpabet:
        return arpabet[s]
    middle = len(s)/2
    partition = sorted(list(range(len(s))), key=lambda x: (x-middle)**2-x)
    for i in partition:
        pre, suf = (s[:i], s[i:])
        if pre in arpabet and wordbreak(suf) is not None:
            return [x+y for x,y in iterprod(arpabet[pre], wordbreak(suf))]
    return None
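To see what the sort key in wordbreak does, the sketch below runs the same logic in Python 3 against a small hand-written dictionary, so it works without downloading cmudict. `TOY_ARPABET` and `split_order` are hypothetical names added for illustration. The toy entries are copied from the outputs quoted in the question (with the `u` prefixes dropped).

```python
from functools import lru_cache
from itertools import product as iterprod

# Toy stand-in for nltk.corpus.cmudict.dict() (assumption for this sketch):
# each word maps to a list of pronunciations.
TOY_ARPABET = {
    "s": [["EH1", "S"]],
    "see": [["S", "IY1"]],
    "sea": [["S", "IY1"]],
}


def split_order(s):
    """Return the split positions in the order wordbreak tries them."""
    middle = len(s) / 2
    return sorted(range(len(s)), key=lambda x: (x - middle) ** 2 - x)


@lru_cache()
def wordbreak(s):
    s = s.lower()
    if s in TOY_ARPABET:
        return TOY_ARPABET[s]
    for i in split_order(s):
        pre, suf = s[:i], s[i:]
        # Accept a split only if the prefix is known and the suffix resolves.
        if pre in TOY_ARPABET and wordbreak(suf) is not None:
            return [x + y for x, y in iterprod(TOY_ARPABET[pre], wordbreak(suf))]
    return None


print(split_order("seesea"))  # [3, 4, 2, 5, 1, 0]
print(wordbreak("seesea"))    # [['S', 'IY1', 'S', 'IY1']]
print(wordbreak("xyz"))       # None
```

The key tries splits near the middle of the word first and moves outward, so "seesea" is resolved at position 3 as "see" + "sea". The result is a list of candidate pronunciations, one per combination of the prefix's and the suffix's pronunciations.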
You can use the LOGIOS Lexicon Tool. Here is the output for your examples:
S EH S
SEE S IY
SEA S IY
COMPUTE K AH M P Y UW T
COMPUT K AH M P UH T
SEESEA S IY S IY
I am not aware of any Python implementation. You could try implementing it yourself, or call the Perl code with subprocess.call.