It is more complicated than a simple regular expression can handle, e.g.,
"Hi, how are you?" → "Hubi, hubow ubare yubou?"
A simple regex won't capture that the e in "are" is not pronounced.
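To illustrate, here is a minimal sketch of the naive approach, assuming the rule is "prefix every run of vowel letters with ub". The helper name `naive_ubbi` is hypothetical and is not part of the solution in this answer.

```python
import re

def naive_ubbi(text):
    """Prefix every run of vowel letters with 'ub' (spelling-based, not sound-based)."""
    return re.sub(r"[aeiou]+", r"ub\g<0>", text, flags=re.IGNORECASE)

print(naive_ubbi("are"))               # -> ubarube
print(naive_ubbi("Hi, how are you?"))  # -> Hubi, hubow ubarube yubou?
```

The silent e in "are" is treated as a separate vowel, so the result is "ubarube" instead of the expected "ubare". The regex only sees letters, not sounds.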
You need a library that provides a pronunciation dictionary, such as nltk.corpus.cmudict:
from nltk.corpus import cmudict  # $ pip install nltk
# $ python -c "import nltk; nltk.download('cmudict')"

def spubeak(word, pronunciations=cmudict.dict()):
    istitle = word.istitle()  # remember, to preserve titlecase
    w = word.lower()  # note: ignore Unicode case-folding
    for syllables in pronunciations.get(w, []):
        parts = []
        for syl in syllables:
            if syl[:1] == syl[1:2]:
                syl = syl[1:]  # remove duplicate
            isvowel = syl[-1].isdigit()
            # pronounce the word
            parts.append('ub'+syl[:-1] if isvowel else syl)
        result = ''.join(map(str.lower, parts))
        return result.title() if istitle else result
    return word  # word not found in the dictionary
Example:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re

sent = "Hi, how are you?"
# split each whitespace-separated token on non-word runs, keeping the punctuation
subent = " ".join(["".join(map(spubeak, re.split(r"(\W+)", nonblank)))
                   for nonblank in sent.split()])
print('"{}" → "{}"'.format(sent, subent))

Output:

"Hi, how are you?" → "Hubay, hubaw ubar yubuw?"
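Note: the output differs from the first example: each word is replaced with its syllables as given by the pronunciation dictionary, not with its original spelling.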
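The difference can be traced by hand. The sketch below runs without nltk: `SAMPLE_PRONUNCIATIONS` is a hand-copied subset that is assumed to mirror the format of `cmudict.dict()` (a word maps to a list of pronunciations, each a list of ARPAbet phonemes, with vowel phonemes ending in a stress digit). The name `ubbify` is hypothetical, and only one pronunciation per word is listed here.

```python
# Assumed subset in the CMUdict format; the real dictionary may list
# additional pronunciations for these words.
SAMPLE_PRONUNCIATIONS = {
    "hi": [["HH", "AY1"]],
    "how": [["HH", "AW1"]],
    "are": [["AA1", "R"]],
    "you": [["Y", "UW1"]],
}

def ubbify(phonemes):
    """Insert 'ub' before each vowel phoneme, following the same steps as spubeak."""
    parts = []
    for ph in phonemes:
        if ph[:1] == ph[1:2]:
            ph = ph[1:]  # collapse a doubled first letter: 'HH' -> 'H', 'AA1' -> 'A1'
        if ph[-1].isdigit():
            parts.append("ub" + ph[:-1])  # vowel: drop the stress digit, prefix 'ub'
        else:
            parts.append(ph)  # consonant: keep as-is
    return "".join(parts).lower()

for word, prons in SAMPLE_PRONUNCIATIONS.items():
    print(word, prons[0], ubbify(prons[0]))
```

For "are" the phonemes are AA1 and R, so the silent e never appears and the result is "ubar". The vowel spelling changes as well (AY1 turns "hi" into "hubay") because the output is built from phoneme symbols rather than from the original letters.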