Given source text like
nin2 hao3 ma
(this is the typical way of writing pinyin in ASCII, without the proper accented characters) and given a (UTF-8) conversion table such as
a1;ā
e1;ē
i1;ī
o1;ō
u1;ū
ü1;ǖ
A1;Ā
E1;Ē
...
how would I convert the source text into the following?
nín hǎo ma
For what it's worth, I'm using PHP. Is this something I should be looking at regular expressions for?
Bou*_*egh 11
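Ollie's algorithm was a good start, but it didn't apply the marks correctly. For example, qiao1 became qīāō. This one is correct and complete, and you can easily see how the replacement rules are defined.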
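It also does the full work for tone 5, although apart from removing the digit this doesn't change the output. I left it in in case you want to do something with tone 5.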
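The algorithm works as follows:
1. A regex finds each word followed by a tone digit.
2. A lookup table adds a star after the vowel that should carry the mark (e.g. iao becomes ia*o).
3. The starred vowel is replaced by its accented form for that tone.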
Example:
qiao3 => (iao becomes ia*o) => qia*o => qiǎo
This strategy, together with the use of strtr (which gives priority to longer replacements), ensures that this cannot happen:
qiao1 => qīāō
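To see why strtr() matters for the star-placement step, here is a minimal sketch comparing it with str_replace() on the same input. The $map below is a hypothetical four-entry subset of an accent map like the answer's, not the full table:

```php
<?php
// Hypothetical subset of an accent map like the answer's $accentmap.
$map = [
    'i'   => 'i*',
    'a'   => 'a*',
    'o'   => 'o*',
    'iao' => 'ia*o',
];

// strtr() tries the longest key at each position and never
// re-scans text it has already replaced.
$viaStrtr = strtr('qiao', $map);

// str_replace() applies each search/replace pair in turn to the
// whole string, so every single vowel ends up starred.
$viaStrReplace = str_replace(array_keys($map), array_values($map), 'qiao');

echo $viaStrtr, "\n";      // qia*o
echo $viaStrReplace, "\n"; // qi*a*o*
```

With strtr() the whole cluster iao is matched once and gets a single star; with str_replace() the pairs for i, a and o each fire, which is exactly the qīāō problem.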
```php
function pinyin_addaccents($string) {
    # Find words followed by a tone number (1-5) and replace them via the callback.
    # The /u modifier makes the class match ü and Ü as whole UTF-8 characters.
    return preg_replace_callback(
        '~([a-zA-ZüÜ]+)([1-5])~u',
        'pinyin_addaccents_cb',
        $string);
}

# Helper callback
function pinyin_addaccents_cb($match) {
    static $accentmap = null;

    if ($accentmap === null) {
        # Where to place the accent marks
        $stars =
            'a* e* i* o* u* ü* '.
            'A* E* I* O* U* Ü* '.
            'a*i a*o e*i ia* ia*o ie* io* iu* '.
            'A*I A*O E*I IA* IA*O IE* IO* IU* '.
            'o*u ua* ua*i ue* ui* uo* üe* '.
            'O*U UA* UA*I UE* UI* UO* ÜE*';
        $nostars = str_replace('*', '', $stars);

        # Build an array like Array('a' => 'a*') and store it statically
        $accentmap = array_combine(explode(' ', $nostars), explode(' ', $stars));
        unset($stars, $nostars);
    }

    static $vowels =
        Array('a*','e*','i*','o*','u*','ü*','A*','E*','I*','O*','U*','Ü*');

    static $pinyin = Array(
        1 => Array('ā','ē','ī','ō','ū','ǖ','Ā','Ē','Ī','Ō','Ū','Ǖ'),
        2 => Array('á','é','í','ó','ú','ǘ','Á','É','Í','Ó','Ú','Ǘ'),
        3 => Array('ǎ','ě','ǐ','ǒ','ǔ','ǚ','Ǎ','Ě','Ǐ','Ǒ','Ǔ','Ǚ'),
        4 => Array('à','è','ì','ò','ù','ǜ','À','È','Ì','Ò','Ù','Ǜ'),
        5 => Array('a','e','i','o','u','ü','A','E','I','O','U','Ü')
    );

    list(, $word, $tone) = $match;

    # Add a star to the vowel cluster (strtr prefers the longest matching key)
    $word = strtr($word, $accentmap);

    # Replace the starred letter with the accented one
    $word = str_replace($vowels, $pinyin[$tone], $word);

    return $word;
}
```
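For comparison, here is a sketch of a different technique: instead of listing vowel clusters, it applies the commonly taught placement rule directly (a or e takes the mark; in ou the o takes it; otherwise the last vowel does). The function names pinyin_mark_vowel, pinyin_syllable and pinyin_convert are hypothetical, and the sketch assumes UTF-8 input and PHP 7.4+ (for the arrow function):

```php
<?php
// Alternative technique (not the answer's cluster table):
//   1. 'a' or 'e' takes the mark when present;
//   2. in 'ou', the 'o' takes the mark;
//   3. otherwise the last vowel takes it.
// Assumes UTF-8 input; tone 5 (neutral) adds no mark.

// Hypothetical helper: marked form of one vowel for tones 1-4.
function pinyin_mark_vowel(string $vowel, int $tone): string {
    static $marks = [
        'a' => ['ā', 'á', 'ǎ', 'à'], 'e' => ['ē', 'é', 'ě', 'è'],
        'i' => ['ī', 'í', 'ǐ', 'ì'], 'o' => ['ō', 'ó', 'ǒ', 'ò'],
        'u' => ['ū', 'ú', 'ǔ', 'ù'], 'ü' => ['ǖ', 'ǘ', 'ǚ', 'ǜ'],
        'A' => ['Ā', 'Á', 'Ǎ', 'À'], 'E' => ['Ē', 'É', 'Ě', 'È'],
        'I' => ['Ī', 'Í', 'Ǐ', 'Ì'], 'O' => ['Ō', 'Ó', 'Ǒ', 'Ò'],
        'U' => ['Ū', 'Ú', 'Ǔ', 'Ù'], 'Ü' => ['Ǖ', 'Ǘ', 'Ǚ', 'Ǜ'],
    ];
    return ($tone >= 1 && $tone <= 4) ? $marks[$vowel][$tone - 1] : $vowel;
}

// Hypothetical helper: add the tone mark to a single syllable.
function pinyin_syllable(string $word, int $tone): string {
    // Split into UTF-8 characters so that 'ü' counts as one character.
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    $vowels = ['a', 'e', 'i', 'o', 'u', 'ü', 'A', 'E', 'I', 'O', 'U', 'Ü'];
    $target = null;

    foreach ($chars as $i => $c) {
        // Rule 1: 'a' or 'e' wins.
        if (in_array($c, ['a', 'e', 'A', 'E'], true)) {
            $target = $i;
            break;
        }
        // Rule 2: 'o' directly followed by 'u' wins.
        $next = $chars[$i + 1] ?? '';
        if (in_array($c, ['o', 'O'], true) && in_array($next, ['u', 'U'], true)) {
            $target = $i;
            break;
        }
        // Rule 3: otherwise remember the most recent vowel.
        if (in_array($c, $vowels, true)) {
            $target = $i;
        }
    }

    if ($target !== null) {
        $chars[$target] = pinyin_mark_vowel($chars[$target], $tone);
    }
    return implode('', $chars);
}

// Hypothetical entry point: letters followed by a tone digit 1-5.
function pinyin_convert(string $text): string {
    return preg_replace_callback(
        '~([a-zA-ZüÜ]+)([1-5])~u',
        fn(array $m): string => pinyin_syllable($m[1], (int) $m[2]),
        $text
    );
}

echo pinyin_convert('nin2 hao3 ma'), "\n"; // nín hǎo ma
echo pinyin_convert('qiao1'), "\n";        // qiāo
```

Encoding the rule as three checks avoids maintaining a cluster table, but the answer's $stars string is easier to adjust if you need special cases.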
A simpler approach maps each vowel through a per-tone table with str_replace(). Note that it marks every vowel in the word (so qiao1 would become qīāō), and its table only covers the characters needed for this example:
```php
<?php
$in = 'nin2 hao3 ma';
$out = 'nín hǎo ma';

function replacer($match) {
    static $trTable = array(
        1 => array(
            'a' => 'ā',
            'e' => 'ē',
            'i' => 'ī',
            'o' => 'ō',
            'u' => 'ū',
            'ü' => 'ǖ',
            'A' => 'Ā',
            'E' => 'Ē'),
        2 => array('i' => 'í'),
        3 => array('a' => 'ǎ')
    );
    list(, $word, $i) = $match;
    return str_replace(
        array_keys($trTable[$i]),
        array_values($trTable[$i]),
        $word);
}

// Outputs: bool(true)
var_dump(preg_replace_callback('~(\w+)(\d+)~', 'replacer', $in) === $out);
```