Convert numbered pinyin to pinyin with tone marks?

Phi*_*sen 6 php regex cjk

Given source text like

nin2 hao3 ma

(which is the typical way to write pinyin in ASCII, without proper accented characters) and given a (UTF-8) conversion table like

a1;ā
e1;ē
i1;ī
o1;ō
u1;ū
ü1;ǖ
A1;Ā
E1;Ē
...

how would I convert the source text into

nín hǎo ma

For what it's worth, I'm using PHP, and a regex might be what I'm looking for?

Bou*_*egh 11

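Ollie's algorithm was a nice start, but it doesn't apply the marks correctly. For example, qiao1 becomes qīāō. This one is correct and complete. You can easily see how the replacement rules are defined.

It also does the whole job for tone 5, although that doesn't affect the output beyond removing the number. I left it in, in case you want to do something with tone 5.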

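The algorithm works as follows:

  • The word and the tone are provided in $match[1] and $match[2]
  • A star is added after the letter that should get the accent mark
  • The letter with the star is replaced by that letter with the correct tone mark.

Example:

qiao => (iao becomes ia*o) => qia*o => qiǎo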
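The two steps above can be sketched in isolation. This is a minimal illustration, not the full function: `mark_vowel` is a hypothetical helper, `$rules` holds only a small subset of the placement rules, and tone 3 is assumed for the final replacement.

```php
<?php
// Hypothetical helper for illustration: stars the vowel that takes the tone mark.
function mark_vowel(string $syllable): string {
    // Small subset of the placement rules (plain cluster => starred cluster).
    $rules = [
        'a'   => 'a*',
        'i'   => 'i*',
        'o'   => 'o*',
        'ia'  => 'ia*',
        'ao'  => 'a*o',
        'iao' => 'ia*o',
    ];
    // With an array argument, strtr() tries the longest matching key first
    // and never rescans text it has already replaced.
    return strtr($syllable, $rules);
}

// Step 1: mark the vowel with a star.
$starred = mark_vowel('qiao');

// Step 2: swap the starred vowel for its accented form (tone 3 assumed here).
$accented = str_replace('a*', 'ǎ', $starred);

echo $starred, ' => ', $accented, PHP_EOL; // qia*o => qiǎo
```

Because the star sits on exactly one letter, the second step can be a plain str_replace without any risk of accenting the other vowels.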

This strategy, together with the use of strtr (which gives priority to longer replacements), makes sure that this won't happen:

qiao1 =>qīāō
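To see the difference, here is a small comparison sketch. The variable names and the reduced rule tables are made up for the example. It contrasts a per-vowel str_replace with strtr given an extra, longer key:

```php
<?php
// Naive approach: every plain vowel is replaced by its tone-1 form.
$tone1 = ['a' => 'ā', 'i' => 'ī', 'o' => 'ō'];
$naive = str_replace(array_keys($tone1), array_values($tone1), 'qiao');

// strtr() with an array tries the longest matching key first, so the
// cluster rule 'iao' wins over the single-letter rules.
$rules = ['a' => 'ā', 'i' => 'ī', 'o' => 'ō', 'iao' => 'iāo'];
$fixed = strtr('qiao', $rules);

echo $naive, PHP_EOL; // qīāō
echo $fixed, PHP_EOL; // qiāo
```

The function below reaches the same result in two passes (star first, accent second) rather than listing an accented form for every cluster and tone.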


function pinyin_addaccents($string) {
    # Find words followed by a tone number (1-5), and replace with callback fn.
    # Restricting the digit to 1-5 avoids looking up a tone that is not in $pinyin.
    return preg_replace_callback(
        '~([a-zA-ZüÜ]+)([1-5])~',
        'pinyin_addaccents_cb',
        $string);
}

# Helper callback
function pinyin_addaccents_cb($match) {
    static $accentmap = null;

    if( $accentmap === null ) {
        # Where to place the accent marks
        $stars =
            'a* e* i* o* u* ü* '.
            'A* E* I* O* U* Ü* '.
            'a*i a*o e*i ia* ia*o ie* io* iu* '.
            'A*I A*O E*I IA* IA*O IE* IO* IU* '.
            'o*u ua* ua*i ue* ui* uo* üe* '.
            'O*U UA* UA*I UE* UI* UO* ÜE*';
        $nostars = str_replace('*', '', $stars);

        # Build an array like Array('a' => 'a*') and store statically
        $accentmap = array_combine(explode(' ',$nostars), explode(' ', $stars));
        unset($stars, $nostars);
    }

    static $vowels =
        Array('a*','e*','i*','o*','u*','ü*','A*','E*','I*','O*','U*','Ü*');

    static $pinyin = Array(
        1 => Array('ā','ē','ī','ō','ū','ǖ','Ā','Ē','Ī','Ō','Ū','Ǖ'),
        2 => Array('á','é','í','ó','ú','ǘ','Á','É','Í','Ó','Ú','Ǘ'),
        3 => Array('ǎ','ě','ǐ','ǒ','ǔ','ǚ','Ǎ','Ě','Ǐ','Ǒ','Ǔ','Ǚ'),
        4 => Array('à','è','ì','ò','ù','ǜ','À','È','Ì','Ò','Ù','Ǜ'),
        5 => Array('a','e','i','o','u','ü','A','E','I','O','U','Ü')
    );

    list(,$word,$tone) = $match;
    # Add star to vowelcluster
    $word = strtr($word, $accentmap);
    # Replace starred letter with accented 
    $word = str_replace($vowels, $pinyin[$tone], $word);
    return $word;
}

  • Just ported this idea to JavaScript and open-sourced a plugin here, if anyone's interested: https://github.com/quizlet/pinyin-converter (2 upvotes)

Oll*_*ers 1

<?php
$in = 'nin2 hao3 ma';
$out = 'nín hǎo ma';

function replacer($match) {
  static $trTable = array(
    1 => array(
      'a' => 'ā',
      'e' => 'ē',
      'i' => 'ī',
      'o' => 'ō',
      'u' => 'ū',
      'ü' => 'ǖ',
      'A' => 'Ā',
      'E' => 'Ē'),
    2 => array('i' => 'í'),
    3 => array('a' => 'ǎ')
  );
  list(, $word, $i) = $match;
  return str_replace(
    array_keys($trTable[$i]),
    array_values($trTable[$i]),
    $word); }

// Outputs: bool(true)
var_dump(preg_replace_callback('~(\w+)(\d+)~', 'replacer', $in) === $out);