Liz*_*ard 74 php string preg-replace non-ascii-characters
I'm trying to replace accented characters with their plain equivalents. Here is what I'm currently doing:
$string = "Éric Cantona";
$strict = strtolower($string);
echo "After Lower: ".$strict;
$patterns[0] = '/[á|â|à|å|ä]/';
$patterns[1] = '/[ð|é|ê|è|ë]/';
$patterns[2] = '/[í|î|ì|ï]/';
$patterns[3] = '/[ó|ô|ò|ø|õ|ö]/';
$patterns[4] = '/[ú|û|ù|ü]/';
$patterns[5] = '/æ/';
$patterns[6] = '/ç/';
$patterns[7] = '/ß/';
$replacements[0] = 'a';
$replacements[1] = 'e';
$replacements[2] = 'i';
$replacements[3] = 'o';
$replacements[4] = 'u';
$replacements[5] = 'ae';
$replacements[6] = 'c';
$replacements[7] = 'ss';
$strict = preg_replace($patterns, $replacements, $strict);
echo "Final: ".$strict;
This gives me:
After Lower: éric cantona
Final: ric cantona
The above gives me ric cantona, but the output I want is eric cantona.
Can anyone help me find my mistake?
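The misbehaviour comes from PCRE working on bytes when the u modifier is absent: in UTF-8, é is the two bytes 0xC3 0xA9, so a class like [ð|é|ê|è|ë] is really a set of single bytes, plus a literal pipe (a | inside [...] is not alternation). Also, strtolower() works byte by byte (and only on ASCII since PHP 8.2); mb_strtolower($s, 'UTF-8') from mbstring is the multibyte-aware variant. A small demonstration, assuming the file itself is saved as UTF-8:

```php
<?php
// é is two bytes in UTF-8 (0xC3 0xA9), so strlen() counts 2.
echo strlen('é'), "\n";                                 // 2

// Without /u the class is a set of bytes, so each byte of é
// is matched and replaced on its own.
echo preg_replace('/[ð|é|ê|è|ë]/', 'e', 'éric'), "\n";  // eeric

// With /u each accented letter is a single class member.
echo preg_replace('/[ðéêèë]/u', 'e', 'éric'), "\n";     // eric

// A '|' inside [...] is a literal pipe, so it gets replaced too.
echo preg_replace('/[ð|é]/u', 'e', 'a|b'), "\n";        // aeb
```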
Liz*_*ard 153
I tried all kinds of variations based on the answers below, but this is what worked:
$unwanted_array = array( 'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
$str = strtr( $str, $unwanted_array );
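The same strtr() technique wrapped in a function, as a sketch: strip_accents_strtr() is a hypothetical name and the map is only a subset of the one above. strtr() with an array tries the longest keys first and never re-scans text it has already replaced, so two-byte UTF-8 keys such as 'É' are replaced as whole units and the original letter case is kept.

```php
<?php
// Subset of the map above; extend it with the remaining entries as needed.
function strip_accents_strtr(string $str): string
{
    $unwanted = [
        'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ä' => 'A', 'Ç' => 'C',
        'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ñ' => 'N', 'Ö' => 'O',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a', 'ç' => 'c',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ñ' => 'n', 'ö' => 'o',
        'ß' => 'Ss',
    ];
    // Whole multi-byte keys are matched, longest first; replacements are not re-scanned.
    return strtr($str, $unwanted);
}

echo strip_accents_strtr('Éric Cantona'), "\n"; // Eric Cantona
```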
mvd*_*vds 75
To remove diacritics, use iconv:
$val = iconv('ISO-8859-1','ASCII//TRANSLIT',$val);
or
$val = iconv('UTF-8','ASCII//TRANSLIT',$val);
Note that PHP has a strange quirk: it (sometimes?) requires a locale to be set with setlocale() for these conversions to work.
Edit: I just tested it, and it works out of the box for most of your diacritics:
$val = "á|â|à|å|ä ð|é|ê|è|ë í|î|ì|ï ó|ô|ò|ø|õ|ö ú|û|ù|ü æ ç ß abc ABC 123";
echo iconv('UTF-8','ASCII//TRANSLIT',$val);
Output:
a|a|a|a|a ?|e|e|e|e i|i|i|i o|o|o|?|o|o u|u|u|u ae c ss abc ABC 123
So you may want to fix those two odd ones by hand before calling iconv, or dig into PHP's internals and actually fix it.
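A minimal sketch of that advice, assuming a UTF-8 locale such as en_US.UTF-8 is installed (check locale -a): ð and ø are mapped by hand first because they came out as ? in the test above, the previous locale is restored afterwards, and non-ASCII bytes are dropped if iconv is missing or rejects //TRANSLIT. The function name to_ascii() and the choice ð => d are assumptions, not part of the answer. Transliteration results differ between iconv implementations (glibc and musl behave differently), so treat the output as best-effort.

```php
<?php
function to_ascii(string $s): string
{
    // Hand-map the characters iconv rendered as '?' (ð => d is a choice made here).
    $s = strtr($s, ['ð' => 'd', 'Ð' => 'D', 'ø' => 'o', 'Ø' => 'O']);

    $previous = setlocale(LC_CTYPE, '0');   // '0' only queries the current setting
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'en_US.utf8', 'C.UTF-8');
    $ascii = function_exists('iconv') ? @iconv('UTF-8', 'ASCII//TRANSLIT', $s) : false;
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);     // restore the caller's locale
    }

    // Fallback: iconv unavailable or //TRANSLIT refused, so drop non-ASCII bytes.
    if ($ascii === false) {
        $ascii = preg_replace('/[^\x00-\x7F]/', '', $s);
    }
    return $ascii;
}

echo to_ascii('á|â ð é ø æ ç ß abc ABC 123'), "\n";
```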
Bur*_*Leo 39
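I just used Lizard's answer, which was very helpful, especially when you're doing some sorting. Isn't it amazing how many characters we have to declare "roughly the same" ;)
In case anyone else is looking for a comprehensive solution (in the sense of the comments above), here it is for copy and paste: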
/**
* Replace language-specific characters by ASCII-equivalents.
* @param string $s
* @return string
*/
public static function normalizeChars($s) {
$replace = array(
'?'=>'-', '?'=>'-', '?'=>'-', '?'=>'-',
'?'=>'A', '?'=>'A', 'À'=>'A', 'Ã'=>'A', 'Á'=>'A', 'Æ'=>'A', 'Â'=>'A', 'Å'=>'A', 'Ä'=>'Ae',
'Þ'=>'B',
'?'=>'C', '?'=>'C', 'Ç'=>'C',
'È'=>'E', '?'=>'E', 'É'=>'E', 'Ë'=>'E', 'Ê'=>'E',
'?'=>'G',
'?'=>'I', 'Ï'=>'I', 'Î'=>'I', 'Í'=>'I', 'Ì'=>'I',
'?'=>'L',
'Ñ'=>'N', '?'=>'N',
'Ø'=>'O', 'Ó'=>'O', 'Ò'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'Oe',
'?'=>'S', '?'=>'S', '?'=>'S', 'Š'=>'S',
'?'=>'T',
'Ù'=>'U', 'Û'=>'U', 'Ú'=>'U', 'Ü'=>'Ue',
'Ý'=>'Y',
'?'=>'Z', 'Ž'=>'Z', '?'=>'Z',
'â'=>'a', '?'=>'a', '?'=>'a', 'á'=>'a', '?'=>'a', 'ã'=>'a', '?'=>'a', '?'=>'a', '?'=>'a', 'å'=>'a', 'à'=>'a', '?'=>'a', '?'=>'a', '?'=>'a', '?'=>'a', '?'=>'a', 'ä'=>'ae', 'æ'=>'ae', '?'=>'ae', '?'=>'ae',
'?'=>'b', '?'=>'b', '?'=>'b', 'þ'=>'b',
'?'=>'c', '?'=>'c', '?'=>'c', '?'=>'c', 'ç'=>'c', '?'=>'c', '?'=>'c', '?'=>'c', '?'=>'c', '?'=>'c', '?'=>'c', '?'=>'ch', '?'=>'ch',
'?'=>'d', '?'=>'d', '?'=>'d', '?'=>'d', '?'=>'d', '?'=>'d', '?'=>'D', 'ð'=>'d',
'?'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', 'ê'=>'e', '?'=>'e', 'è'=>'e', 'ë'=>'e', 'é'=>'e',
'?'=>'f', 'ƒ'=>'f', '?'=>'f',
'?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g',
'?'=>'h', '?'=>'h', '?'=>'h', '?'=>'h', '?'=>'h', '?'=>'h', '?'=>'h', '?'=>'h',
'î'=>'i', 'ï'=>'i', 'í'=>'i', 'ì'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'ij', '?'=>'ij',
'?'=>'j', '?'=>'j', '?'=>'j', '?'=>'j', '?'=>'ja', '?'=>'ja', '?'=>'je', '?'=>'je', '?'=>'jo', '?'=>'jo', '?'=>'ju', '?'=>'ju',
'?'=>'k', '?'=>'k', '?'=>'k', '?'=>'k', '?'=>'k', '?'=>'k', '?'=>'k',
'?'=>'l', '?'=>'l', '?'=>'l', '?'=>'l', '?'=>'l', '?'=>'l', '?'=>'l', '?'=>'l', '?'=>'l', '?'=>'l', '?'=>'l', '?'=>'l',
'?'=>'m', '?'=>'m', '?'=>'m', '?'=>'m',
'ñ'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', '?'=>'n',
'?'=>'o', '?'=>'o', '?'=>'o', 'õ'=>'o', 'ô'=>'o', '?'=>'o', '?'=>'o', '?'=>'o', '?'=>'o', '?'=>'o', 'ø'=>'o', '?'=>'o', '?'=>'o', 'ò'=>'o', '?'=>'o', '?'=>'o', '?'=>'o', 'ó'=>'o', '?'=>'o', 'œ'=>'oe', 'Œ'=>'oe', 'ö'=>'oe',
'?'=>'p', '?'=>'p', '?'=>'p', '?'=>'p',
'?'=>'q',
'?'=>'r', '?'=>'r', '?'=>'r', '?'=>'r', '?'=>'r', '?'=>'r', '?'=>'r', '?'=>'r', '?'=>'r',
'?'=>'s', '?'=>'s', '?'=>'s', 'š'=>'s', '?'=>'s', '?'=>'s', '?'=>'s', '?'=>'s', '?'=>'s', '?'=>'sch', '?'=>'sch', '?'=>'sh', '?'=>'sh', 'ß'=>'ss',
'?'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', '™'=>'tm',
'?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', 'ü'=>'ue',
'?'=>'v', '?'=>'v', '?'=>'v',
'?'=>'w', '?'=>'w', '?'=>'w',
'?'=>'y', '?'=>'y', 'ý'=>'y', 'ÿ'=>'y', 'Ÿ'=>'y', '?'=>'y',
'?'=>'y', 'ž'=>'z', '?'=>'z', '?'=>'z', '?'=>'z', '?'=>'z', '?'=>'z', '?'=>'z', '?'=>'zh', '?'=>'zh'
);
return strtr($s, $replace);
}
Note a few subtle changes for German umlauts (ä => ae).
Edit: included more characters based on those posted by user3682119 (except the copyright sign) and on the comment from daker.
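To keep a generic map and switch to the German convention (ä => ae) only when needed, one option is to layer an override map with array_merge(), whose later string keys replace earlier ones. GENERIC_MAP, GERMAN_MAP and normalize_for() are hypothetical names for illustration, not part of the answer above:

```php
<?php
// Hypothetical generic map: umlauts simply lose their dots.
const GENERIC_MAP = [
    'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U', 'é' => 'e',
];
// Hypothetical German overrides, following the ä => ae convention.
const GERMAN_MAP = [
    'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue', 'ß' => 'ss',
];

function normalize_for(string $s, bool $german): string
{
    // With string keys, array_merge() lets GERMAN_MAP override GENERIC_MAP.
    $map = $german ? array_merge(GENERIC_MAP, GERMAN_MAP) : GENERIC_MAP;
    return strtr($s, $map);
}

echo normalize_for('Müller café', false), "\n"; // Muller cafe
echo normalize_for('Müller café', true), "\n";  // Mueller cafe
```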
Ita*_*Ale 26
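As of PHP 5.4, the intl extension provides a new class called Transliterator.
I think it's the best way to remove diacritics, for two reasons:
Transliterator is based on ICU, so you're using the ICU library's tables. ICU is a great project, developed over many years, that provides comprehensive tables and functionality. Whatever table you try to write yourself, it will never be as complete as ICU's.
In UTF-8, a character can be represented in different ways. For example, the character ñ can be stored as a single (multibyte) character, or as the combination of the characters ˜ (multibyte) and n. Besides that, some Unicode characters are homographs: they look the same but have different code points. That's why it's also important to normalize the string.
Here is some sample code, taken from an old answer of mine: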
<?php
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach($test as $e) {
$normalized = $transliterator->transliterate($e);
echo $e. ' --> '.$normalized."\n";
}
?>
Result:
abcd --> abcd
èe --> ee
€ --> €
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto
The first argument of Transliterator::createFromRules() both removes the diacritics and normalizes the string.
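The same three steps (decompose, drop nonspacing marks, recompose) can also be expressed with intl's Normalizer class plus a PCRE \p{Mn} class. The sketch below assumes Normalizer may be unavailable; in that case it only handles text that is already decomposed (base letter plus combining mark), which is exactly the case a strtr() table misses. strip_marks() is a hypothetical name:

```php
<?php
function strip_marks(string $s): string
{
    if (class_exists('Normalizer')) {
        $s = Normalizer::normalize($s, Normalizer::FORM_D);  // ñ -> n + U+0303
    }
    $s = preg_replace('/\p{Mn}+/u', '', $s);                 // drop nonspacing marks
    if (class_exists('Normalizer')) {
        $s = Normalizer::normalize($s, Normalizer::FORM_C);  // recompose what is left
    }
    return $s;
}

$precomposed = "\u{00F1}";   // ñ as one code point
$decomposed  = "n\u{0303}";  // n followed by COMBINING TILDE
var_dump($precomposed === $decomposed);  // bool(false): same glyph, different bytes
echo strip_marks($decomposed), "\n";     // n
```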
Anonymous 11
So I found this on the php.net page for the preg_replace function:
// replace accented chars
$string = "Zacarías Ferreíra"; // my definition for string variable
$accents = '/&([A-Za-z]{1,2})(grave|acute|circ|cedil|uml|lig);/';
$string_encoded = htmlentities($string,ENT_NOQUOTES,'UTF-8');
$string = preg_replace($accents,'$1',$string_encoded);
If you have encoding problems, you may get something like "ZacarÃÂasFerreÃÂra"; in that case, just decode the string and then use the code above:
$string = utf8_decode("ZacarÃÂas FerreÃÂra");
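A sketch of the same entity trick as a reusable function (strip_accents_entities() is a hypothetical name). It adds tilde, ring and slash to the suffix list so that ñ, å and ø are covered too, and decodes the result so that other characters such as & or € come back unchanged. Because of the {1,2} group, ß (&szlig;) becomes sz rather than ss. Separately, note that utf8_decode() used above is deprecated as of PHP 8.2.

```php
<?php
function strip_accents_entities(string $s): string
{
    // é -> &eacute;, ñ -> &ntilde;, Œ -> &OElig;
    $encoded = htmlentities($s, ENT_NOQUOTES, 'UTF-8');
    // Keep only the base letter(s): &eacute; -> e, &OElig; -> OE
    $stripped = preg_replace(
        '/&([A-Za-z]{1,2})(grave|acute|circ|tilde|uml|ring|cedil|slash|lig);/',
        '$1',
        $encoded
    );
    // Turn any remaining entities (&amp;, &euro;, ...) back into characters.
    return html_entity_decode($stripped, ENT_NOQUOTES, 'UTF-8');
}

echo strip_accents_entities('Zacarías Ferreíra'), "\n"; // Zacarias Ferreira
```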
Anonymous 9
This worked for me:
<?php
setlocale(LC_ALL, "en_US.utf8");
$val = iconv('UTF-8','ASCII//TRANSLIT',$val);
?>
I found this approach works well without having to worry much about character sets, arrays or iconv:
function replace_accents($str) {
$str = htmlentities($str, ENT_COMPAT, "UTF-8");
$str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde|ring);/','$1',$str);
return html_entity_decode($str);
}
Anonymous 8
protected $_convertTable = array(
'&' => 'and', '@' => 'at', '©' => 'c', '®' => 'r', 'À' => 'a',
'Á' => 'a', 'Â' => 'a', 'Ä' => 'a', 'Å' => 'a', 'Æ' => 'ae','Ç' => 'c',
'È' => 'e', 'É' => 'e', 'Ë' => 'e', 'Ì' => 'i', 'Í' => 'i', 'Î' => 'i',
'Ï' => 'i', 'Ò' => 'o', 'Ó' => 'o', 'Ô' => 'o', 'Õ' => 'o', 'Ö' => 'o',
'Ø' => 'o', 'Ù' => 'u', 'Ú' => 'u', 'Û' => 'u', 'Ü' => 'u', 'Ý' => 'y',
'ß' => 'ss','à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a', 'å' => 'a',
'æ' => 'ae','ç' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ò' => 'o', 'ó' => 'o',
'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o', 'ù' => 'u', 'ú' => 'u',
'û' => 'u', 'ü' => 'u', 'ý' => 'y', 'þ' => 'p', 'ÿ' => 'y', '?' => 'a',
'?' => 'a', '?' => 'a', '?' => 'a', '?' => 'a', '?' => 'a', '?' => 'c',
'?' => 'c', '?' => 'c', '?' => 'c', '?' => 'c', '?' => 'c', '?' => 'c',
'?' => 'c', '?' => 'd', '?' => 'd', '?' => 'd', '?' => 'd', '?' => 'e',
'?' => 'e', '?' => 'e', '?' => 'e', '?' => 'e', '?' => 'e', '?' => 'e',
'?' => 'e', '?' => 'e', '?' => 'e', '?' => 'g', '?' => 'g', '?' => 'g',
'?' => 'g', '?' => 'g', '?' => 'g', '?' => 'g', '?' => 'g', '?' => 'h',
'?' => 'h', '?' => 'h', '?' => 'h', '?' => 'i', '?' => 'i', '?' => 'i',
'?' => 'i', '?' => 'i', '?' => 'i', '?' => 'i', '?' => 'i', '?' => 'i',
'?' => 'i', '?' => 'ij','?' => 'ij','?' => 'j', '?' => 'j', '?' => 'k',
'?' => 'k', '?' => 'k', '?' => 'l', '?' => 'l', '?' => 'l', '?' => 'l',
'?' => 'l', '?' => 'l', '?' => 'l', '?' => 'l', '?' => 'l', '?' => 'l',
'?' => 'n', '?' => 'n', '?' => 'n', '?' => 'n', '?' => 'n', '?' => 'n',
'?' => 'n', '?' => 'n', '?' => 'n', '?' => 'o', '?' => 'o', '?' => 'o',
'?' => 'o', '?' => 'o', '?' => 'o', 'Œ' => 'oe','œ' => 'oe','?' => 'r',
'?' => 'r', '?' => 'r', '?' => 'r', '?' => 'r', '?' => 'r', '?' => 's',
'?' => 's', '?' => 's', '?' => 's', '?' => 's', '?' => 's', 'Š' => 's',
'š' => 's', '?' => 't', '?' => 't', '?' => 't', '?' => 't', '?' => 't',
'?' => 't', '?' => 'u', '?' => 'u', '?' => 'u', '?' => 'u', '?' => 'u',
'?' => 'u', '?' => 'u', '?' => 'u', '?' => 'u', '?' => 'u', '?' => 'u',
'?' => 'u', '?' => 'w', '?' => 'w', '?' => 'y', '?' => 'y', 'Ÿ' => 'y',
'?' => 'z', '?' => 'z', '?' => 'z', '?' => 'z', 'Ž' => 'z', 'ž' => 'z',
'?' => 'z', '?' => 'e', 'ƒ' => 'f', '?' => 'o', '?' => 'o', '?' => 'u',
'?' => 'u', '?' => 'a', '?' => 'a', '?' => 'i', '?' => 'i', '?' => 'o',
'?' => 'o', '?' => 'u', '?' => 'u', '?' => 'u', '?' => 'u', '?' => 'u',
'?' => 'u', '?' => 'u', '?' => 'u', '?' => 'u', '?' => 'u', '?' => 'a',
'?' => 'a', '?' => 'ae','?' => 'ae','?' => 'o', '?' => 'o', '?' => 'e',
'?' => 'jo','?' => 'e', '?' => 'i', '?' => 'i', '?' => 'a', '?' => 'b',
'?' => 'v', '?' => 'g', '?' => 'd', '?' => 'e', '?' => 'zh','?' => 'z',
'?' => 'i', '?' => 'j', '?' => 'k', '?' => 'l', '?' => 'm', '?' => 'n',
'?' => 'o', '?' => 'p', '?' => 'r', '?' => 's', '?' => 't', '?' => 'u',
'?' => 'f', '?' => 'h', '?' => 'c', '?' => 'ch','?' => 'sh','?' => 'sch',
'?' => '-', '?' => 'y', '?' => '-', '?' => 'je','?' => 'ju','?' => 'ja',
'?' => 'a', '?' => 'b', '?' => 'v', '?' => 'g', '?' => 'd', '?' => 'e',
'?' => 'zh','?' => 'z', '?' => 'i', '?' => 'j', '?' => 'k', '?' => 'l',
'?' => 'm', '?' => 'n', '?' => 'o', '?' => 'p', '?' => 'r', '?' => 's',
'?' => 't', '?' => 'u', '?' => 'f', '?' => 'h', '?' => 'c', '?' => 'ch',
'?' => 'sh','?' => 'sch','?' => '-','?' => 'y', '?' => '-', '?' => 'je',
'?' => 'ju','?' => 'ja','?' => 'jo','?' => 'e', '?' => 'i', '?' => 'i',
'?' => 'g', '?' => 'g', '?' => 'a', '?' => 'b', '?' => 'g', '?' => 'd',
'?' => 'h', '?' => 'v', '?' => 'z', '?' => 'h', '?' => 't', '?' => 'i',
'?' => 'k', '?' => 'k', '?' => 'l', '?' => 'm', '?' => 'm', '?' => 'n',
'?' => 'n', '?' => 's', '?' => 'e', '?' => 'p', '?' => 'p', '?' => 'C',
'?' => 'c', '?' => 'q', '?' => 'r', '?' => 'w', '?' => 't', '™' => 'tm',
);
This is from Magento; I use it for basically everything.
An updated answer based on @BurninLeo's answer:
function replace_spec_char($subject) {
$char_map = array(
"?" => "-", "?" => "-", "?" => "-", "?" => "-",
"?" => "A", "?" => "A", "?" => "A", "?" => "A", "À" => "A", "Ã" => "A", "Á" => "A", "Æ" => "A", "Â" => "A", "Å" => "A", "?" => "A", "?" => "A", "?" => "A",
"?" => "B", "?" => "B", "Þ" => "B",
"?" => "C", "?" => "C", "Ç" => "C", "?" => "C", "?" => "C", "?" => "C", "?" => "C", "©" => "C", "?" => "C",
"?" => "D", "?" => "D", "?" => "D", "?" => "D", "Ð" => "D",
"È" => "E", "?" => "E", "É" => "E", "Ë" => "E", "Ê" => "E", "?" => "E", "?" => "E", "?" => "E", "?" => "E", "?" => "E", "?" => "E", "?" => "E", "?" => "E",
"?" => "F", "?" => "F",
"?" => "G", "?" => "G", "?" => "G", "?" => "G", "?" => "G", "?" => "G", "?" => "G",
"?" => "H", "?" => "H", "?" => "H", "?" => "H", "?" => "H",
"I" => "I", "Ï" => "I", "Î" => "I", "Í" => "I", "Ì" => "I", "?" => "I", "?" => "I", "I" => "I", "?" => "I", "?" => "I", "?" => "I", "?" => "I", "?" => "I", "?" => "I", "?" => "I",
"?" => "J", "?" => "J",
"?" => "K", "?" => "K", "?" => "K", "?" => "K", "?" => "K",
"?" => "L", "?" => "L", "?" => "L", "?" => "L", "?" => "L", "?" => "L", "?" => "L",
"?" => "M", "?" => "M", "?" => "M",
"Ñ" => "N", "?" => "N", "?" => "N", "?" => "N", "?" => "N", "?" => "N", "?" => "N", "?" => "N", "?" => "N",
"Ø" => "O", "Ó" => "O", "Ò" => "O", "Ô" => "O", "Õ" => "O", "?" => "O", "?" => "O", "?" => "O", "?" => "O", "?" => "O", "?" => "O", "?" => "O",
"?" => "P", "?" => "P", "?" => "P",
"?" => "Q",
"?" => "R", "?" => "R", "?" => "R", "?" => "R", "?" => "R", "®" => "R",
"?" => "S", "?" => "S", "?" => "S", "Š" => "S", "?" => "S", "?" => "S", "?" => "S",
"?" => "T", "?" => "T", "?" => "T", "?" => "T", "?" => "T", "?" => "T", "?" => "T",
"Ù" => "U", "Û" => "U", "Ú" => "U", "?" => "U", "?" => "U", "?" => "U", "?" => "U", "?" => "U", "?" => "U", "?" => "U", "?" => "U", "?" => "U", "?" => "U", "?" => "U", "?" => "U", "?" => "U",
"?" => "V", "?" => "V",
"Ý" => "Y", "?" => "Y", "?" => "Y", "Ÿ" => "Y",
"?" => "Z", "Ž" => "Z", "?" => "Z", "?" => "Z", "?" => "Z",
"?" => "a", "?" => "a", "?" => "a", "?" => "a", "à" => "a", "ã" => "a", "á" => "a", "æ" => "a", "â" => "a", "å" => "a", "?" => "a", "?" => "a", "?" => "a",
"?" => "b", "?" => "b", "þ" => "b",
"?" => "c", "?" => "c", "ç" => "c", "?" => "c", "?" => "c", "?" => "c", "?" => "c", "©" => "c", "?" => "c",
"?" => "ch", "?" => "ch",
"?" => "d", "?" => "d", "?" => "d", "?" => "d", "ð" => "d",
"è" => "e", "?" => "e", "é" => "e", "ë" => "e", "ê" => "e", "?" => "e", "?" => "e", "?" => "e", "?" => "e", "?" => "e", "?" => "e", "?" => "e", "?" => "e",
"?" => "f", "ƒ" => "f",
"?" => "g", "?" => "g", "?" => "g", "?" => "g", "?" => "g", "?" => "g", "?" => "g",
"?" => "h", "?" => "h", "?" => "h", "?" => "h", "?" => "h",
"i" => "i", "ï" => "i", "î" => "i", "í" => "i", "ì" => "i", "?" => "i", "?" => "i", "?" => "i", "?" => "i", "?" => "i", "?" => "i", "?" => "i", "?" => "i", "?" => "i", "?" => "i",
"?" => "j", "?" => "j", "?" => "j", "?" => "j",
"?" => "k", "?" => "k", "?" => "k", "?" => "k", "?" => "k",
"?" => "l", "?" => "l", "?" => "l", "?" => "l", "?" => "l", "?" => "l", "?" => "l",
"?" => "m", "?" => "m", "?" => "m",
"ñ" => "n", "?" => "n", "?" => "n", "?" => "n", "?" => "n", "?" => "n", "?" => "n", "?" => "n", "?" => "n",
"ø" => "o", "ó" => "o", "ò" => "o", "ô" => "o", "õ" => "o", "?" => "o", "?" => "o", "?" => "o", "?" => "o", "?" => "o", "?" => "o", "?" => "o",
"?" => "p", "?" => "p", "?" => "p",
"?" => "q",
"?" => "r", "?" => "r", "?" => "r", "?" => "r", "?" => "r", "®" => "r",
"?" => "s", "?" => "s", "?" => "s", "š" => "s", "?" => "s", "?" => "s", "?" => "s",
"?" => "t", "?" => "t", "?" => "t", "?" => "t", "?" => "t", "?" => "t", "?" => "t",
"ù" => "u", "û" => "u", "ú" => "u", "?" => "u", "?" => "u", "?" => "u", "?" => "u", "?" => "u", "?" => "u", "?" => "u", "?" => "u", "?" => "u", "?" => "u", "?" => "u", "?" => "u", "?" => "u",
"?" => "v", "?" => "v",
"ý" => "y", "?" => "y", "?" => "y", "ÿ" => "y",
"?" => "z", "ž" => "z", "?" => "z", "?" => "z", "?" => "z", "?" => "z",
"™" => "tm",
"@" => "at",
"Ä" => "ae", "?" => "ae", "ä" => "ae", "æ" => "ae", "?" => "ae",
"?" => "ij", "?" => "ij",
"?" => "ja", "?" => "ja",
"?" => "je", "?" => "je",
"?" => "jo", "?" => "jo",
"?" => "ju", "?" => "ju",
"œ" => "oe", "Œ" => "oe", "ö" => "oe", "Ö" => "oe",
"?" => "sch", "?" => "sch",
"?" => "sh", "?" => "sh",
"ß" => "ss",
"Ü" => "ue",
"?" => "zh", "?" => "zh",
);
return strtr($subject, $char_map);
}
$string = "?í ???®ë, ?ß? å test!";
echo replace_spec_char($string);
?í ???®ë, ?ß? å test! =>
Hi there, jusst a test!
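Apart from the multi-letter replacements (e.g. ss, ch, sch), this doesn't mix up uppercase and lowercase characters, and it adds @, ® and ©.
Also, if you want to build a regex that matches regardless of special characters: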
rss => '[r?????????](?:[s???š?????][s???š?????]|[ß])'
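A Vala implementation of this: https://code.launchpad.net/~jeremy-munsch/synapse-project/ascii-smart/+merge/277477
Here is a basic list you can use; with a regex replace (in Sublime Text) or a small script, you can build whatever you need from this array: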
"-" => "????",
"A" => "????ÀÃÁÆÂÅ???",
"B" => "??Þ",
"C" => "??Ç????©?",
"D" => "????Ð",
"E" => "È?ÉËÊ????????",
"F" => "??",
"G" => "???????",
"H" => "?????",
"I" => "IÏÎÍÌ??I???????",
"J" => "??",
"K" => "?????",
"L" => "???????",
"M" => "???",
"N" => "Ñ????????",
"O" => "ØÓÒÔÕ???????",
"P" => "???",
"Q" => "?",
"R" => "?????®",
"S" => "???Š???",
"T" => "???????",
"U" => "ÙÛÚ?????????????",
"V" => "??",
"Y" => "Ý??Ÿ",
"Z" => "?Ž???",
"a" => "????àãáæâå???",
"b" => "??þ",
"c" => "??ç????©?",
"ch" => "?",
"d" => "????ð",
"e" => "è?éëê????????",
"f" => "?ƒ",
"g" => "???????",
"h" => "?????",
"i" => "iïîíì??????????",
"j" => "??",
"k" => "?????",
"l" => "???????",
"m" => "???",
"n" => "ñ????????",
"o" => "øóòôõ???????",
"p" => "???",
"q" => "?",
"r" => "?????®",
"s" => "???š???",
"t" => "???????",
"u" => "ùûú?????????????",
"v" => "??",
"y" => "ý??ÿ",
"z" => "?ž????",
"tm" => "™",
"at" => "@",
"ae" => "Ä?äæ?",
"ch" => "??",
"ij" => "??",
"j" => "????",
"ja" => "??",
"je" => "??",
"jo" => "??",
"ju" => "??",
"oe" => "œŒöÖ",
"sch" => "??",
"sh" => "??",
"ss" => "ß",
"tm" => "™",
"ue" => "Ü",
"zh" => "??"
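As one example of building something from that list, here is a sketch of the accent-insensitive regex idea: each ASCII letter of a search word becomes a character class of its accented variants. The variants array is a small hypothetical subset of the list, multi-letter entries (ss, ch, sch, ...) are left out, the word is assumed to be plain ASCII, and accent_insensitive_pattern() is a hypothetical name.

```php
<?php
function accent_insensitive_pattern(string $word): string
{
    // Hypothetical subset of the "base => variants" list above.
    $variants = [
        'a' => 'àáâãäåæ', 'c' => 'ç', 'e' => 'èéêë', 'i' => 'ìíîï',
        'n' => 'ñ', 'o' => 'òóôõöø', 'u' => 'ùúûü', 'y' => 'ýÿ',
    ];
    $pattern = '';
    foreach (str_split(strtolower($word)) as $ch) {  // assumes $word is plain ASCII
        $pattern .= isset($variants[$ch])
            ? '[' . preg_quote($ch . $variants[$ch], '/') . ']'
            : preg_quote($ch, '/');
    }
    // u: each accented letter is one character; i: also match É, Ç, ...
    return '/' . $pattern . '/iu';
}

echo accent_insensitive_pattern('cafe'), "\n"; // /[cç][aàáâãäåæ]f[eèéêë]/iu
```

With that pattern, preg_match() finds "café", "CAFÉ" and "cafe" alike.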
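Anonymous 6
I searched around: your idea of stripping accents is great and cost-effective, but your regex is wrong and is missing two modifiers. Long story short, the patterns should be: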
$patterns[0] = '/[áâàåä]/ui';
$patterns[1] = '/[ðéêèë]/ui';
$patterns[2] = '/[íîìï]/ui';
$patterns[3] = '/[óôòøõö]/ui';
$patterns[4] = '/[úûùü]/ui';
$patterns[5] = '/æ/ui';
$patterns[6] = '/ç/ui';
$patterns[7] = '/ß/ui';
$replacements[0] = 'a';
$replacements[1] = 'e';
$replacements[2] = 'i';
$replacements[3] = 'o';
$replacements[4] = 'u';
$replacements[5] = 'ae';
$replacements[6] = 'c';
$replacements[7] = 'ss';
As you can see, it's very similar; the most important part is the modifiers after the closing slash. In a regex like /[someCoolRegex]/ui, u tells PCRE to treat the pattern and subject as Unicode (UTF-8), and i makes it case-insensitive. I tested this myself against the other answers in this thread, and I must say it is more cost-effective than using strtr.
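I hope someone reads this answer.
Disclaimer: I no longer stand behind this answer (I was blind back then). But thanks for the upvotes =P.
You can use this as a starting point. It's from WordPress, where it is used to generate pretty URLs (the entry point is the slugify() function):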
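Putting those patterns into a runnable function shows the fix on the question's input (strip_accents_regex() is a hypothetical name). Lowercasing is moved after the replacement: strtolower() is byte-based (ASCII-only since PHP 8.2), while /i already lets the lowercase classes match É, Æ, Ç and so on.

```php
<?php
function strip_accents_regex(string $s): string
{
    // The question's classes without the '|' characters, plus the u and i modifiers.
    $patterns = [
        '/[áâàåä]/ui', '/[ðéêèë]/ui', '/[íîìï]/ui', '/[óôòøõö]/ui',
        '/[úûùü]/ui', '/æ/ui', '/ç/ui', '/ß/ui',
    ];
    $replacements = ['a', 'e', 'i', 'o', 'u', 'ae', 'c', 'ss'];
    // Replace first, then lowercase the remaining ASCII text.
    return strtolower(preg_replace($patterns, $replacements, $s));
}

echo strip_accents_regex('Éric Cantona'), "\n"; // eric cantona
```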
/**
* Converts all accent characters to ASCII characters.
*
* If there are no accent characters, then the string given is just returned.
*
* @param string $string Text that might have accent characters
* @return string Filtered string with replaced "nice" characters.
*/
function remove_accents($string) {
if (!preg_match('/[\x80-\xff]/', $string))
return $string;
if (seems_utf8($string)) {
$chars = array(
// Decompositions for Latin-1 Supplement
chr(195).chr(128) => 'A', chr(195).chr(129) => 'A',
chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
chr(195).chr(132) => 'A', chr(195).chr(133) => 'A',
chr(195).chr(135) => 'C', chr(195).chr(136) => 'E',
chr(195).chr(137) => 'E', chr(195).chr(138) => 'E',
chr(195).chr(139) => 'E', chr(195).chr(140) => 'I',
chr(195).chr(141) => 'I', chr(195).chr(142) => 'I',
chr(195).chr(143) => 'I', chr(195).chr(145) => 'N',
chr(195).chr(146) => 'O', chr(195).chr(147) => 'O',
chr(195).chr(148) => 'O', chr(195).chr(149) => 'O',
chr(195).chr(150) => 'O', chr(195).chr(153) => 'U',
chr(195).chr(154) => 'U', chr(195).chr(155) => 'U',
chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y',
chr(195).chr(159) => 's', chr(195).chr(160) => 'a',
chr(195).chr(161) => 'a', chr(195).chr(162) => 'a',
chr(195).chr(163) => 'a', chr(195).chr(164) => 'a',
chr(195).chr(165) => 'a', chr(195).chr(167) => 'c',
chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
chr(195).chr(170) => 'e', chr(195).chr(171) => 'e',
chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
chr(195).chr(174) => 'i', chr(195).chr(175) => 'i',
chr(195).chr(177) => 'n', chr(195).chr(178) => 'o',
chr(195).chr(179) => 'o', chr(195).chr(180) => 'o',
chr(195).chr(181) => 'o', chr(195).chr(182) => 'o',
chr(195).chr(184) => 'o', chr(195).chr(185) => 'u',
chr(195).chr(186) => 'u', chr(195).chr(187) => 'u',
chr(195).chr(188) => 'u', chr(195).chr(189) => 'y',
chr(195).chr(191) => 'y',
// Decompositions for Latin Extended-A
chr(196).chr(128) => 'A', chr(196).chr(129) => 'a',
chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
chr(196).chr(132) => 'A', chr(196).chr(133) => 'a',
chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
chr(196).chr(136) => 'C', chr(196).chr(137) => 'c',
chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
chr(196).chr(140) => 'C', chr(196).chr(141) => 'c',
chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
chr(196).chr(144) => 'D', chr(196).chr(145) => 'd',
chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
chr(196).chr(148) => 'E', chr(196).chr(149) => 'e',
chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
chr(196).chr(152) => 'E', chr(196).chr(153) => 'e',
chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
chr(196).chr(156) => 'G', chr(196).chr(157) => 'g',
chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
chr(196).chr(160) => 'G', chr(196).chr(161) => 'g',
chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
chr(196).chr(164) => 'H', chr(196).chr(165) => 'h',
chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
chr(196).chr(168) => 'I', chr(196).chr(169) => 'i',
chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
chr(196).chr(172) => 'I', chr(196).chr(173) => 'i',
chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
chr(196).chr(176) => 'I', chr(196).chr(177) => 'i',
chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij',
chr(196).chr(180) => 'J', chr(196).chr(181) => 'j',
chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
chr(196).chr(184) => 'k', chr(196).chr(185) => 'L',
chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
chr(196).chr(188) => 'l', chr(196).chr(189) => 'L',
chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
chr(197).chr(128) => 'l', chr(197).chr(129) => 'L',
chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
chr(197).chr(132) => 'n', chr(197).chr(133) => 'N',
chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
chr(197).chr(136) => 'n', chr(197).chr(137) => 'N',
chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
chr(197).chr(140) => 'O', chr(197).chr(141) => 'o',
chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
chr(197).chr(144) => 'O', chr(197).chr(145) => 'o',
chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe',
chr(197).chr(148) => 'R',chr(197).chr(149) => 'r',
chr(197).chr(150) => 'R',chr(197).chr(151) => 'r',
chr(197).chr(152) => 'R',chr(197).chr(153) => 'r',
chr(197).chr(154) => 'S',chr(197).chr(155) => 's',
chr(197).chr(156) => 'S',chr(197).chr(157) => 's',
chr(197).chr(158) => 'S',chr(197).chr(159) => 's',
chr(197).chr(160) => 'S', chr(197).chr(161) => 's',
chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
chr(197).chr(164) => 'T', chr(197).chr(165) => 't',
chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
chr(197).chr(168) => 'U', chr(197).chr(169) => 'u',
chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
chr(197).chr(172) => 'U', chr(197).chr(173) => 'u',
chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
chr(197).chr(176) => 'U', chr(197).chr(177) => 'u',
chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
chr(197).chr(180) => 'W', chr(197).chr(181) => 'w',
chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z',
chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z',
chr(197).chr(190) => 'z', chr(197).chr(191) => 's',
// Euro Sign
chr(226).chr(130).chr(172) => 'E',
// GBP (Pound) Sign
chr(194).chr(163) => '');
$string = strtr($string, $chars);
} else {
// Assume ISO-8859-1 if not UTF-8
$chars['in'] = chr(128).chr(131).chr(138).chr(142).chr(154).chr(158)
.chr(159).chr(162).chr(165).chr(181).chr(192).chr(193).chr(194)
.chr(195).chr(196).chr(197).chr(199).chr(200).chr(201).chr(202)
.chr(203).chr(204).chr(205).chr(206).chr(207).chr(209).chr(210)
.chr(211).chr(212).chr(213).chr(214).chr(216).chr(217).chr(218)
.chr(219).chr(220).chr(221).chr(224).chr(225).chr(226).chr(227)
.chr(228).chr(229).chr(231).chr(232).chr(233).chr(234).chr(235)
.chr(236).chr(237).chr(238).chr(239).chr(241).chr(242).chr(243)
.chr(244).chr(245).chr(246).chr(248).chr(249).chr(250).chr(251)
.chr(252).chr(253).chr(255);
$chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy";
$string = strtr($string, $chars['in'], $chars['out']);
$double_chars['in'] = array(chr(140), chr(156), chr(198), chr(208), chr(222), chr(223), chr(230), chr(240), chr(254));
$double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th');
$string = str_replace($double_chars['in'], $double_chars['out'], $string);
}
return $string;
}
/**
* Checks to see if a string is utf8 encoded.
*
* @author bmorel at ssi dot fr
*
* @param string $Str The string to be checked
* @return bool True if $Str fits a UTF-8 model, false otherwise.
*/
function seems_utf8($Str) { # by bmorel at ssi dot fr
$length = strlen($Str);
for ($i = 0; $i < $length; $i++) {
if (ord($Str[$i]) < 0x80) continue; # 0bbbbbbb
elseif ((ord($Str[$i]) & 0xE0) == 0xC0) $n = 1; # 110bbbbb
elseif ((ord($Str[$i]) & 0xF0) == 0xE0) $n = 2; # 1110bbbb
elseif ((ord($Str[$i]) & 0xF8) == 0xF0) $n = 3; # 11110bbb
elseif ((ord($Str[$i]) & 0xFC) == 0xF8) $n = 4; # 111110bb
elseif ((ord($Str[$i]) & 0xFE) == 0xFC) $n = 5; # 1111110b
else return false; # Does not match any model
for ($j = 0; $j < $n; $j++) { # n bytes matching 10bbbbbb follow ?
if ((++$i == $length) || ((ord($Str[$i]) & 0xC0) != 0x80))
return false;
}
}
return true;
}
function utf8_uri_encode($utf8_string, $length = 0) {
$unicode = '';
$values = array();
$num_octets = 1;
$unicode_length = 0;
$string_length = strlen($utf8_string);
for ($i = 0; $i < $string_length; $i++) {
$value = ord($utf8_string[$i]);
if ($value < 128) {
if ($length && ($unicode_length >= $length))
break;
$unicode .= chr($value);
$unicode_length++;
} else {
if (count($values) == 0) $num_octets = ($value < 224) ? 2 : 3;
$values[] = $value;
if ($length && ($unicode_length + ($num_octets * 3)) > $length)
break;
if (count( $values ) == $num_octets) {
if ($num_octets == 3) {
$unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
$unicode_length += 9;
} else {
$unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
$unicode_length += 6;
}
$values = array();
$num_octets = 1;
}
}
}
return $unicode;
}
/**
* Sanitizes title, replacing whitespace with dashes.
*
* Limits the output to alphanumeric characters, underscore (_) and dash (-).
* Whitespace becomes a dash.
*
* @param string $title The title to be sanitized.
* @return string The sanitized title.
*/
function slugify($title) {
$title = strip_tags($title);
// Preserve escaped octets.
$title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
// Remove percent signs that are not part of an octet.
$title = str_replace('%', '', $title);
// Restore octets.
$title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
$title = remove_accents($title);
if (seems_utf8($title)) {
if (function_exists('mb_strtolower')) {
$title = mb_strtolower($title, 'UTF-8');
}
$title = utf8_uri_encode($title, 200);
}
$title = strtolower($title);
$title = preg_replace('/&.+?;/', '', $title); // kill entities
$title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
$title = preg_replace('/\s+/', '-', $title);
$title = preg_replace('|-+|', '-', $title);
$title = trim($title, '-');
return $title;
}
If you have the intl extension (http://php.net/manual/en/book.intl.php) available, this will solve your problem:
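WordPress's seems_utf8() walks the bytes by hand and still accepts the 5- and 6-byte forms that RFC 3629 removed from UTF-8. As a sketch of a shorter check: PCRE validates the subject when the u modifier is set, and preg_match() returns false on invalid UTF-8, so an empty pattern works as a validator (mb_check_encoding($s, 'UTF-8') is an alternative if mbstring is installed). is_valid_utf8() is a hypothetical name.

```php
<?php
function is_valid_utf8(string $s): bool
{
    // preg_match() returns 1 for valid UTF-8 and false when the subject is invalid.
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8('Éric Cantona')); // bool(true)
var_dump(is_valid_utf8("\xE9ric"));      // bool(false): ISO-8859-1 é
```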
$string = "Éric Cantona";
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
echo $normalized = $transliterator->transliterate($string);
To install the PHP extension on Ubuntu:
apt-get install php-intl
In Composer, don't forget to require the ext-intl extension so that the deployed system is guaranteed to have it.
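A runtime counterpart to that Composer requirement, as a sketch: fail with a clear message when intl is missing, and build the transliterator once instead of on every call. to_plain_lower() is a hypothetical name; the rule string is the one from the answer above.

```php
<?php
function to_plain_lower(string $s): string
{
    if (!extension_loaded('intl')) {
        throw new RuntimeException('The intl extension is required (e.g. apt-get install php-intl).');
    }
    // Build the transliterator once and reuse it on later calls.
    static $t = null;
    $t ??= Transliterator::createFromRules(
        ':: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;',
        Transliterator::FORWARD
    );
    $out = $t ? $t->transliterate($s) : false;
    if ($out === false) {
        throw new RuntimeException('Transliteration failed');
    }
    return $out;
}
```

With intl installed, to_plain_lower('Éric Cantona') should return eric cantona, as in the answer.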