Zor*_*kić 42 php decode utf-8 preg-replace
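I have a PHP file, signup.php, that saves content from a form (in form.php) to a MySQL database. The problem appears when I try to reformat the input. I want to convert accented UTF-8 characters to their plain equivalents, for example à -> a.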
$first_name=$_POST['first_name'];
$last_name=$_POST['last_name'];
$course=$_POST['course'];
$chain="prêt-à-porter";
$pattern = array("'é'", "'è'", "'ë'", "'ê'", "'É'", "'È'", "'Ë'", "'Ê'", "'á'", "'à'", "'ä'", "'â'", "'å'", "'Á'", "'À'", "'Ä'", "'Â'", "'Å'", "'ó'", "'ò'", "'ö'", "'ô'", "'Ó'", "'Ò'", "'Ö'", "'Ô'", "'í'", "'ì'", "'ï'", "'î'", "'Í'", "'Ì'", "'Ï'", "'Î'", "'ú'", "'ù'", "'ü'", "'û'", "'Ú'", "'Ù'", "'Ü'", "'Û'", "'ý'", "'ÿ'", "'Ý'", "'ø'", "'Ø'", "'œ'", "'Œ'", "'Æ'", "'ç'", "'Ç'");
$replace = array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E', 'a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A', 'A', 'o', 'o', 'o', 'o', 'O', 'O', 'O', 'O', 'i', 'i', 'i', 'i', 'I', 'I', 'I', 'I', 'u', 'u', 'u', 'u', 'U', 'U', 'U', 'U', 'y', 'y', 'Y', 'o', 'O', 'a', 'A', 'A', 'c', 'C');
$chain = preg_replace($pattern, $replace, $chain);
echo $chain; // print pret-a-porter
$first_name = preg_replace($pattern, $replace, $first_name);
echo $first_name; // does not change the input!?!
Why does this work for $chain but not for $first_name or $last_name?
I also tried:
echo $first_name; // print áááááábéééééébšššš
$trans = array("á" => "a", "é" => "e", "š" => "s");
echo strtr("áááááábéééééébšššš", $trans); // print aaaaaabeeeeeebssss
echo strtr($first_name,$trans); // print áááááábéééééébšššš
But as you can see, the problem is the same!
dmp*_*dmp 78
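There is a simpler way to do this: use iconv. The following comes from the user notes on PHP.net and seems to be what you want to do, which is character transliteration: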
// PHP.net User notes
<?php
$string = "ʿABBĀSĀBĀD";
echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $string);
// output: [nothing, and you get a notice]
echo iconv('UTF-8', 'ISO-8859-1//IGNORE', $string);
// output: ABBSBD
echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $string);
// output: ABBASABAD
// Yay! That's what I wanted!
?>
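Be very careful with your character encodings, so that you keep the same encoding at every stage of the process: front end, form submission, and the encoding of the source files. The default encoding in PHP and in forms was ISO-8859-1 before PHP 5.4, when it changed to UTF-8 (finally!).
You can use a few functions to get creative. The first comes from CakePHP's Inflector class and is called slug: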
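One way to keep a single encoding at the entry point is to validate each incoming field and convert it only when it is not valid UTF-8. This is a minimal sketch, not part of the original answer: it assumes the mbstring extension is loaded, the helper name `ensureUtf8` is made up, and treating non-UTF-8 input as ISO-8859-1 is an assumption you would adjust to your own forms.

```php
<?php
// Hypothetical helper: returns $value encoded as UTF-8.
// Assumption: anything that is not valid UTF-8 arrived as ISO-8859-1.
function ensureUtf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

$latin1 = "pr\xEAt";           // "prêt" in ISO-8859-1 (ê is the single byte 0xEA)
$utf8   = ensureUtf8($latin1); // ê becomes the two bytes 0xC3 0xAA
echo bin2hex($utf8), "\n";     // prints 7072c3aa74
```

Running every `$_POST` field through a helper like this before any replacement means the lookup tables only ever have to match one encoding.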
public static function slug($string, $replacement = '_') {
    $quotedReplacement = preg_quote($replacement, '/');
    $merge = array(
        '/[^\s\p{Ll}\p{Lm}\p{Lo}\p{Lt}\p{Lu}\p{Nd}]/mu' => ' ',
        '/\\s+/' => $replacement,
        sprintf('/^[%s]+|[%s]+$/', $quotedReplacement, $quotedReplacement) => '',
    );
    $map = self::$_transliteration + $merge;
    return preg_replace(array_keys($map), array_values($map), $string);
}
It depends on a self::$_transliteration array, which is similar to what you did in your question. You can see the Inflector source on GitHub.
Another is a function I personally use, which comes from here.
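To make the role of that array concrete, here is a standalone sketch of the same merge-then-replace idea. The `$transliteration` map below is a made-up, heavily shortened stand-in for CakePHP's real table, and `slugSketch` is a hypothetical name; only the three `$merge` patterns are taken from the method above.

```php
<?php
// Hypothetical, shortened stand-in for CakePHP's $_transliteration map:
// regex pattern => ASCII replacement.
$transliteration = [
    '/[àáâãå]/u' => 'a',
    '/[èéêë]/u'  => 'e',
    '/[ìíîï]/u'  => 'i',
    '/[òóôõ]/u'  => 'o',
    '/[ùúû]/u'   => 'u',
    '/ç/u'       => 'c',
];

function slugSketch(string $string, array $transliteration, string $replacement = '_'): string
{
    $quoted = preg_quote($replacement, '/');
    $merge = [
        // anything that is not a letter, digit or whitespace becomes a space
        '/[^\s\p{Ll}\p{Lm}\p{Lo}\p{Lt}\p{Lu}\p{Nd}]/mu' => ' ',
        // runs of whitespace collapse into the replacement character
        '/\s+/' => $replacement,
        // strip the replacement character from both ends
        sprintf('/^[%s]+|[%s]+$/', $quoted, $quoted) => '',
    ];
    // Transliteration rules run first, then the clean-up rules.
    $map = $transliteration + $merge;
    return preg_replace(array_keys($map), array_values($map), $string);
}

echo slugSketch('Prêt-à-porter, été!', $transliteration, '-'), "\n"; // prints Pret-a-porter-ete
```

Because preg_replace applies array patterns in order, the accented letters are already ASCII by the time the clean-up patterns run.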
function slugify($text, $strict = false) {
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // replace non letter or digits by -
    $text = preg_replace('~[^\\pL\d.]+~u', '-', $text);
    // trim
    $text = trim($text, '-');
    setlocale(LC_CTYPE, 'en_GB.utf8');
    // transliterate
    if (function_exists('iconv')) {
        $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    }
    // lowercase
    $text = strtolower($text);
    // remove unwanted characters
    $text = preg_replace('~[^-\w.]+~', '', $text);
    if (empty($text)) {
        return 'empty_$';
    }
    if ($strict) {
        $text = str_replace(".", "_", $text);
    }
    return $text;
}
What these functions do is transliterate and create "slugs" from arbitrary text input, which is a very, very useful thing to have in your toolbox when building web apps. Hope this helps!
Die*_*itz 21
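Here is an approach that gives you some flexibility over what should be discarded and what should be replaced. This is how I currently do it.
$string = 'À string with garbage ĨÄ';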
$replace = [
    '<' => '', '>' => '', '\'' => '', '&' => '', '"' => '',
    'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'Ae',
    'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae',
    'Ç' => 'C', 'Ć' => 'C', 'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D',
    'Ð' => 'D', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E',
    'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G',
    'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H', 'Ì' => 'I', 'Í' => 'I',
    'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I',
    'İ' => 'I', 'Ĳ' => 'IJ', 'Ĵ' => 'J', 'Ķ' => 'K', 'Ľ' => 'L', 'Ĺ' => 'L',
    'Ļ' => 'L', 'Ŀ' => 'L', 'Ł' => 'L', 'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N',
    'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O',
    'Ö' => 'Oe', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O', 'Ŏ' => 'O',
    'Œ' => 'OE', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S',
    'Ş' => 'S', 'Ŝ' => 'S', 'Ș' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T',
    'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U',
    'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U',
    'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'Ž' => 'Z',
    'Ż' => 'Z', 'Þ' => 'T', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a',
    'ä' => 'ae', 'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a',
    'æ' => 'ae', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c',
    'ď' => 'd', 'đ' => 'd', 'ð' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e',
    'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e',
    'ƒ' => 'f', 'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h',
    'ħ' => 'h', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i',
    'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ĳ' => 'ij', 'ĵ' => 'j',
    'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l',
    'ŀ' => 'l', 'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ŉ' => 'n',
    'ŋ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe',
    'ø' => 'o', 'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe',
    'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's', 'ù' => 'u', 'ú' => 'u',
    'û' => 'u', 'ü' => 'ue', 'ū' => 'u', 'ů' => 'u', 'ű' => 'u',
    'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u', 'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y',
    'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'ß' => 'ss',
    'ſ' => 'ss', 'ый' => 'iy', 'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G',
    'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I',
    'Й' => 'Y', 'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O',
    'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F',
    'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '',
    'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA', 'а' => 'a',
    'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo',
    'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k', 'л' => 'l',
    'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
    'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
    'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e',
    'ю' => 'yu', 'я' => 'ya'
];
echo str_replace(array_keys($replace), $replace, $string);
woo*_*ive 18
As of PHP >= 5.4.0 (requires the intl extension):
$translatedString = transliterator_transliterate('Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove', $string);
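A slightly more defensive way to call it, as a sketch: the wrapper name `toAscii` and the error handling are my own additions, while the transliterator ID string is the one from the line above. transliterator_transliterate() returns false on failure, so that case is checked explicitly.

```php
<?php
// Hypothetical wrapper around the one-liner above.
function toAscii(string $text): string
{
    if (!function_exists('transliterator_transliterate')) {
        throw new RuntimeException('The intl extension is required.');
    }
    // Convert any script to Latin, strip diacritics, then drop what is still non-ASCII.
    $result = transliterator_transliterate(
        'Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove',
        $text
    );
    if ($result === false) {
        throw new RuntimeException('Transliteration failed.');
    }
    return $result;
}
```

With intl available, `toAscii('prêt-à-porter')` should give `pret-a-porter`.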
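The string $chain has the same character encoding as the characters in the array. It may well be that the $first_name string arrives in a different encoding, so those characters never match. You may want to try the multibyte string functions.
Try mb_convert_encoding. You might also want to try HTML-ENTITIES as the to_encoding parameter; then you don't have to worry about how the characters will be converted, because the result is very predictable.
Assuming the input to this script is UTF-8, this is probably not a bad place to start: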
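To see why the replacement silently does nothing when the encodings differ, compare the bytes. This is only an illustration: the variable names are made up, and ISO-8859-1 is just one example of a non-UTF-8 encoding the form data might arrive in.

```php
<?php
// The same visible character "á" in two encodings.
$utf8   = "\xC3\xA1"; // UTF-8: two bytes
$latin1 = "\xE1";     // ISO-8859-1: one byte

// A map written in a UTF-8 source file, as in the question.
$trans = ["\xC3\xA1" => 'a'];

echo strtr($utf8, $trans), "\n";            // prints a
echo bin2hex(strtr($latin1, $trans)), "\n"; // prints e1 (unchanged, no byte sequence matched)

// Dumping the bytes of a suspicious input shows which encoding it arrived in.
echo bin2hex($utf8), ' vs ', bin2hex($latin1), "\n"; // prints c3a1 vs e1
```

Running `bin2hex($first_name)` on the real form input is a quick way to confirm whether it matches the bytes in your replacement table.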
$first_name = mb_convert_encoding($first_name, "HTML-ENTITIES", "UTF-8");
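Once accented letters have become named entities, the base letter can be read straight off the entity name. The sketch below uses htmlentities() instead of the "HTML-ENTITIES" pseudo-encoding, which mbstring deprecates as of PHP 8.2. The function name and the list of entity suffixes are assumptions for illustration, not a complete table.

```php
<?php
// Hypothetical helper: strips accents by going through HTML entity names.
function stripAccentsViaEntities(string $text): string
{
    // "é" becomes "&eacute;", "à" becomes "&agrave;", and so on.
    $entities = htmlentities($text, ENT_NOQUOTES, 'UTF-8');
    // Keep only the base letter of each accent entity.
    $plain = preg_replace(
        '/&([A-Za-z])(?:acute|grave|circ|uml|tilde|ring|cedil|caron|slash);/',
        '$1',
        $entities
    );
    // Turn any remaining entities (for example &amp;) back into characters.
    return html_entity_decode($plain, ENT_NOQUOTES, 'UTF-8');
}

echo stripAccentsViaEntities('prêt-à-porter'), "\n"; // prints pret-a-porter
```

This only works when the input really is UTF-8, so it belongs after any encoding check rather than in place of one.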
I wish I had found this thread sooner. The function I wrote (it took me quite a while) is as follows:
function CheckLetters($field){
    // Each entry: the plain letter first, then the variants that map to it.
    $letters = [
        0 => "a à á â ä æ ã å ā",
        1 => "c ç ć č",
        2 => "e é è ê ë ē ė ę",
        3 => "i ī į í ì ï î",
        4 => "l ł",
        5 => "n ñ ń",
        6 => "o ō ø œ õ ó ò ö ô",
        7 => "s ß ś š",
        8 => "u ū ú ù ü û",
        9 => "w ŵ",
        10 => "y ŷ ÿ",
        11 => "z ź ž ż",
    ];
    foreach ($letters as $values){
        $variants = explode(" ", $values);
        // The first item is the replacement letter; the rest are the variants.
        $newValue = array_shift($variants);
        // str_replace matches whole byte sequences, so multibyte variants are safe.
        $field = str_replace($variants, $newValue, $field);
    }
    return $field;
}
小智 5
A simple function. It converts strings such as 'Ábç Éfg' to 'abc_efg'.
/**
 * @param $str
 * @return mixed
 */
function sanitizeString($str) {
    $str = preg_replace('/[áàãâä]/ui', 'a', $str);
    $str = preg_replace('/[éèêë]/ui', 'e', $str);
    $str = preg_replace('/[íìîï]/ui', 'i', $str);
    $str = preg_replace('/[óòõôö]/ui', 'o', $str);
    $str = preg_replace('/[úùûü]/ui', 'u', $str);
    $str = preg_replace('/[ç]/ui', 'c', $str);
    $str = preg_replace('/[^a-z0-9]/i', '_', $str);
    $str = preg_replace('/_+/', '_', $str);

    return $str;
}