Remove non-ASCII characters from a string

Lor*_*eck 56 php

When pulling data from websites, I run into strange characters such as:

Â

How can I remove anything that is not a standard (non-extended) ASCII character?

Chr*_*oft 87

A regex replace would be the best option. Taking $str as an example string, match it against :print:, which is a POSIX character class:

$str = 'aAÂ';
$str = preg_replace('/[[:^print:]]/', '', $str); // should be aA

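What :print: does is match all printable characters. The reverse, :^print:, matches all non-printable characters, so any character that is not printable in the current character set is removed.

Note: before using this method, you must make sure that your current character set is ASCII. POSIX character classes support both ASCII and Unicode and match only according to the current character set. As of PHP 5.6, the default charset is UTF-8.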

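  • This solution does not work for me. :( That is the result I get on PHP 5.3.0 (Windows). (4 upvotes)
  • Yes, this answer only works on misconfigured systems. The character in question is clearly a printing character :( (it both is ink and consumes space). Use `'/[[:^ascii:]]/'` instead of `'/[[:^print:]]/'` to strip non-ASCII. (2 upvotes)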
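
As a minimal sketch of the suggestion in the comment above: the [:ascii:] class does not depend on the locale or the current character set, so it behaves the same everywhere. The helper name strip_non_ascii() is made up for this illustration, and the input is assumed to be a UTF-8 byte string.

```php
<?php
// Sketch: strip every byte outside the 7-bit ASCII range (0x00-0x7F).
// strip_non_ascii() is a hypothetical helper name, not a PHP built-in.
function strip_non_ascii(string $str): string
{
    // [:ascii:] is a POSIX class supported by PCRE; the ^ negates it.
    // Without the /u modifier the pattern works byte by byte, so every
    // byte of a multi-byte UTF-8 sequence is removed individually.
    return preg_replace('/[[:^ascii:]]/', '', $str);
}

$str = "aA\u{00C2}";         // "aAÂ" encoded as UTF-8
echo strip_non_ascii($str);  // prints "aA"
```

Note that ASCII control characters (tab, newline, NUL and so on) are part of [:ascii:] and are therefore kept by this pattern.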

Dam*_*irR 39

Do you want only ASCII printable characters?

Use this:

<?php
header('Content-Type: text/html; charset=UTF-8');
$str = "abqwreš??žsff";
$res = preg_replace('/[^\x20-\x7E]/','', $str);
echo "($str)($res)";

Or, even better, convert your input to UTF-8 and use the phputf8 lib to translate "non-normal" characters into their ASCII representation:

require_once('libs/utf8/utf8.php');
require_once('libs/utf8/utils/bad.php');
require_once('libs/utf8/utils/validation.php');
require_once('libs/utf8_to_ascii/utf8_to_ascii.php');

if(!utf8_is_valid($str))
{
  $str=utf8_bad_strip($str);
}

$str = utf8_to_ascii($str, '' );

  • I also wanted to keep tabs, so I used this regex: [^\x00-\x7E] (2 upvotes)
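
The class in the comment above keeps every control character from \x00 upward, not only tabs. If the goal is printable ASCII plus tab and line breaks, a narrower whitelist may be closer to the intent. The following is only a sketch, and keep_printable_ascii() is a hypothetical name.

```php
<?php
// Sketch: keep printable ASCII (0x20-0x7E) plus tab, LF and CR,
// and drop everything else, including NUL and other control bytes.
// keep_printable_ascii() is a hypothetical helper name.
function keep_printable_ascii(string $str): string
{
    // \x09 = tab, \x0A = line feed, \x0D = carriage return.
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $str);
}

// Input: a, tab, b, "š" (U+0161), c, NUL byte, d, newline.
$in = "a\tb\u{0161}c\x00d\n";
var_dump(keep_printable_ascii($in) === "a\tbcd\n"); // bool(true)
```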

Uto*_*pia 26

$clearstring=filter_var($rawstring, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH);
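
FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, so on newer versions the same flag can be combined with FILTER_UNSAFE_RAW instead. A minimal sketch, assuming that only the high-byte stripping is wanted (unlike FILTER_SANITIZE_STRING, this variant does not remove HTML tags):

```php
<?php
// Sketch: strip all bytes above 0x7F without the deprecated filter.
// The sample input is made up for illustration.
$rawstring = "caf\u{00E9} <b>ok</b>";   // "café <b>ok</b>" in UTF-8

// FILTER_UNSAFE_RAW leaves the string alone except for the given flags;
// FILTER_FLAG_STRIP_HIGH removes every byte with a value greater than 127.
$clearstring = filter_var($rawstring, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH);

echo $clearstring;  // prints "caf <b>ok</b>"
```

Adding FILTER_FLAG_STRIP_LOW to the flags (combined with |) would also remove bytes below 0x20.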


Sil*_*mer 20

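Somewhat related: we have a web application that has to send data to a legacy system that can only handle the first 128 characters of the ASCII character set.

The solution we had to use was to "translate" as many characters as possible into close-matching ASCII equivalents, while leaving anything that could not be translated alone.

Normally I would do something like this: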

<?php
// transliterate
if (function_exists('iconv')) {
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
}
?>

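... but that replaces everything that cannot be translated with a question mark (?).

So we ended up doing the following. At the end of this function you will find the (commented-out) PHP regex that simply removes non-ASCII characters.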

<?php
public function cleanNonAsciiCharactersInString($orig_text) {

    $text = $orig_text;

    // Single letters
    $text = preg_replace("/[????áàâãªä]/u",      "a", $text);
    $text = preg_replace("/[??????ÁÀÂÃÄ]/u",     "A", $text);
    $text = preg_replace("/[??????]/u",           "b", $text);
    $text = preg_replace("/[???]/u",            "B", $text);
    $text = preg_replace("/[ç?©?]/u",            "c", $text);
    $text = preg_replace("/[Ç?]/u",              "C", $text);        
    $text = preg_replace("/[??]/u",             "d", $text);
    $text = preg_replace("/[éèêë?ëè????????]/u", "e", $text);
    $text = preg_replace("/[ÉÈÊË€??€??]/u",     "E", $text);
    $text = preg_replace("/[?]/u",               "F", $text);
    $text = preg_replace("/[????]/u",           "H", $text);
    $text = preg_replace("/[???]/u",            "h", $text);
    $text = preg_replace("/[ÍÌÎÏ]/u",           "I", $text);
    $text = preg_replace("/[íìîï????]/u",       "i", $text);
    $text = preg_replace("/[??]/u",             "j", $text);
    $text = preg_replace("/[???]/u",            'K', $text);
    $text = preg_replace("/[??]/u",             'k', $text);
    $text = preg_replace("/[??]/u",             'l', $text);
    $text = preg_replace("/[??]/u",             "M", $text);
    $text = preg_replace("/[ñ?????]/u",            "n", $text);
    $text = preg_replace("/[Ñ?????????]/u",       "N", $text);
    $text = preg_replace("/[óòôõºö???????]/u", "o", $text);
    $text = preg_replace("/[ÓÒÔÕÖ?????]/u",     "O", $text);
    $text = preg_replace("/[?????]/u",          "p", $text);
    $text = preg_replace("/[®??]/u",              "R", $text); 
    $text = preg_replace("/[????]/u",              "r", $text); 
    $text = preg_replace("/[?]/u",              "S", $text);
    $text = preg_replace("/[?]/u",              "s", $text);
    $text = preg_replace("/[??]/u",              "T", $text);
    $text = preg_replace("/[?†‡]/u",              "t", $text);
    $text = preg_replace("/[úùûü???µ???]/u",     "u", $text);
    $text = preg_replace("/[?]/u",               "v", $text);
    $text = preg_replace("/[ÚÙÛÜ???]/u",         "U", $text);
    $text = preg_replace("/[??????????]/u",      "w", $text);
    $text = preg_replace("/[?????]/u",          "W", $text);
    $text = preg_replace("/[?????]/u",          "x", $text);
    $text = preg_replace("/[??¥]/u",           "Y", $text);
    $text = preg_replace("/[???????]/u",       "y", $text);
    $text = preg_replace("/[?]/u",              "Z", $text);

    // Punctuation
    $text = preg_replace("/[‚‚??]/u", ",", $text);        
    $text = preg_replace("/[`??’‘]/u", "'", $text);
    $text = preg_replace("/[?“”«»„]/u", '"', $text);
    $text = preg_replace("/[—–??–??????]/u", '-', $text);
    $text = preg_replace("/[  ]/u", ' ', $text);

    $text = str_replace("…", "...", $text);
    $text = str_replace("?", "!=", $text);
    $text = str_replace("?", "<=", $text);
    $text = str_replace("?", ">=", $text);
    $text = preg_replace("/[???]/u", "=", $text);


    // Exciting combinations    
    $text = str_replace("??", "bl", $text);
    $text = str_replace("?", "c/o", $text);
    $text = str_replace("?", "Pts", $text);
    $text = str_replace("™", "tm", $text);
    $text = str_replace("?", "No", $text);        
    $text = str_replace("?", "4", $text);                
    $text = str_replace("‰", "%", $text);
    $text = preg_replace("/[?•]/u", "*", $text);
    $text = str_replace("‹", "<", $text);
    $text = str_replace("›", ">", $text);
    $text = str_replace("?", "!!", $text);
    $text = str_replace("?", "/", $text);
    $text = str_replace("?", "/", $text);
    $text = str_replace("?", "7/8", $text);
    $text = str_replace("?", "5/8", $text);
    $text = str_replace("?", "3/8", $text);
    $text = str_replace("?", "1/8", $text);        
    $text = preg_replace("/[‰]/u", "%", $text);
    $text = preg_replace("/[??]/u", "Ab", $text);
    $text = preg_replace("/[??]/u", "IO", $text);
    $text = preg_replace("/[????]/u", "fi", $text);
    $text = preg_replace("/[??]/u", "3", $text); 
    $text = str_replace("£", "(pounds)", $text);
    $text = str_replace("?", "(lira)", $text);
    $text = preg_replace("/[‰]/u", "%", $text);
    $text = preg_replace("/[?????]/u", "|", $text);
    $text = preg_replace("/[??????]/u", "", $text);


    //2) Translation CP1252.
    $trans = get_html_translation_table(HTML_ENTITIES);
    $trans['f'] = '&fnof;';    // Latin Small Letter F With Hook
    $trans['-'] = array(
        '&hellip;',     // Horizontal Ellipsis
        '&tilde;',      // Small Tilde
        '&ndash;'       // Dash
        );
    $trans["+"] = '&dagger;';    // Dagger
    $trans['#'] = '&Dagger;';    // Double Dagger         
    $trans['M'] = '&permil;';    // Per Mille Sign
    $trans['S'] = '&Scaron;';    // Latin Capital Letter S With Caron        
    $trans['OE'] = '&OElig;';    // Latin Capital Ligature OE
    $trans["'"] = array(
        '&lsquo;',  // Left Single Quotation Mark
        '&rsquo;',  // Right Single Quotation Mark
        '&rsaquo;', // Single Right-Pointing Angle Quotation Mark
        '&sbquo;',  // Single Low-9 Quotation Mark
        '&circ;',   // Modifier Letter Circumflex Accent
        '&lsaquo;'  // Single Left-Pointing Angle Quotation Mark
        );

    $trans['"'] = array(
        '&ldquo;',  // Left Double Quotation Mark
        '&rdquo;',  // Right Double Quotation Mark
        '&bdquo;',  // Double Low-9 Quotation Mark
        );

    $trans['*'] = '&bull;';    // Bullet
    $trans['n'] = '&ndash;';    // En Dash
    $trans['m'] = '&mdash;';    // Em Dash        
    $trans['tm'] = '&trade;';    // Trade Mark Sign
    $trans['s'] = '&scaron;';    // Latin Small Letter S With Caron
    $trans['oe'] = '&oelig;';    // Latin Small Ligature OE
    $trans['Y'] = '&Yuml;';    // Latin Capital Letter Y With Diaeresis
    $trans['euro'] = '&euro;';    // euro currency symbol
    ksort($trans);

    foreach ($trans as $k => $v) {
        $text = str_replace($v, $k, $text);
    }

    // 3) remove <p>, <br/> ...
    $text = strip_tags($text);

    // 4) &amp; => &, &quot; => "
    $text = html_entity_decode($text);


    // transliterate
    // if (function_exists('iconv')) {
    // $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    // }

    // remove non ascii characters
    // $text =  preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $text);      

    return $text;
}

?>
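
The same idea can be sketched more compactly with a lookup table and strtr(). This is not the function above, only an illustration: fold_to_ascii(), the $stripRest switch and the small map are all made up here, and a real table would need far more entries.

```php
<?php
// Sketch: replace known non-ASCII characters with close ASCII equivalents.
// fold_to_ascii() and its parameters are hypothetical names.
function fold_to_ascii(string $text, array $map, bool $stripRest = false): string
{
    // strtr() with an array replaces each key with its value in one pass.
    $text = strtr($text, $map);

    if ($stripRest) {
        // Optionally drop whatever non-ASCII bytes are still left.
        $text = preg_replace('/[^\x00-\x7F]/', '', $text);
    }
    return $text;
}

// Tiny illustrative map; the keys are written as Unicode escapes (PHP 7+).
$map = [
    "\u{00E1}" => 'a',    // á
    "\u{00E9}" => 'e',    // é
    "\u{00F1}" => 'n',    // ñ
    "\u{2026}" => '...',  // horizontal ellipsis
    "\u{2122}" => 'tm',   // trade mark sign
    "\u{201C}" => '"',    // left double quotation mark
    "\u{201D}" => '"',    // right double quotation mark
];

// Input is: Señor “café”…
echo fold_to_ascii("Se\u{00F1}or \u{201C}caf\u{00E9}\u{201D}\u{2026}", $map);
// prints: Senor "cafe"...
```

With $stripRest left at false, characters missing from the map pass through unchanged, which matches the "leave anything that cannot be translated alone" requirement.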