PHP function to make a slug (URL string)

Question


And*_* SK 149 php internationalization slug

__PRE__

It works fine, but I found some cases where it fails:

`gen_slug('Andrés Cortez')` does not return the expected `andres-cortez`.

Why? Any ideas about the parameters?

Answer 1

Mae*_*lyn 408

Try this instead of the lengthy replacements:

public static function slugify($text)
{
  // replace non letter or digits by -
  $text = preg_replace('~[^\pL\d]+~u', '-', $text);

  // transliterate
  $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);

  // remove unwanted characters
  $text = preg_replace('~[^-\w]+~', '', $text);

  // trim
  $text = trim($text, '-');

  // remove duplicate -
  $text = preg_replace('~-+~', '-', $text);

  // lowercase
  $text = strtolower($text);

  if (empty($text)) {
    return 'n-a';
  }

  return $text;
}


This is based on the one in Symfony's Jobeet tutorial.

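If `$text` contains characters that have no ASCII equivalent, `iconv` will not convert them correctly. For example, `iconv('utf-8', 'us-ascii//TRANSLIT', "EFI收购Cretaprint")` returns "EFI" and emits a warning. (10 upvotes)
Doesn't work with Cyrillic; all the characters get removed. (7 upvotes)
No, the first preg_replace erases everything that is not a letter or a digit. Note the `^` right after the opening bracket: it negates the match. (4 upvotes)
@Maerlyn and andufo, I think there is an extra "\" in the first regex, right? Shouldn't it be `'~[^\pL\d]+~u'`? (2 upvotes)
`$text = trim($text, '-');` should come last, otherwise `Foo收` becomes `foo-`. Also, `Foo收bar` becomes `foo--bar` (the duplicate `-` seems redundant). (2 upvotes)
No, no, no, this doesn't work. The first expression DOES replace all non-letter characters. This should not be the accepted answer. (2 upvotes)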
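Several of the comments above are about what happens when `iconv()` cannot transliterate a character. A minimal sketch of a more defensive variant, written as a plain function so it runs outside a class. The name `slugify_iconv` is hypothetical, the iconv extension is assumed to be loaded, and how accented letters come out still depends on the iconv implementation (glibc, musl, ...) and on the current `LC_CTYPE` locale:

```php
<?php
// Hypothetical plain-function variant of the answer's slugify().
// Assumption: the iconv extension is loaded. How accented letters are
// transliterated depends on the iconv library and the LC_CTYPE locale.
function slugify_iconv(string $text): string
{
    // Replace every run of non-letters/non-digits with a single "-".
    // preg_replace() returns null on invalid UTF-8, hence the "?? ''".
    $text = preg_replace('~[^\pL\d]+~u', '-', $text) ?? '';

    // Transliterate to ASCII. "@" hides the notice iconv() can raise for
    // characters it cannot convert; if it returns false, keep the text and
    // let the next step strip the non-word bytes.
    $ascii = @iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);
    if ($ascii !== false) {
        $text = $ascii;
    }

    // Remove anything that is still not "-" or a word character.
    $text = preg_replace('~[^-\w]+~', '', $text);

    // Collapse repeated "-" and strip them from both ends.
    $text = trim(preg_replace('~-+~', '-', $text), '-');

    $text = strtolower($text);

    // Compare with '' rather than using empty(), because empty('0') is true.
    return $text === '' ? 'n-a' : $text;
}

echo slugify_iconv('Hello, World!'), "\n"; // hello-world
echo slugify_iconv('!!!'), "\n";           // n-a
```

One deliberate difference from the answer: the final check compares with `''` instead of calling `empty()`, so an input of `0` yields the slug `0` rather than `n-a`.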

Answer 2

The*_*pit 43

Update

Since this answer is getting some attention, I'm adding some explanation.

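The solution provided essentially replaces everything except A-Z, a-z, 0-9 and - (hyphen) with - (hyphen). As a result, it won't work properly with other Unicode characters (which are valid characters in a URL slug/string). A common case is an input string containing non-English characters.

Only use this solution if you are confident that the input string won't contain Unicode characters you might want to keep in the output/slug.

For example, every character of "नारीशक्ति" is treated as invalid and replaced by hyphens, instead of producing "नारी-शक्ति" (a valid URL slug).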

原始答案

How about...

$slug = strtolower(trim(preg_replace('/[^A-Za-z0-9-]+/', '-', $string), '-'));


?

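It actually does. This thread is so weird... the accepted answer doesn't work, and all the others only sort of do... (4 upvotes)
Slugs generated this way are by no means SEO-friendly or user-friendly. In many languages they also produce far more collisions than proper transliteration would. (2 upvotes)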
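Following the caveat in the update, a minimal sketch of a Unicode-aware version of the one-liner. It keeps letters (`\pL`), combining marks (`\pM`, needed for Devanagari vowel signs and the virama) and numbers (`\pN`) from any script. The name `unicode_slug` is hypothetical and the mbstring extension is assumed for `mb_strtolower()`:

```php
<?php
// Sketch: Unicode-aware variant of the one-liner above (hypothetical name).
// Assumption: the mbstring extension is available for mb_strtolower().
function unicode_slug(string $string): string
{
    // Any run of characters that are not letters, combining marks or numbers
    // (in any script) becomes a single "-". The /u modifier makes PCRE treat
    // the subject as UTF-8; on invalid UTF-8, preg_replace() returns null.
    $slug = preg_replace('/[^\pL\pM\pN]+/u', '-', $string) ?? '';

    // Strip leading/trailing "-". A plain trim() would only strip
    // whitespace, which the regex has already turned into "-".
    return mb_strtolower(trim($slug, '-'), 'UTF-8');
}

echo unicode_slug('नारी शक्ति'), "\n";      // नारी-शक्ति
echo unicode_slug('Andrés Cortez!'), "\n"; // andrés-cortez
```

Such slugs are still percent-encoded when sent in an HTTP request, even though browsers usually display them in their original script.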

Answer 3

小智 35

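If the intl extension is installed, you can easily create a slug with ICU transliteration (the `Transliterator` class, or the procedural `transliterator_transliterate()` function).

The last rule below replaces spaces (separators) with dashes to make the result more slug-like.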

<?php
$string = 'Namnet på bildtävlingen';
$slug = \Transliterator::createFromRules(
    ':: Any-Latin;'
    . ':: NFD;'
    . ':: [:Nonspacing Mark:] Remove;'
    . ':: NFC;'
    . ':: [:Punctuation:] Remove;'
    . ':: Lower();'
    . '[:Separator:] > \'-\''
)
    ->transliterate( $string );
echo $slug; // namnet-pa-bildtavlingen
?>


For those arriving at this post years later: the intl extension has been bundled with PHP since 5.3.0. http://php.net/manual/en/intl.requirements.php (14 upvotes)
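The rule set above can be wrapped so that it degrades gracefully. A sketch under two assumptions: intl may not be enabled even though its source ships with PHP, and `Transliterator::createFromRules()` returns `null` when the rules cannot be compiled. The name `intl_slug` and the ASCII-only fallback are illustrative additions, not part of the answer:

```php
<?php
// Sketch: the answer's ICU rules plus two safeguards (hypothetical name).
// The plain-regex fallback below is an illustrative addition.
function intl_slug(string $string): string
{
    static $transliterator = null;

    if ($transliterator === null && class_exists('Transliterator')) {
        // Same rules as the answer; createFromRules() returns null on failure.
        $transliterator = \Transliterator::createFromRules(
            ':: Any-Latin;'
            . ':: NFD;'
            . ':: [:Nonspacing Mark:] Remove;'
            . ':: NFC;'
            . ':: [:Punctuation:] Remove;'
            . ':: Lower();'
            . '[:Separator:] > \'-\''
        );
    }

    if ($transliterator instanceof \Transliterator) {
        // transliterate() returns false on error; (string) turns that into ''.
        $slug = (string) $transliterator->transliterate($string);
    } else {
        // Without intl: keep ASCII letters and digits only.
        $slug = strtolower(preg_replace('/[^A-Za-z0-9]+/', '-', $string));
    }

    // Each separator becomes its own "-", so collapse runs and trim the ends.
    return trim(preg_replace('/-+/', '-', $slug), '-');
}

echo intl_slug('Namnet på  bildtävlingen'), "\n";
```

Collapsing runs of `-` matters because the last rule maps each separator on its own, so two consecutive spaces would otherwise produce `--`.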

Answer 4

Imr*_*hsh 23

Note: I took this function from WordPress!

Use it like this:

echo sanitize('testing this link');


Code

//taken from wordpress
function utf8_uri_encode( $utf8_string, $length = 0 ) {
    $unicode = '';
    $values = array();
    $num_octets = 1;
    $unicode_length = 0;

    $string_length = strlen( $utf8_string );
    for ($i = 0; $i < $string_length; $i++ ) {

        $value = ord( $utf8_string[ $i ] );

        if ( $value < 128 ) {
            if ( $length && ( $unicode_length >= $length ) )
                break;
            $unicode .= chr($value);
            $unicode_length++;
        } else {
            if ( count( $values ) == 0 ) $num_octets = ( $value < 224 ) ? 2 : 3;

            $values[] = $value;

            if ( $length && ( $unicode_length + ($num_octets * 3) ) > $length )
                break;
            if ( count( $values ) == $num_octets ) {
                if ($num_octets == 3) {
                    $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
                    $unicode_length += 9;
                } else {
                    $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
                    $unicode_length += 6;
                }

                $values = array();
                $num_octets = 1;
            }
        }
    }

    return $unicode;
}

//taken from wordpress
function seems_utf8($str) {
    $length = strlen($str);
    for ($i=0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) $n = 0; # 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
        elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
        elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
        elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
        elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
        else return false; # Does not match any model
        for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
            if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}

//function sanitize_title_with_dashes taken from wordpress
function sanitize($title) {
    $title = strip_tags($title);
    // Preserve escaped octets.
    $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
    // Remove percent signs that are not part of an octet.
    $title = str_replace('%', '', $title);
    // Restore octets.
    $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);

    if (seems_utf8($title)) {
        if (function_exists('mb_strtolower')) {
            $title = mb_strtolower($title, 'UTF-8');
        }
        $title = utf8_uri_encode($title, 200);
    }

    $title = strtolower($title);
    $title = preg_replace('/&.+?;/', '', $title); // kill entities
    $title = str_replace('.', '-', $title);
    $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
    $title = preg_replace('/\s+/', '-', $title);
    $title = preg_replace('|-+|', '-', $title);
    $title = trim($title, '-');

    return $title;
}


`sanitize` is a strange, easily forgotten name for a function that generates a slug. (2 upvotes)
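Unlike the transliterating answers, the WordPress-derived `sanitize()` keeps non-ASCII letters and writes them as lowercase percent-encoded UTF-8 octets (the job of `utf8_uri_encode()`). A compact sketch of just that idea, for comparison; `percent_slug` is a hypothetical name, mbstring is assumed, and unlike the WordPress code it neither caps the length at 200 characters nor keeps non-ASCII punctuation:

```php
<?php
// Sketch (hypothetical name): lowercase, turn separators into "-", then
// percent-encode every remaining non-ASCII byte as lowercase hex, similar
// in spirit to WordPress's utf8_uri_encode(). Assumes mbstring.
function percent_slug(string $title): string
{
    $title = mb_strtolower($title, 'UTF-8');

    // Runs of anything other than letters, marks or digits become "-".
    $title = preg_replace('/[^\pL\pM\pN]+/u', '-', $title) ?? '';
    $title = trim($title, '-');

    // Encode each byte >= 0x80, e.g. "é" (UTF-8 C3 A9) -> "%c3%a9".
    return preg_replace_callback(
        '/[\x80-\xff]/',
        static fn (array $m): string => '%' . dechex(ord($m[0])),
        $title
    );
}

echo percent_slug('Andrés Cortez'), "\n"; // andr%c3%a9s-cortez
```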

Answer 5

Vaz*_*yan 9

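It's always a good idea to use an existing solution that is supported by many experienced developers. One of the most popular is https://github.com/cocur/slugify. Above all, it supports many languages and is still being updated.

If you don't want to use the whole package, you can copy the parts you need.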

Answer 6

s3c*_*s3c 9

There are already plenty of answers here, so I almost didn't add another one, but none of the functions did everything I needed.


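For me, the best base was function #3 from the answer that benchmarks them. I added/fixed some replacements, so that

' is simply removed,
. is replaced with -,
α is replaced with a,
ẞ is replaced with b,
Ł (and similar) are replaced with L instead of K, and
€ and $ are replaced with eur and usd respectively (add more as needed).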


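You can optionally add '&' => '-and-', but SEO advice is to avoid conjunctions (#8), so I left it out for my use case. (This function does not, however, remove existing "and"s and "or"s from the string.)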


I also added a line to fix the double dashes in the strange test string I came up with, plus an optional parameter to limit the slug's length.


Code

<?php
function slugify($text, $length = null)
{
    $replacements = [
        '<' => '', '>' => '', '-' => ' ', '&' => '', '"' => '', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'Ae',
        'Ä' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae', 'Ç' => 'C', "'" => '', 'Ć' => 'C', 'Č' => 'C',
        'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D', 'Ð' => 'D', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E',
        'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G', 'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H',
        'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I', 'İ' => 'I', 'Ĳ' => 'IJ',
        'Ĵ' => 'J', 'Ķ' => 'K', 'Ł' => 'L', 'Ľ' => 'L', 'Ĺ' => 'L', 'Ļ' => 'L', 'Ŀ' => 'L', 'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N',
        'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'Oe', 'Ö' => 'Oe', 'Ø' => 'O', 'Ō' => 'O',
        'Ő' => 'O', 'Ŏ' => 'O', 'Œ' => 'OE', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S', 'Ş' => 'S', 'Ŝ' => 'S',
        'Ș' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T', 'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U',
        'Ü' => 'Ue', 'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U', 'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y',
        'Ź' => 'Z', 'Ž' => 'Z', 'Ż' => 'Z', 'Þ' => 'T', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'ae', 'ä' => 'ae',
        'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a', 'æ' => 'ae', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c',
        'ď' => 'd', 'đ' => 'd', 'ð' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e',
        'ĕ' => 'e', 'ė' => 'e', 'ƒ' => 'f', 'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h', 'ħ' => 'h', 'ì' => 'i',
        'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i', 'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ĳ' => 'ij', 'ĵ' => 'j',
        'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l', 'ŀ' => 'l', 'ñ' => 'n', 'ń' => 'n', 'ň' => 'n',
        'ņ' => 'n', 'ŉ' => 'n', 'ŋ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe', 'ö' => 'oe', 'ø' => 'o',
        'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe', 'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's', 'ś' => 's', 'ù' => 'u',
        'ú' => 'u', 'û' => 'u', 'ü' => 'ue', 'ū' => 'u', 'ü' => 'ue', 'ů' => 'u', 'ű' => 'u', 'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u',
        'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y', 'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'α' => 'a', 'ß' => 'ss',
        'ẞ' => 'b', 'ſ' => 'ss', 'ый' => 'iy', 'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G', 'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO',
        'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I', 'Й' => 'Y', 'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O', 'П' => 'P',
        'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F', 'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH',
        'Ъ' => '', 'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA', 'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g',
        'д' => 'd', 'е' => 'e', 'ё' => 'yo', 'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k', 'л' => 'l', 'м' => 'm',
        'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's', 'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c',
        'ч' => 'ch', 'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e', 'ю' => 'yu', 'я' => 'ya',
        '.' => '-', '€' => '-eur-', '$' => '-usd-'
    ];
    // Replace non-ascii characters
    $text = strtr($text, $replacements);
    // Replace non letter or digits with "-"
    $text = preg_replace('~[^\pL\d.]+~u', '-', $text);
    // Replace unwanted characters with "-"
    $text = preg_replace('~[^-\w.]+~', '-', $text);
    // Trim "-"
    $text = trim($text, '-');
    // Remove duplicate "-"
    $text = preg_replace('~-+~', '-', $text);
    // Convert to lowercase
    $text = strtolower($text);
    // Limit length
    if (isset($length) && $length < strlen($text))
        $text = rtrim(substr($text, 0, $length), '-');

    return $text;
}
$text = "--- You can't misuse me! Or can-ya? ČĆŽŠĐ÷×ß¤_.,:;-!\"#$%&/()=?*~ˇ^˘°˛`˙´˝¨¸¸¨Łł€\|@{}[] ¿ Àñdréß l'affreux ğarçon & nøël en forêt ! Andrés Cortez EFI收购Cretaprint Étienne";
echo "text\n$text\n\nslug\n".slugify($text);


Output


text
--- You can't misuse me! Or can-ya? ČĆŽŠĐ÷×ß¤_.,:;-!"#$%&/()=?*~ˇ^˘°˛`˙´˝¨¸¸¨Łł€\|@{}[] ¿ Àñdréß l'affreux ğarçon & nøël en forêt ! Andrés Cortez EFI收购Cretaprint Étienne

slug
you-cant-misuse-me-or-can-ya-cczsd-ss-usd-ll-eur-andress-laffreux-garcon-noel-en-foret-andres-cortez-efi-cretaprint-etienne


Notes


It also works for the OP's case, turning 'Andrés Cortez' into 'andres-cortez', and for all the other examples I found in this thread, except for one character that is beyond my scope.

I'd be glad to hear about any bugs you find (ideally with suggested fixes).
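The answer relies on `strtr()` with an array, which tries the longest matching key first and never rescans text it has already replaced. That is why the two-letter key `'ый' => 'iy'` takes precedence over the single letters `'ы'` and `'й'`, and why the `-` produced by `'.' => '-'` is not turned back into a space by `'-' => ' '`. A small demonstration with a hypothetical six-entry excerpt, compared against `str_replace()`, which applies the pairs one after another in array order:

```php
<?php
// Hypothetical mini-table in the style of the answer's $replacements.
$map = ['н' => 'n', 'о' => 'o', 'в' => 'v', 'ы' => 'y', 'й' => 'y', 'ый' => 'iy'];

// strtr(): longest key first; each input position is replaced at most once.
$viaStrtr = strtr('новый', $map);

// str_replace(): processes the pairs sequentially, so 'ы' and 'й' are
// already gone by the time the 'ый' pair is tried.
$viaStrReplace = str_replace(array_keys($map), array_values($map), 'новый');

echo $viaStrtr, "\n";      // noviy
echo $viaStrReplace, "\n"; // novyy
```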

Answer 7

Bap*_*ard 7

Here's another one. For example, "Title with strange characters ééé AXZ" becomes "title-with-strange-characters-eee-axz".

/**
 * Function used to create a slug associated to an "ugly" string.
 *
 * @param string $string the string to transform.
 *
 * @return string the resulting slug.
 */
public static function createSlug($string) {

    $table = array(
            'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
            'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
            'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
            'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
            'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
            'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
            'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
            'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', '/' => '-', ' ' => '-'
    );

    // -- Remove duplicated spaces
    $stripped = preg_replace(array('/\s{2,}/', '/[\t\n]/'), ' ', $string);

    // -- Returns the slug
    return strtolower(strtr($stripped, $table));


}


Answer 8

cze*_*asz 7

An updated version of @Imran Omar Bukhsh's code (from the latest WordPress (4.0) branch):

<?php

// Add methods to slugify taken from Wordpress:
// - https://github.com/WordPress/WordPress/blob/master/wp-includes/formatting.php 
// - https://github.com/WordPress/WordPress/blob/master/wp-includes/functions.php

/**
 * Set the mbstring internal encoding to a binary safe encoding when func_overload
 * is enabled.
 *
 * When mbstring.func_overload is in use for multi-byte encodings, the results from
 * strlen() and similar functions respect the utf8 characters, causing binary data
 * to return incorrect lengths.
 *
 * This function overrides the mbstring encoding to a binary-safe encoding, and
 * resets it to the users expected encoding afterwards through the
 * `reset_mbstring_encoding` function.
 *
 * It is safe to recursively call this function, however each
 * `mbstring_binary_safe_encoding()` call must be followed up with an equal number
 * of `reset_mbstring_encoding()` calls.
 *
 * @since 3.7.0
 *
 * @see reset_mbstring_encoding()
 *
 * @param bool $reset Optional. Whether to reset the encoding back to a previously-set encoding.
 *                    Default false.
 */
function mbstring_binary_safe_encoding( $reset = false ) {
  static $encodings = array();
  static $overloaded = null;

  if ( is_null( $overloaded ) )
    $overloaded = function_exists( 'mb_internal_encoding' ) && ( ini_get( 'mbstring.func_overload' ) & 2 );

  if ( false === $overloaded )
    return;

  if ( ! $reset ) {
    $encoding = mb_internal_encoding();
    array_push( $encodings, $encoding );
    mb_internal_encoding( 'ISO-8859-1' );
  }

  if ( $reset && $encodings ) {
    $encoding = array_pop( $encodings );
    mb_internal_encoding( $encoding );
  }
}

/**
 * Reset the mbstring internal encoding to a users previously set encoding.
 *
 * @see mbstring_binary_safe_encoding()
 *
 * @since 3.7.0
 */
function reset_mbstring_encoding() {
  mbstring_binary_safe_encoding( true );
}


/**
 * Checks to see if a string is utf8 encoded.
 *
 * NOTE: This function checks for 5-Byte sequences, UTF8
 *       has Bytes Sequences with a maximum length of 4.
 *
 * @author bmorel at ssi dot fr (modified)
 * @since 1.2.1
 *
 * @param string $str The string to be checked
 * @return bool True if $str fits a UTF-8 model, false otherwise.
 */
function seems_utf8($str) {
  mbstring_binary_safe_encoding();
  $length = strlen($str);
  reset_mbstring_encoding();
  for ($i=0; $i < $length; $i++) {
    $c = ord($str[$i]);
    if ($c < 0x80) $n = 0; # 0bbbbbbb
    elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
    elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
    elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
    elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
    elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
    else return false; # Does not match any model
    for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
      if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
        return false;
    }
  }
  return true;
}


/**
 * Encode the Unicode values to be used in the URI.
 *
 * @since 1.5.0
 *
 * @param string $utf8_string
 * @param int $length Max length of the string
 * @return string String with Unicode encoded for URI.
 */
function utf8_uri_encode( $utf8_string, $length = 0 ) {
  $unicode = '';
  $values = array();
  $num_octets = 1;
  $unicode_length = 0;

  mbstring_binary_safe_encoding();
  $string_length = strlen( $utf8_string );
  reset_mbstring_encoding();

  for ($i = 0; $i < $string_length; $i++ ) {

    $value = ord( $utf8_string[ $i ] );

    if ( $value < 128 ) {
      if ( $length && ( $unicode_length >= $length ) )
        break;
      $unicode .= chr($value);
      $unicode_length++;
    } else {
      if ( count( $values ) == 0 ) $num_octets = ( $value < 224 ) ? 2 : 3;

      $values[] = $value;

      if ( $length && ( $unicode_length + ($num_octets * 3) ) > $length )
        break;
      if ( count( $values ) == $num_octets ) {
        if ($num_octets == 3) {
          $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
          $unicode_length += 9;
        } else {
          $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
          $unicode_length += 6;
        }

        $values = array();
        $num_octets = 1;
      }
    }
  }

  return $unicode;
}


/**
 * Sanitizes a title, replacing whitespace and a few other characters with dashes.
 *
 * Limits the output to alphanumeric characters, underscore (_) and dash (-).
 * Whitespace becomes a dash.
 *
 * @since 1.2.0
 *
 * @param string $title The title to be sanitized.
 * @param string $raw_title Optional. Not used.
 * @param string $context Optional. The operation for which the string is sanitized.
 * @return string The sanitized title.
 */
function sanitize_title_with_dashes( $title, $raw_title = '', $context = 'display' ) {
  $title = strip_tags($title);
  // Preserve escaped octets.
  $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
  // Remove percent signs that are not part of an octet.
  $title = str_replace('%', '', $title);
  // Restore octets.
  $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);

  if (seems_utf8($title)) {
    if (function_exists('mb_strtolower')) {
      $title = mb_strtolower($title, 'UTF-8');
    }
    $title = utf8_uri_encode($title, 200);
  }

  $title = strtolower($title);
  $title = preg_replace('/&.+?;/', '', $title); // kill entities
  $title = str_replace('.', '-', $title);

  if ( 'save' == $context ) {
    // Convert nbsp, ndash and mdash to hyphens
    $title = str_replace( array( '%c2%a0', '%e2%80%93', '%e2%80%94' ), '-', $title );

    // Strip these characters entirely
    $title = str_replace( array(
      // iexcl and iquest
      '%c2%a1', '%c2%bf',
      // angle quotes
      '%c2%ab', '%c2%bb', '%e2%80%b9', '%e2%80%ba',
      // curly quotes
      '%e2%80%98', '%e2%80%99', '%e2%80%9c', '%e2%80%9d',
      '%e2%80%9a', '%e2%80%9b', '%e2%80%9e', '%e2%80%9f',
      // copy, reg, deg, hellip and trade
      '%c2%a9', '%c2%ae', '%c2%b0', '%e2%80%a6', '%e2%84%a2',
      // acute accents
      '%c2%b4', '%cb%8a', '%cc%81', '%cd%81',
      // grave accent, macron, caron
      '%cc%80', '%cc%84', '%cc%8c',
    ), '', $title );

    // Convert times to x
    $title = str_replace( '%c3%97', 'x', $title );
  }

  $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
  $title = preg_replace('/\s+/', '-', $title);
  $title = preg_replace('|-+|', '-', $title);
  $title = trim($title, '-');

  return $title;
}

$title = '#PFW Alexander McQueen Spring/Summer 2015';
echo "title -> slug: \n". $title ." -> ". sanitize_title_with_dashes($title);
echo "\n\n";
$title = '«GQ»: Elyas M\'Barek gehört zu Männern des Jahres';
echo "title -> slug: \n". $title ." -> ". sanitize_title_with_dashes($title);


See the online example.

Answer 9

小智 7

public static function slugify ($text) {

    $replace = [
        '&lt;' => '', '&gt;' => '', '&#039;' => '', '&amp;' => '',
        '&quot;' => '', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä'=> 'Ae',
        '&Auml;' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae',
        'Ç' => 'C', 'Ć' => 'C', 'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D',
        'Ð' => 'D', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E',
        'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G',
        'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H', 'Ì' => 'I', 'Í' => 'I',
        'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I',
        'İ' => 'I', 'Ĳ' => 'IJ', 'Ĵ' => 'J', 'Ķ' => 'K', 'Ł' => 'L', 'Ľ' => 'L',
        'Ĺ' => 'L', 'Ļ' => 'L', 'Ŀ' => 'L', 'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N',
        'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O',
        'Ö' => 'Oe', '&Ouml;' => 'Oe', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O', 'Ŏ' => 'O',
        'Œ' => 'OE', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S',
        'Ş' => 'S', 'Ŝ' => 'S', 'Ș' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T',
        'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U',
        '&Uuml;' => 'Ue', 'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U',
        'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'Ž' => 'Z',
        'Ż' => 'Z', 'Þ' => 'T', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a',
        'ä' => 'ae', '&auml;' => 'ae', 'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a',
        'æ' => 'ae', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c',
        'ď' => 'd', 'đ' => 'd', 'ð' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e',
        'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e',
        'ƒ' => 'f', 'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h',
        'ħ' => 'h', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i',
        'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ĳ' => 'ij', 'ĵ' => 'j',
        'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l',
        'ŀ' => 'l', 'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ŉ' => 'n',
        'ŋ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe',
        '&ouml;' => 'oe', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe',
        'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's', 'ù' => 'u', 'ú' => 'u',
        'û' => 'u', 'ü' => 'ue', 'ū' => 'u', '&uuml;' => 'ue', 'ů' => 'u', 'ű' => 'u',
        'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u', 'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y',
        'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'ß' => 'ss',
        'ſ' => 'ss', 'ый' => 'iy', 'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G',
        'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I',
        'Й' => 'Y', 'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O',
        'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F',
        'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '',
        'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA', 'а' => 'a',
        'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo',
        'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k', 'л' => 'l',
        'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
        'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
        'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e',
        'ю' => 'yu', 'я' => 'ya'
    ];

    // make a human readable string
    $text = strtr($text, $replace);

    // replace non letter or digits by -
    $text = preg_replace('~[^\\pL\d.]+~u', '-', $text);

    // trim
    $text = trim($text, '-');

    // remove unwanted characters
    $text = preg_replace('~[^-\w.]+~', '', $text);

    $text = strtolower($text);

    return $text;
}


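Hi Nadi, welcome to SO. Code-only answers are discouraged here because they don't teach others _how_ to code. Could you edit your post to explain what your code sample does and how it answers the question? Thanks. (4 upvotes)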

Answer 10

Ent*_*ndu 6

Don't use preg_replace for this. There is a PHP function built for exactly this task: strtr() http://php.net/manual/en/function.strtr.php

Taken from the comments at the link above (I tested it myself; it works):

function normalize ($string) {
    $table = array(
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r',
    );

    return strtr($string, $table);
}

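`strtr()` only swaps characters; spaces, punctuation and case are left as they are, so `normalize()` by itself does not yet return a slug. A minimal sketch of finishing the job with one regex pass; `slug_from_table` is a hypothetical name, and the tiny table stands in for the full one above so that the block runs on its own:

```php
<?php
// Sketch (hypothetical name): transliterate with strtr(), then use a
// single regex pass to turn everything that is not [a-z0-9] into "-".
function slug_from_table(string $string, array $table): string
{
    $ascii = strtr($string, $table);
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));
    return trim($slug, '-');
}

// A tiny excerpt in the style of the answer's table; pass the full table in practice.
$table = ['é' => 'e', 'É' => 'E', 'ç' => 'c', 'ø' => 'o'];

echo slug_from_table('Andrés Cortez', $table), "\n";  // andres-cortez
echo slug_from_table('Garçon & Chien', $table), "\n"; // garcon-chien
```

Characters missing from the table are silently dropped: with this excerpt, `Ärger` becomes `rger`, so the table's coverage matters.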

Answer 11

Mla*_*vic 5

I'm using:

function slugify($text)
{ 
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    return strtolower(preg_replace('/[^A-Za-z0-9-]+/', '-', $text));
}


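The only drawback is that Cyrillic characters are not converted; I'm currently looking for a solution that doesn't need a long str_replace for every Cyrillic character.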
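For the Cyrillic gap mentioned above, one option that avoids a hand-written table is ICU transliteration via the intl extension (assumed to be enabled). `Any-Latin` romanizes Cyrillic and other scripts, and `Latin-ASCII` then folds accented Latin letters to plain ASCII. The name `slugify_translit` is hypothetical, and ICU's romanization may differ from the scheme you would choose by hand (for example, for `й` or `щ`):

```php
<?php
// Sketch (hypothetical name). Assumes the intl extension is enabled.
function slugify_translit(string $text): string
{
    // Romanize any script, then fold accented Latin letters to ASCII.
    $latin = transliterator_transliterate('Any-Latin; Latin-ASCII', $text);
    if ($latin === false) {
        $latin = $text; // transliteration failed: continue with the raw text
    }

    // Every run of characters other than ASCII letters/digits becomes "-".
    $slug = preg_replace('/[^A-Za-z0-9]+/', '-', $latin);
    return strtolower(trim($slug, '-'));
}

if (function_exists('transliterator_transliterate')) {
    echo slugify_translit('Привет, мир!'), "\n";  // privet-mir
    echo slugify_translit('Andrés Cortez'), "\n"; // andres-cortez
}
```

Leaving `-` out of the kept characters also collapses input such as `a - b` to `a-b`, which the answer's `[^A-Za-z0-9-]+` would turn into `a---b`.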

Answer 12

小智 5

I didn't know which one to use, so I ran a quick benchmark on phptester.net:

<?php

// First test
// /sf/answers/2991861211/
function slugify(STRING $string, STRING $separator = '-'){
    
    $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    $special_cases = [ '&' => 'and', "'" => ''];
    $string = mb_strtolower( trim( $string ), 'UTF-8' );
    $string = str_replace( array_keys($special_cases), array_values( $special_cases), $string );
    $string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
    $string = preg_replace('/[^a-z0-9]/u', $separator, $string);
    
    return preg_replace('/['.$separator.']+/u', $separator, $string);
}

// Second test
// /sf/answers/933236391/
function slug(STRING $string, STRING $separator = '-'){
    
    $string = transliterator_transliterate('Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();', $string);
        
    return str_replace(' ', $separator, $string);
}

// Third test - My choice
// /sf/answers/2664629551/
function slugbis($text){

    $replace = [
        '<' => '', '>' => '', '-' => ' ', '&' => '',
        '"' => '', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä'=> 'Ae',
        'Ä' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae',
        'Ç' => 'C', 'Ć' => 'C', 'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D',
        'Ð' => 'D', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E',
        'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G',
        'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H', 'Ì' => 'I', 'Í' => 'I',
        'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I',
        'İ' => 'I', 'Ĳ' => 'IJ', 'Ĵ' => 'J', 'Ķ' => 'K', 'Ł' => 'L', 'Ľ' => 'L',
        'Ĺ' => 'L', 'Ļ' => 'L', 'Ŀ' => 'L', 'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N',
        'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O',
        'Ö' => 'Oe', 'Ö' => 'Oe', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O', 'Ŏ' => 'O',
        'Œ' => 'OE', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S',
        'Ş' => 'S', 'Ŝ' => 'S', 'Ș' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T',
        'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U',
        'Ü' => 'Ue', 'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U',
        'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'Ž' => 'Z',
        'Ż' => 'Z', 'Þ' => 'T', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a',
        'ä' => 'ae', 'ä' => 'ae', 'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a',
        'æ' => 'ae', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c',
        'ď' => 'd', 'đ' => 'd', 'ð' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e',
        'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e',
        'ƒ' => 'f', 'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h',
        'ħ' => 'h', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i',
        'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ĳ' => 'ij', 'ĵ' => 'j',
        'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l',
        'ŀ' => 'l', 'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ŉ' => 'n',
        'ŋ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe',
        'ö' => 'oe', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe',
        'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's', 'ù' => 'u', 'ú' => 'u',
        'û' => 'u', 'ü' => 'ue', 'ū' => 'u', 'ü' => 'ue', 'ů' => 'u', 'ű' => 'u',
        'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u', 'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y',
        'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'ß' => 'ss',
        'ſ' => 'ss', 'ый' => 'iy', 'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G',
        'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I',
        'Й' => 'Y', 'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O',
        'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F',
        'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '',
        'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA', 'а' => 'a',
        'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo',
        'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k', 'л' => 'l',
        'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
        'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
        'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e',
        'ю' => 'yu', 'я' => 'ya'
    ];

    // transliterate accented and Cyrillic letters to ASCII using the table above
    $text = strtr($text, $replace);

    // replace non letter or digits by -
    $text = preg_replace('~[^\pL\d.]+~u', '-', $text);

    // trim
    $text = trim($text, '-');

    // remove unwanted characters
    $text = preg_replace('~[^-\w.]+~', '', $text);

    return strtolower($text);
}
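`slugbis()` leans on `strtr()` with an array argument. According to the PHP manual, in that form the longest keys are tried first and replaced text is not searched again, which is what lets the table hold a two-character key (the one mapped to `'iy'`) next to single-letter keys. A minimal sketch with a tiny, made-up subset of such a table, assuming the file is saved as UTF-8:

```php
<?php
// A tiny, illustrative subset of a transliteration table (not the full one).
$replace = [
    'ый' => 'iy', // two-character key: tried before the single letters below
    'ы'  => 'y',
    'й'  => 'y',
    'Д'  => 'D', 'о' => 'o', 'б' => 'b', 'р' => 'r',
];

// strtr() with an array tries the longest matching key first,
// so "ый" becomes "iy" instead of "y" followed by "y".
$translit = strtr('Добрый', $replace);

echo $translit, PHP_EOL; // Dobriy
```

Without the two-character key, the same word would come out as `Dobryy`, because `ы` and `й` would each be replaced on their own.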

// Fourth test
// /sf/answers/206886501/
function slugagain($string){
    
    $table = [
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', ' '=>'-'
    ];

    return strtr($string, $table);
}
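`slugagain()` only swaps characters through its table and turns spaces into hyphens; it does not lowercase, does not strip punctuation, and lets any character missing from the table pass through unchanged. A possible post-processing step, sketched with a hypothetical `finish_slug()` helper that is not part of the original answer:

```php
<?php
// Hypothetical helper: cleans up the output of a strtr()-only
// transliteration such as slugagain(), which keeps case and punctuation.
function finish_slug(string $s): string
{
    $s = strtolower($s);                        // ASCII lowercase
    $s = preg_replace('~[^a-z0-9]+~', '-', $s); // any other run of bytes (hyphens included) -> one "-"
    return trim($s, '-');                       // no leading/trailing hyphen
}

// A string shaped like the fourth test's output.
echo finish_slug("¿-AndreSs-l'affreux-ğarcon-&-noel-en-foret-!"), PHP_EOL;
// andress-l-affreux-arcon-noel-en-foret
```

Note that the unmapped `ğ` is simply dropped here, leaving `arcon`; adding `ğ` to the table would keep it as `g`.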

// Fifth test
// /sf/answers/1917776311/
function slugifybis($url){
    $url = trim($url);

    $url = str_replace(' ', '-', $url);
    $url = str_replace('/', '-slash-', $url);
    
    return rawurlencode($url);
}
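`slugifybis()` does no transliteration at all: it keeps every character and percent-encodes the UTF-8 bytes, which is why its result in the benchmark is a run of `%XX` sequences. A short sketch contrasting the two built-in encoders, assuming a UTF-8 source file:

```php
<?php
// rawurlencode() follows RFC 3986: a space becomes "%20" and "~" is left alone.
$raw = rawurlencode('a b~');
// urlencode() uses form encoding: a space becomes "+" and "~" becomes "%7E".
$form = urlencode('a b~');

// Each UTF-8 byte is encoded separately: "ğ" is the two bytes C4 9F.
$encoded = rawurlencode('ğarçon');
// rawurldecode() reverses the encoding exactly.
$decoded = rawurldecode($encoded);

echo $raw, PHP_EOL;     // a%20b~
echo $form, PHP_EOL;    // a+b%7E
echo $encoded, PHP_EOL; // %C4%9Far%C3%A7on
```

The encoding is lossless: `rawurldecode($encoded)` returns `ğarçon` again.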

// Sixth and last test
// /sf/answers/2760942411/
setlocale( LC_ALL, "en_US.UTF8" );  
function slugifyagain($string){
    
    $string = iconv('utf-8', 'us-ascii//translit//ignore', $string); // transliterate
    $string = str_replace("'", '', $string);
    $string = preg_replace('~[^\pL\d]+~u', '-', $string); // replace non letter or non digits by "-"
    $string = preg_replace('~[^-\w]+~', '', $string); // remove unwanted characters
    $string = preg_replace('~-+~', '-', $string); // remove duplicate "-"
    $string = trim($string, '-'); // trim "-"
    $string = trim($string); // trim
    $string = mb_strtolower($string, 'utf-8'); // lowercase
    
    return urlencode($string); // safety net: only [-\w] characters should remain here
}
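Everything in `slugifyagain()` after the `iconv()` call is plain regex and string work, so it can be checked on its own. The sketch below, using a hypothetical `slug_ascii_steps()` helper, runs just those steps on input that is assumed to be ASCII already; it leaves out `iconv()` because the output of `//TRANSLIT` depends on the iconv implementation and the active locale.

```php
<?php
// Hypothetical helper: the steps slugifyagain() runs after iconv(),
// applied to input that is assumed to be plain ASCII already.
function slug_ascii_steps(string $s): string
{
    $s = str_replace("'", '', $s);              // drop apostrophes
    $s = preg_replace('~[^\pL\d]+~u', '-', $s); // non-letter/digit runs -> "-"
    $s = preg_replace('~[^-\w]+~', '', $s);     // remove leftover unwanted chars
    $s = preg_replace('~-+~', '-', $s);         // collapse repeated "-"
    $s = trim($s, '-');                         // trim leading/trailing "-"
    return strtolower($s);                      // input is ASCII, so strtolower suffices
}

echo slug_ascii_steps("Andress l'affreux garcon & noel en foret !"), PHP_EOL;
// andress-laffreux-garcon-noel-en-foret
```

The sixth result in the benchmark shows `n-el` where this gives `noel`, which suggests that on the author's system `iconv()` turned `ø` into a non-letter rather than `o`.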

$string = $newString = "¿ Àñdréß l'affreux ğarçon & nøël en forêt !";

$max = 10000;

echo '<pre>';
echo 'Beginning :';
echo '<br />';
echo '<br />';    
echo '> Slugging '.$max.' iterations of following :';
echo '<br />';
echo '>> ' . $string;
echo '<br />';  
echo '<br />';
echo 'Output results :';
echo '<br />';
echo '<br />';  

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){
    
    $newString = slugify($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> First test passed in **' . round($time, 2) . 'ms**';
echo '<br />';  
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){
    
    $newString = slug($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Second test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){
    
    $newString = slugbis($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Third test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){
    
    $newString = slugagain($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Fourth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){
    
    $newString = slugifybis($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Fifth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){
    
    $newString = slugifyagain($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Sixth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '</pre>';
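The timing code above repeats the same block six times. A more compact harness could loop over a list of callables; here is a sketch with a hypothetical `bench()` helper, using built-in functions as stand-ins for the six slug functions:

```php
<?php
// Hypothetical helper: times $iterations calls of $fn on $input and returns
// the elapsed milliseconds together with the last result.
function bench(callable $fn, string $input, int $iterations): array
{
    $result = '';
    $start = microtime(true); // current Unix time in seconds, as a float
    for ($i = 0; $i < $iterations; $i++) {
        $result = $fn($input);
    }
    $ms = (microtime(true) - $start) * 1000;
    return [$ms, $result];
}

// Stand-ins for the slug functions; any callable taking a string works.
$candidates = [
    'lowercase' => 'strtolower',
    'rawurl'    => 'rawurlencode',
];

$input = 'Hello World';
foreach ($candidates as $name => $fn) {
    [$ms, $result] = bench($fn, $input, 1000);
    printf("%s: %.2f ms -> %s\n", $name, $ms, $result);
}
```

In the original script `$candidates` would instead map names to `'slugify'`, `'slug'`, `'slugbis'` and so on, since PHP accepts function names as callables.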


Beginning:

Slugging 10000 iterations of the following:

¿ Àñdréß l'affreux ğarçon & nøël en forêt !

Output results:

First test passed in 120.78 ms

Result: -iquest-andresz-laffreux-arcon-and-noel-en-foret-

Second test passed in 3883.82 ms

Result: -andreß-laffreux-garcon--nøel-en-foret-

Third test passed in 56.83 ms

Result: andress-l-affreux-garcon-noel-en-foret

Fourth test passed in 18.93 ms

Result: ¿-AndreSs-l'affreux-ğarcon-&-noel-en-foret-!

Fifth test passed in 6.45 ms

Result: %C2%BF-%C3%80%C3%B1dr%C3%A9%C3%9F-l%27affreux-%C4%9Far%C3%A7on-%26-n%C3%B8%C3%ABl-en-for%C3%AAt-%21

Sixth test passed in 112.42 ms

Result: andress-laffreux-garcon-n-el-en-foret

Further testing is needed.
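One way to test the outputs further is to check each result against a strict slug pattern. The sketch below uses a hypothetical `is_clean_slug()` (lowercase ASCII letters and digits separated by single hyphens, nothing else) on four of the results from the 10000-iteration run:

```php
<?php
// Hypothetical check: a "clean" slug is lowercase ASCII letters/digits
// separated by single hyphens, with no leading or trailing hyphen.
function is_clean_slug(string $s): bool
{
    // The D modifier makes "$" match only at the very end (no trailing "\n").
    return preg_match('~^[a-z0-9]+(?:-[a-z0-9]+)*$~D', $s) === 1;
}

// Results from the 10000-iteration run.
$results = [
    'first'  => '-iquest-andresz-laffreux-arcon-and-noel-en-foret-',
    'second' => '-andreß-laffreux-garcon--nøel-en-foret-',
    'third'  => 'andress-l-affreux-garcon-noel-en-foret',
    'sixth'  => 'andress-laffreux-garcon-n-el-en-foret',
];

foreach ($results as $test => $slug) {
    printf("%-6s %s\n", $test, is_clean_slug($slug) ? 'clean' : 'not clean');
}
```

It flags the first and second results (leading hyphens, `ß`, `ø`), but a syntactic check like this cannot notice that `n-el` in the sixth result lost a letter.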

Edit: tests with fewer iterations

Beginning:

Slugging 100 iterations of the following:

¿ Àñdréß l'affreux ğarçon & nøël en forêt !

Output results:

First test passed in 1.72 ms

Result: -iquest-andresz-laffreux-arcon-and-noel-en-foret-

Second test passed in 48.59 ms

Result: -andreß-laffreux-garcon--nøel-en-foret-

Third test passed in 0.91 ms

Result: andress-l-affreux-garcon-noel-en-foret

Fourth test passed in 0.3 ms

Result: ¿-AndreSs-l'affreux-ğarcon-&-noel-en-foret-!

Fifth test passed in 0.14 ms

Result: %C2%BF-%C3%80%C3%B1dr%C3%A9%C3%9F-l%27affreux-%C4%9Far%C3%A7on-%26-n%C3%B8%C3%ABl-en-for%C3%AAt-%21

Sixth test passed in 1.4 ms

Result: andress-laffreux-garcon-n-el-en-foret
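Dividing each total by its iteration count makes the two runs comparable. A small sketch of that arithmetic on the reported figures:

```php
<?php
// Per-call cost in microseconds: total_ms / iterations * 1000.
function per_call_us(float $totalMs, int $iterations): float
{
    return $totalMs / $iterations * 1000;
}

// [total ms for 10000 iterations, total ms for 100 iterations], as reported.
$reported = [
    'first'  => [120.78, 1.72],
    'second' => [3883.82, 48.59],
    'third'  => [56.83, 0.91],
    'fourth' => [18.93, 0.3],
    'fifth'  => [6.45, 0.14],
    'sixth'  => [112.42, 1.4],
];

foreach ($reported as $test => [$big, $small]) {
    printf(
        "%-6s %8.2f us/call (10000 runs)  %8.2f us/call (100 runs)\n",
        $test,
        per_call_us($big, 10000),
        per_call_us($small, 100)
    );
}
```

Worked through, the first test costs about 12.08 µs per call over 10000 iterations and 17.20 µs over 100. The ordering of the six functions is the same in both runs (fifth, fourth, third, sixth, first, second from fastest to slowest), with the short run consistently a bit more expensive per call.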

Archived:	15 years, 6 months ago
Views:	210085
Last activity:	6 years, 2 months ago