sil*_*npi 5 php regex string sanitization url-rewriting
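We need to generate a unique URL from a book's title, and the title can contain any characters. How can we search and replace all "invalid" characters so that we end up with a valid, tidy lookup URL?
For example: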
"The Great Book of PHP"
www.mysite.com/book/12345/the-great-book-of-php
"The Greatest !@#$ Book of PHP"
www.mysite.com/book/12345/the-greatest-book-of-php
"Funny title "
www.mysite.com/book/12345/funny-title
Mez*_*Mez 15
Ah, slugification.
// This function expects the input to be UTF-8 encoded.
function slugify($text)
{
    // Replace each run of characters that are neither letters nor digits with a single -
    $text = preg_replace('/[^\\pL\d]+/u', '-', $text);
    // Trim leading and trailing -'s
    $text = trim($text, '-');
    // Convert the letters we have left to their closest ASCII representation
    // (the //TRANSLIT result depends on the iconv implementation and the current locale)
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    // Make the text lowercase
    $text = strtolower($text);
    // Strip out anything we haven't been able to convert
    $text = preg_replace('/[^-\w]+/', '', $text);
    return $text;
}
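This works nicely because it first uses the Unicode properties of each character to decide whether it is a letter (or \d for a digit), then converts everything that is not into -s, then transliterates to ASCII, does another replacement for whatever is left, and finally cleans up after itself. (Fabrik's test returns "arvizturo-tukorfurogep")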
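To make the order of those steps easier to follow, here is a small sketch that records the string after each stage. The helper name `slugify_stages` and the stage labels are made up for illustration; the steps themselves mirror the function above. The input is plain ASCII, so the iconv step leaves it unchanged.

```php
<?php
// Hypothetical helper: runs the same steps as slugify() above,
// but keeps a copy of the string after each stage.
function slugify_stages(string $text): array
{
    $stages = ['input' => $text];

    // 1. Collapse each run of non-letter, non-digit characters into one hyphen.
    $text = preg_replace('/[^\pL\d]+/u', '-', $text);
    $stages['hyphenated'] = $text;

    // 2. Drop hyphens left at either end.
    $text = trim($text, '-');
    $stages['trimmed'] = $text;

    // 3. Transliterate to ASCII. For non-ASCII input the result depends on
    //    the iconv implementation and locale; ASCII input passes through.
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    $stages['ascii'] = $text;

    // 4. Lowercase, then strip anything that is not a hyphen or word character.
    $text = preg_replace('/[^-\w]+/', '', strtolower($text));
    $stages['slug'] = $text;

    return $stages;
}

$stages = slugify_stages('Funny title ');
foreach ($stages as $name => $value) {
    echo $name, ': "', $value, '"', PHP_EOL;
}
// hyphenated is "Funny-title-", trimmed is "Funny-title", slug is "funny-title"
```

The trailing space in the title becomes a trailing hyphen in stage 1, which is why the trim has to happen after the replacement rather than before it.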
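I also tend to add a list of stop words so that they get removed from the slug automatically: "the", "of", "or", "a", and so on. (But don't filter on word length, or you'll strip out things like "php".)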
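The stop-word idea could be sketched roughly as follows. The function name `remove_stop_words` and the default word list are assumptions for illustration, not part of the answer's code. It works on a slug that has already been generated, and matches whole words rather than word length, so short words such as "php" survive.

```php
<?php
// Hypothetical helper: removes stop words from an already-generated slug.
// The default list is only an example; adjust it to your own titles.
function remove_stop_words(string $slug, array $stopWords = ['the', 'of', 'or', 'a']): string
{
    $words = explode('-', $slug);

    // Keep only words that are not in the stop list (exact, whole-word match).
    $kept = array_filter($words, static function (string $word) use ($stopWords): bool {
        return !in_array($word, $stopWords, true);
    });

    // If every word was a stop word, keep the original slug rather than return ''.
    return $kept ? implode('-', $kept) : $slug;
}

echo remove_stop_words('the-great-book-of-php'), PHP_EOL; // great-book-php
echo remove_stop_words('the-a'), PHP_EOL;                 // the-a (fallback)
```

Falling back to the unfiltered slug is a design choice made here so that a title made up entirely of stop words does not produce an empty URL segment.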
If "invalid" means non-alphanumeric, you can do this:
function foo($str) {
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($str)), '-');
}
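This converts $str to lowercase, replaces any sequence of one or more non-alphanumeric characters with a single hyphen, and then removes leading and trailing hyphens.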
var_dump(foo("The Great Book of PHP") === 'the-great-book-of-php');
var_dump(foo("The Greatest !@#$ Book of PHP") === 'the-greatest-book-of-php');
var_dump(foo("Funny title ") === 'funny-title');
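Note that a slug on its own is not unique, since two books can share a title. In the example URLs from the question, the numeric ID is what makes each URL unique. A minimal sketch of putting the two together, assuming the `/book/{id}/{slug}` layout from those examples; the function name `book_url` is hypothetical:

```php
<?php
// Hypothetical helper: builds a path in the /book/{id}/{slug} layout
// used by the example URLs in the question.
function book_url(int $id, string $title): string
{
    // Same ASCII-only approach as above: lowercase, hyphenate, trim.
    $slug = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($title)), '-');

    // A title with no ASCII letters or digits yields an empty slug;
    // fall back to the ID alone so the path stays valid.
    return $slug === '' ? "/book/$id" : "/book/$id/$slug";
}

echo book_url(12345, 'The Great Book of PHP'), PHP_EOL; // /book/12345/the-great-book-of-php
echo book_url(12345, '!@#$'), PHP_EOL;                  // /book/12345
```

With this layout the lookup can be done by ID alone, so the slug is free to change if the title is edited.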