Merging two regular expressions to truncate words in a string

Ali*_*xel 7 php regex string truncate multibyte

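I'm trying to come up with the following function, which truncates a string to whole words (if possible; otherwise it should truncate to characters):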

function Text_Truncate($string, $limit, $more = '...')
{
    $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8'));

    if (strlen(utf8_decode($string)) > $limit)
    {
        $string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)~su', '$1', $string);

        if (strlen(utf8_decode($string)) > $limit)
        {
            $string = preg_replace('~^(.{' . intval($limit) . '}).*~su', '$1', $string);
        }

        $string .= $more;
    }

    return trim(htmlentities($string, ENT_QUOTES, 'UTF-8', true));
}

Here are some tests:

// Iñtërnâtiônàlizætiøn and then the quick brown fox... (49 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');

// Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_...  (50 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');

They both work as is, but if I drop the second preg_replace() I get the following:

Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died....

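I can't use substr() because it only works at the byte level, and I don't have access to mb_substr() at the moment. I've made several attempts to merge the second regex into the first one, but without success.

Please help, I've been struggling with this for almost an hour.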


EDIT: Sorry, I've been awake for 40 hours and I shamelessly missed this:

$string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)?~su', '$1', $string);

Still, if someone has a more optimized regex (or one that ignores the trailing space), please share:

"Iñtërnâtiônàlizætiøn and then "
"Iñtërnâtiônàlizætiøn_and_then_"

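EDIT 2: I still can't get rid of the trailing whitespace, can someone help me out?

EDIT 3: Okay, none of my edits actually worked, I was fooled by RegexBuddy - I should probably leave this for another day and get some sleep now. Off for today.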

gna*_*arf 3

Maybe I can give you a happy morning after a long night of regex nightmares:

'~^(.{1,' . intval($limit) . '}(?<=\S)(?=\s)|.{'.intval($limit).'}).*~su'

Boiling it down:

^      # Start of String
(       # begin capture group 1
 .{1,x} # match 1 - x characters
 (?<=\S)# lookbehind, match must end with non-whitespace 
 (?=\s) # lookahead, if the next char is whitespace, match
 |      # otherwise test this:
 .{x}   # got to x chars anyway.
)       # end cap group
.*     # match the rest of the string (since you were using replace)

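You could always add |$ to the end of the (?=\s) lookahead, but since your code already checks that the string is longer than $limit, I didn't think that case was necessary.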
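For illustration, here is a rough sketch of how the combined pattern might slot into a single preg_replace() call. It is not the asker's final code: truncate_words is a hypothetical name, the html_entity_decode()/htmlentities() steps are left out, and characters are counted with preg_match_all('~.~su', ...) instead of strlen(utf8_decode(...)), which is my own substitution. It assumes $limit is at least 1 and that the input is valid UTF-8.

```php
<?php
// Hypothetical helper (name and signature are assumptions, not from the question).
// Truncates to the last whole word that fits in $limit characters, or to exactly
// $limit characters when no word boundary is available.
function truncate_words(string $string, int $limit, string $more = '...'): string
{
    $string = trim($string);

    // With the /u flag each match of "." is one UTF-8 character, so the number
    // of matches is the character count rather than the byte count.
    if (preg_match_all('~.~su', $string) > $limit) {
        // First alternative: up to $limit chars, ending on a non-whitespace char
        // that is immediately followed by whitespace (a word boundary).
        // Second alternative: fall back to exactly $limit chars.
        $pattern = '~^(.{1,' . $limit . '}(?<=\S)(?=\s)|.{' . $limit . '}).*~su';
        $string = preg_replace($pattern, '$1', $string) . $more;
    }

    return $string;
}
```

Called with the two test strings from the question and a limit of 50, this should cut the first one after "fox" (no trailing space) and the second one after the 50th character, since there is no whitespace in its first 50 characters.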