I wrote the following function to return a given number of words from a text:
function brief_text($text, $num_words = 50) {
    $words = str_word_count($text, 1);
    $required_words = array_slice($words, 0, $num_words);
    return implode(" ", $required_words);
}
It works well with English, but when I try it with Arabic it fails and does not return the expected words. For example:
$text_en = "Cairo is the capital of Egypt and Paris is the capital of France";
echo brief_text($text_en, 10);
This outputs:
Cairo is the capital of Egypt and Paris is the
whereas
$text_ar = "القاهرة هى عاصمة مصر وباريس هى عاصمة فرنسا";
echo brief_text($text_ar, 10);
outputs:
? ? ? ? ? ? ? ? ? ?
I know the problem is in the str_word_count function, but I don't know how to fix it.
Update
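I have already written another function that works with both English and Arabic, but I am still looking for a fix for the problem that str_word_count() has with Arabic. Anyway, here is my other function: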
function brief_text($string, $number_of_required_words = 50) {
    // Collapse every run of whitespace into a single space
    $string = trim(preg_replace('/\s+/', ' ', $string));
    $words = explode(" ", $string);
    $required_words = array_slice($words, 0, $number_of_required_words); // get a specific number of elements from the array
    return implode(" ", $required_words);
}
Try this function for word counting:
// You can call the function as you like
if (!function_exists('mb_str_word_count'))
{
    function mb_str_word_count($string, $format = 0, $charlist = '[]') {
        mb_internal_encoding('UTF-8');
        mb_regex_encoding('UTF-8');

        // Split on every character outside the Arabic block U+0600..U+06FF
        $words = mb_split('[^\x{0600}-\x{06FF}]', $string);
        switch ($format) {
            case 0:
                return count($words);
                break;
            case 1:
            case 2:
                return $words;
                break;
            default:
                return $words;
                break;
        }
    }
}

echo mb_str_word_count("القاهرة هى عاصمة مصر وباريس هى عاصمة فرنسا") . PHP_EOL;
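The mb_split version above keeps only characters from the Arabic block, so English words are dropped. If one function has to handle both languages, a possible alternative is to match runs of Unicode letters with preg_match_all and the /u modifier. This is only a sketch: the names unicode_words and brief_text_unicode are made up for illustration, and the rule that a word is "letters and combining marks, optionally joined by an apostrophe or hyphen" is an assumption that loosely mirrors str_word_count().

```php
<?php
// Hypothetical Unicode-aware stand-in for str_word_count($text, 1).
// Assumption: a word is a run of letters (\p{L}) and combining marks (\p{M}),
// optionally joined by an apostrophe or a hyphen.
function unicode_words(string $text): array
{
    // The /u modifier makes PCRE treat the pattern and the subject as UTF-8.
    $pattern = '/[\p{L}\p{M}]+(?:[\'-][\p{L}\p{M}]+)*/u';
    if (preg_match_all($pattern, $text, $matches) === false) {
        return []; // matching failed, e.g. the input was not valid UTF-8
    }
    return $matches[0];
}

// Hypothetical counterpart of brief_text() built on unicode_words().
function brief_text_unicode(string $text, int $num_words = 50): string
{
    return implode(' ', array_slice(unicode_words($text), 0, $num_words));
}

echo brief_text_unicode("Cairo is the capital of Egypt and Paris is the capital of France", 10), PHP_EOL;
echo brief_text_unicode("القاهرة هى عاصمة مصر وباريس هى عاصمة فرنسا", 3), PHP_EOL;
```

Unlike the explode-based function in the question, this drops punctuation and digits from the result, which may or may not be what you want.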
Use the <meta charset="UTF-8"/> tag in your HTML files, and send the Content-type: text/html; charset=utf-8 header.
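As a rough illustration of that encoding advice, the script below sends the charset in the HTTP header before any output and repeats it in the markup. The render_page helper is hypothetical and exists only to keep the example small.

```php
<?php
// Hypothetical helper: wraps a UTF-8 string in a minimal HTML page
// that declares its charset with a <meta> tag.
function render_page(string $body): string
{
    return '<!DOCTYPE html><html><head><meta charset="UTF-8"/></head><body><p>'
        . htmlspecialchars($body, ENT_QUOTES, 'UTF-8')
        . '</p></body></html>';
}

// Headers must be sent before any output is produced.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo render_page("القاهرة هى عاصمة مصر"), PHP_EOL;
```

The PHP source file itself also has to be saved as UTF-8, otherwise the Arabic literal will already be damaged before it reaches the browser.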