Multibyte-safe counting of distinct characters in a string

Pri*_*ome 2 php string utf-8

I'm looking for a smart and efficient way to count how many different alphabetic characters there are in a string. Example:

$str = "APPLE";
echo char_count($str); // should return 4, because APPLE has 4 different chars 'A', 'P', 'L' and 'E'

$str = "BOB AND BOB"; // should return 5 ('B', 'O', 'A', 'N', 'D'). 

$str = 'PLÁTANO'; // should return 7 ('P', 'L', 'Á', 'T', 'A', 'N', 'O')

It should support UTF-8 strings!

rod*_*ehm 11

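If you're dealing with UTF-8 (which you really should consider, imho), none of the posted solutions (using strlen, str_split or count_chars) will work, because they all treat one byte as one character, which is obviously not true for UTF-8.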
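To make the byte-versus-character problem concrete, here is a small sketch comparing a byte-oriented split with a UTF-8-aware one. It assumes the source file itself is saved as UTF-8, so that 'Á' is stored as the two bytes 0xC3 0x81.

```php
<?php

// 'Á' is a single character, but two bytes in UTF-8.
$str = 'PLÁTANO';

// str_split() cuts the string into bytes, so 'Á' ends up as two elements.
$bytes = str_split($str);
echo count($bytes), "\n"; // 8

// preg_split() with the /u modifier cuts between UTF-8 characters instead.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
echo count($chars), "\n"; // 7
```

Any approach built on the first kind of split will over-count as soon as the input contains a non-ASCII character.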

<?php

$treat_spaces_as_chars = true;
// contains hälöwrd and a space, being 8 distinct characters (7 without the space)
$string = "hällö wörld"; 
// remove spaces if we don't want to count them
if (!$treat_spaces_as_chars) {
  $string = preg_replace('/\s+/u', '', $string);
}
// split into characters (not bytes, as str_split() would)
$characters = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
// throw out the duplicates
$unique_characters = array_unique($characters);
// count what's left
$number_of_characters = count($unique_characters);
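The steps above can be wrapped into the char_count() function the question asks for. This is only a minimal sketch, assuming PHP 7.0 or later for the type declarations; the $count_spaces parameter and the exception thrown on invalid UTF-8 are my own assumptions, not something the question specifies.

```php
<?php

/**
 * Count the distinct characters in a UTF-8 string.
 * Hypothetical helper: $count_spaces and the error handling are assumptions.
 */
function char_count(string $str, bool $count_spaces = false): int
{
    if (!$count_spaces) {
        // Strip all whitespace; preg_replace() returns null on invalid UTF-8.
        $str = preg_replace('/\s+/u', '', $str);
        if ($str === null) {
            throw new InvalidArgumentException('Input is not valid UTF-8.');
        }
    }

    // Split into characters; preg_split() returns false on invalid UTF-8.
    $characters = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($characters === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // Drop duplicates and count what is left.
    return count(array_unique($characters));
}

echo char_count('APPLE'), "\n";       // 4
echo char_count('BOB AND BOB'), "\n"; // 5
echo char_count('PLÁTANO'), "\n";     // 7
```

The comparison is case-sensitive and works per code point, so 'a' and 'A' count as two characters, as do 'A' and 'Á'.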

If you want to discard all non-word characters:

<?php

$ignore_non_word_characters = true;
// contains hälöwrd and π (pi), which is treated as a word character (Greek letter)
$string = "h,ä*+l•π‘°’?lö wörld"; 
// remove non-word characters (including spaces) if we want to ignore them
if ($ignore_non_word_characters) {
  $string = preg_replace('/\W+/u', '', $string);
}
// split into characters (not bytes, as str_split() would)
$characters = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
// throw out the duplicates
$unique_characters = array_unique($characters);
// count what's left
$number_of_characters = count($unique_characters);

var_dump($characters, $unique_characters, $number_of_characters);
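Note that \W only removes non-word characters, so digits and the underscore are still counted. If only letters should count, as the question's "alphabetic characters" suggests, one possible variant is to match \p{L} (any Unicode letter) directly. This is a sketch; the function name letter_count is hypothetical.

```php
<?php

// Hypothetical helper; the name letter_count is an assumption.
function letter_count(string $str): int
{
    // \p{L} matches any Unicode letter; digits, "_", spaces and
    // punctuation never make it into $matches.
    if (preg_match_all('/\p{L}/u', $str, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // $matches[0] holds every matched letter; drop duplicates and count.
    return count(array_unique($matches[0]));
}

echo letter_count('BOB AND BOB'), "\n"; // 5
echo letter_count('wörld_42!'), "\n";   // 5 (w, ö, r, l, d)
```

With the \W approach, the second string would also count '_', '4' and '2'.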


Cli*_*ive 5

Just use count_chars:

echo count(array_filter(count_chars($str)));

The array returned by count_chars() will also tell you how many times each character occurs in the string.
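As a small illustration of those per-character counts, the sketch below uses mode 1 of count_chars(), which lists only the byte values that actually occur. Keep in mind that count_chars() works on bytes, so this is only reliable for single-byte (for example plain ASCII) input; the last line shows what happens with a multibyte character, assuming the file is saved as UTF-8.

```php
<?php

$str = 'APPLE';

// Mode 1: array keyed by byte value, containing only bytes that occur.
$counts = count_chars($str, 1);
foreach ($counts as $byte => $times) {
    echo chr($byte), ' => ', $times, "\n";
}
// A => 1
// E => 1
// L => 1
// P => 2

// Caveat: 'Á' is two bytes in UTF-8, so it shows up as two entries.
echo count(count_chars('PLÁTANO', 1)), "\n"; // 8, not 7
```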