I have an array:
$myArray=array(
'hello my name is richard',
'hello my name is paul',
'hello my name is simon',
'hello it doesn\'t matter what my name is'
);
I need to find the most frequently repeated substrings (at least 2 words long), ideally returned as an array, so my return array might look like this:
$return=array(
array('hello my', 3),
array('hello my name', 3),
array('hello my name is', 3),
array('my name', 4),
array('my name is', 4),
array('name is', 4),
);
From this array I can see how often each substring is repeated across all the strings in the array.
Is this the only way to do it?
function repeatedSubStrings($array){
foreach($array as $string){
$phrases=array(); //Split each string into maximum number of sub strings
foreach($phrases as $phrase){
//Then count the $phrases that are in the strings
}
}
}
I tried a solution like the one above, but it was too slow: it processed only about 1000 lines per second. Can anyone do it faster?
One solution might be:
function getHighestRecurrence($strs){
/*Storage for individual words*/
$words = Array();
/*Process multiple strings*/
if(is_array($strs))
foreach($strs as $str)
$words = array_merge($words, explode(" ", $str));
/*Prepare single string*/
else
$words = explode(" ",$strs);
/*Array for word counters*/
$index = Array();
/*Aggregate word counters*/
foreach($words as $word)
/*Increment count or create if it doesn't exist*/
(isset($index[$word]))? $index[$word]++ : $index[$word] = 1;
/*Sort array by highest value first, keeping the keys*/
arsort($index);
/*Return the word*/
return key($index);
}
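Note that getHighestRecurrence returns only the single most frequent word, so it does not cover the two-word minimum asked for in the question. Here is a minimal sketch that counts word sequences instead, assuming PHP 7 or later. The names `countRepeatedPhrases` and `$minWords` are hypothetical and not from the original post, and this has not been benchmarked against the 1000 lines per second figure.

```php
<?php
// Hypothetical helper (not from the original post): counts every phrase of
// at least $minWords consecutive words across all strings.
function countRepeatedPhrases(array $strings, int $minWords = 2): array
{
    $counts = array();
    foreach ($strings as $string) {
        // Split on runs of whitespace and drop empty pieces.
        $words = preg_split('/\s+/', trim($string), -1, PREG_SPLIT_NO_EMPTY);
        $total = count($words);
        for ($start = 0; $start < $total; $start++) {
            $phrase = '';
            for ($end = $start; $end < $total; $end++) {
                // Grow the phrase one word at a time instead of re-slicing.
                $phrase = ($end === $start) ? $words[$end] : $phrase . ' ' . $words[$end];
                if ($end - $start + 1 >= $minWords) {
                    $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
                }
            }
        }
    }
    // Keep only phrases seen more than once, most frequent first.
    $counts = array_filter($counts, function ($n) { return $n > 1; });
    arsort($counts);
    return $counts;
}
```

Called with the $myArray from the question, this returns six phrases: 'my name', 'my name is' and 'name is' with a count of 4, and 'hello my', 'hello my name' and 'hello my name is' with a count of 3. The result is keyed by phrase (phrase => count) rather than the list of pairs shown in the question. Each string is read once and every phrase is counted through a hash lookup, so the cost grows with the square of the number of words per string rather than with repeated searches over the whole array.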