D3V*_*PER 8 | tags: php, arrays, algorithm, numbers
Given the array (3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 65, 4, 7, 13, 32),
the frequent number sequences would be (3, 5) f=2 and (4, 7, 13) f=2.
Is there an algorithm or pseudocode to find them?
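A minimal brute-force sketch of one way to start (an illustration, not taken from the answers below): treat every contiguous run of at least two numbers as a candidate, count how often each run occurs, and keep the runs seen at least twice. The function name `repeatedNgrams` and the comma-joined string keys are assumptions made for this example. The number of candidate runs grows quadratically with the input length, so this only suits short arrays.

```php
<?php
// Brute-force sketch (hypothetical helper, not taken from the answers):
// count every contiguous run of at least $minLen numbers, then keep the
// runs that occur two or more times. Overlapping occurrences are counted.
function repeatedNgrams(array $input, int $minLen = 2): array
{
    $counts = [];
    $n = count($input);
    for ($start = 0; $start < $n; $start++) {
        for ($len = $minLen; $start + $len <= $n; $len++) {
            // The comma-joined run is the array key, e.g. "4,7,13".
            $key = implode(',', array_slice($input, $start, $len));
            $counts[$key] = ($counts[$key] ?? 0) + 1;
        }
    }
    // A run is "frequent" if it was seen at least twice.
    return array_filter($counts, fn(int $c): bool => $c >= 2);
}

$input = [3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 65, 4, 7, 13, 32];
foreach (repeatedNgrams($input) as $run => $f) {
    echo "($run) f=$f\n";
}
```

On the array above this reports (3,5), (4,7), (4,7,13) and (7,13), each with f=2. The shorter runs that sit inside (4,7,13) still have to be filtered out to get the expected answer.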
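Update (1):
If (7, 13) also occurred somewhere else, it would be folded into the longest sequence that contains it by updating that sequence's frequency, giving (4, 7, 13) f=3, and so on...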
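Update (2):
For (1,2,3,4,1,2,3,4,1,2,7,8,7,8,3,4,3,4,1,2) the output should be (1,2,3,4) & (3,4,1,2) & (7,8). Think of each number as a word: you want to find the most frequent phrases.
It is normal for the same words to show up in many phrases, but a phrase that is a substring of another phrase should not be counted as a phrase of its own; instead, it should update the frequency of every phrase that contains it.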
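One way to approximate the rule in Update (2) is to keep only maximal repeated phrases: repeated runs that do not appear contiguously inside a longer repeated run. The sketch below does that under stated assumptions: `maximalPhrases` and `containsRun` are hypothetical names, overlapping occurrences are counted, and sub-phrase frequencies are simply dropped rather than redistributed to the phrases that contain them.

```php
<?php
// Sketch (hypothetical helpers): keep repeated runs that are not a
// contiguous part of some longer repeated run. Phrases are keyed "a,b,c".
function maximalPhrases(array $input, int $minLen = 2): array
{
    // Count every contiguous run of length >= $minLen (overlaps included).
    $counts = [];
    $n = count($input);
    for ($start = 0; $start < $n; $start++) {
        for ($len = $minLen; $start + $len <= $n; $len++) {
            $key = implode(',', array_slice($input, $start, $len));
            $counts[$key] = ($counts[$key] ?? 0) + 1;
        }
    }
    $repeated = array_filter($counts, fn(int $c): bool => $c >= 2);

    $result = [];
    foreach ($repeated as $phrase => $freq) {
        $inner = explode(',', (string) $phrase);
        $isPart = false;
        // Drop the phrase if any longer repeated phrase contains it.
        foreach (array_keys($repeated) as $other) {
            $outer = explode(',', (string) $other);
            if (count($outer) > count($inner) && containsRun($outer, $inner)) {
                $isPart = true;
                break;
            }
        }
        if (!$isPart) {
            $result[$phrase] = $freq;
        }
    }
    return $result;
}

// True if $needle appears as a contiguous, in-order run inside $haystack.
function containsRun(array $haystack, array $needle): bool
{
    $m = count($needle);
    for ($k = 0; $k + $m <= count($haystack); $k++) {
        if (array_slice($haystack, $k, $m) == $needle) {
            return true;
        }
    }
    return false;
}

foreach (maximalPhrases([3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 65, 4, 7, 13, 32]) as $phrase => $f) {
    echo "($phrase) f=$f\n";
}
```

For the first array in the question this returns (3,5) f=2 and (4,7,13) f=2. For the Update (2) input it returns (1,2,3,4,1,2) f=2 and (7,8) f=2 rather than the three phrases listed, because (1,2,3,4,1,2) itself occurs twice (at positions 0 and 4, overlapping), so any rule that splits it further needs extra assumptions.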
**Edit**: A slightly better implementation: it now also returns the frequencies and has a better sequence filter.
```php
function getFrequences($input, $minimalSequenceSize = 2) {
    $sequences = array();
    $frequences = array();
    $len = count($input);
    for ($i = 0; $i < $len; $i++) {
        $offset = $i;
        for ($j = $i + $minimalSequenceSize; $j < $len; $j++) {
            if ($input[$offset] == $input[$j]) {
                $sequenceSize = 1;
                $sequence = array($input[$offset]);
                // extend the match while the first run stays before $j
                // and the second run stays inside the array
                while (($offset + $sequenceSize < $j)
                    && ($j + $sequenceSize < $len)
                    && ($input[$offset + $sequenceSize] == $input[$j + $sequenceSize])) {
                    if (false !== ($seqIndex = array_search($sequence, $sequences))) {
                        // we already have this sequence; since we found a bigger one, remove the old one
                        array_splice($sequences, $seqIndex, 1);
                        array_splice($frequences, $seqIndex, 1);
                    }
                    $sequence[] = $input[$offset + $sequenceSize];
                    $sequenceSize++;
                }
                if ($sequenceSize >= $minimalSequenceSize) {
                    if (false !== ($seqIndex = array_search($sequence, $sequences))) {
                        $frequences[$seqIndex]++;
                    } else {
                        $sequences[] = $sequence;
                        $frequences[] = 2; // we have two occurrences already
                    }
                    // $i += $sequenceSize; // move $i so we don't reuse the same sub-sequence
                    break;
                }
            }
        }
    }
    // remove sequences that overlap another sequence, keeping the longer one
    // (array_intersect only checks for shared values, not order or contiguity)
    // ** comment this out to keep all sequences regardless **
    $len = count($sequences);
    for ($i = 0; $i < $len; $i++) {
        $freq_i = $sequences[$i];
        for ($j = $i + 1; $j < $len; $j++) {
            $freq_j = $sequences[$j];
            $freq_inter = array_intersect($freq_i, $freq_j);
            if (count($freq_inter) != 0) {
                $len--;
                if (count($freq_i) > count($freq_j)) {
                    array_splice($sequences, $j, 1);
                    array_splice($frequences, $j, 1);
                    $j--;
                } else {
                    array_splice($sequences, $i, 1);
                    array_splice($frequences, $i, 1);
                    $i--;
                    break;
                }
            }
        }
    }
    return array($sequences, $frequences);
}
```
Test case:
```php
header('Content-type: text/plain');

$input = array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13);
list($sequences, $frequences) = getFrequences($input);
foreach ($sequences as $i => $s) {
    echo "(" . implode(',', $s) . ') f=' . $frequences[$i] . "\n";
}
```
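The last loop in getFrequences uses array_intersect, which only reports values that two sequences share, regardless of order or position. If a strict "is a contiguous sub-sequence of" test is what is wanted, a helper along these lines could replace that check. `isContiguousSubsequence` is a hypothetical name, not a PHP built-in.

```php
<?php
// Hypothetical helper: true only when $short occurs as a contiguous,
// in-order run inside $long (e.g. [7, 13] inside [4, 7, 13]).
function isContiguousSubsequence(array $short, array $long): bool
{
    $short = array_values($short);
    $m = count($short);
    $n = count($long);
    for ($k = 0; $k + $m <= $n; $k++) {
        // array_slice re-indexes from 0, so == compares element by element
        if (array_slice($long, $k, $m) == $short) {
            return true;
        }
    }
    return false;
}

// array_intersect only looks for shared values, in any position:
var_dump(count(array_intersect([3, 5], [5, 48, 4])) > 0); // bool(true)
var_dump(isContiguousSubsequence([3, 5], [5, 48, 4]));     // bool(false)
var_dump(isContiguousSubsequence([7, 13], [4, 7, 13]));    // bool(true)
```

With this test, (3,5) and (5,48,...) would no longer be treated as overlapping just because both contain 5.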
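**Edit**: Here is an update to the function. It is almost a complete rewrite... tell me whether this is what you are looking for. I also added a redundancy check so that the same sequence or sub-sequence is not counted twice.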
```php
function getFrequences2($input, $minSequenceSize = 2) {
    $sequences = array();
    $last_offset = 0;
    $last_offset_len = 0;
    $len = count($input);
    for ($i = 0; $i < $len; $i++) {
        for ($j = $i + $minSequenceSize; $j < $len; $j++) {
            if ($input[$i] == $input[$j]) {
                $offset = 1;
                $sub = array($input[$i]);
                while ($i + $offset < $j && $j + $offset < $len) {
                    if ($input[$i + $offset] == $input[$j + $offset]) {
                        array_push($sub, $input[$i + $offset]);
                    } else {
                        break;
                    }
                    $offset++;
                }
                $sub_len = count($sub);
                if ($sub_len >= $minSequenceSize) {
                    // $sub must contain more elements than the last sequence found,
                    // otherwise we will count the same sequence twice
                    if ($last_offset + $last_offset_len >= $i + $sub_len) {
                        // we already saw this sequence... ignore
                        continue;
                    } else {
                        // save offset and sub_len for future check
                        $last_offset = $i;
                        $last_offset_len = $sub_len;
                    }
                    foreach ($sequences as &$sequence) {
                        $sequence_len = count($sequence['values']);
                        if ($sequence_len == $sub_len && $sequence['values'] == $sub) {
                            //echo "Found add-full ".var_export($sub, true)." at $i and $j...\n";
                            $sequence['frequence']++;
                            break 2; // leave the foreach and the $j loop
                        } else {
                            // slide the shorter sequence over the longer one
                            if ($sequence_len > $sub_len) {
                                $end = $sequence_len - $sub_len;
                                $values = $sequence['values'];
                                $slice_len = $sub_len;
                                $test = $sub;
                            } else {
                                $end = $sub_len - $sequence_len;
                                $values = $sub;
                                $slice_len = $sequence_len;
                                $test = $sequence['values'];
                            }
                            for ($k = 0; $k <= $end; $k++) {
                                if (array_slice($values, $k, $slice_len) == $test) {
                                    //echo "Found add-part ".implode(',',$sub)." which is part of ".implode(',',$values)." at $i and $j...\n";
                                    $sequence['values'] = $values;
                                    $sequence['frequence']++;
                                    break 3; // leave the $k loop, the foreach and the $j loop
                                }
                            }
                        }
                    }
                    //echo "Found new ".implode(',',$sub)." at $i and $j...\n";
                    array_push($sequences, array('values' => $sub, 'frequence' => 2));
                    break;
                }
            }
        }
    }
    unset($sequence); // drop the reference left behind by foreach (&$sequence)
    return $sequences;
}
```