far*_*oft 21 php string search text
How do I search text using PHP?
Something like:
<?php
$text = "Hello World!";
if ($text contains "World") {
echo "True";
}
?>
except with if ($text contains "World") { replaced by a condition that actually works.
Bol*_*ock 42
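In your case you can use strpos(), or stripos() for a case-insensitive search. Compare the result with !== false: both functions return the position of the first match, and a match at the very start of the string (position 0) would otherwise be treated as "not found":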
if (stripos($text, "world") !== false) {
echo "True";
}
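PHP 8.0 added str_contains(), which returns a plain bool and so removes the need for the !== false comparison. The sketch below assumes PHP 8.0 or later and wraps str_contains() and stripos() in one small helper; text_contains() is a hypothetical name chosen for illustration, not a built-in function.

```php
<?php
// Assumes PHP 8.0+ (str_contains() does not exist in earlier versions).
// text_contains() is a hypothetical helper, not part of PHP itself.
function text_contains(string $haystack, string $needle, bool $ignoreCase = false): bool
{
    if ($ignoreCase) {
        // stripos() returns the 0-based offset of the first match or false,
        // so the strict comparison is required: offset 0 is a real match.
        return stripos($haystack, $needle) !== false;
    }
    // str_contains() is case-sensitive and already returns a bool.
    return str_contains($haystack, $needle);
}

$text = "Hello World!";
var_dump(text_contains($text, "World"));       // bool(true)
var_dump(text_contains($text, "world"));       // bool(false)
var_dump(text_contains($text, "world", true)); // bool(true)
var_dump(strpos($text, "Hello"));              // int(0), falsy under a loose if ()
```

The last line is the reason the strict comparison matters: "Hello" is found, but at offset 0.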
What you need is strstr() (or stristr(), as LucaB pointed out). Use it like this:
if (strstr($text, "World") !== false) { /* do stuff */ }
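strstr() returns the part of the haystack from the first match to the end, or false when there is no match; it is case-sensitive, while stristr() is not. Because the return value is a string, a loose if () can misfire when that string happens to be "0". The short script below shows both behaviours; the sample strings are made up for illustration. When only presence matters, strpos() (or str_contains() on PHP 8) avoids building the substring at all.

```php
<?php
$text = "Hello World!";

var_dump(strstr($text, "World"));  // string(6) "World!"
var_dump(stristr($text, "world")); // string(6) "World!" (case-insensitive)
var_dump(strstr($text, "world"));  // bool(false): strstr() is case-sensitive

// Pitfall: the returned substring can itself be falsy.
$code = "A10";
var_dump((bool) strstr($code, "0"));    // bool(false), although "0" occurs
var_dump(strstr($code, "0") !== false); // bool(true)
```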
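If you are looking for an algorithm that ranks search results by relevance across multiple words, here is a quick and easy way to produce ranked results using only PHP.
Implementation of the vector space model in PHP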
// from http://phpir.com/simple-search-the-vector-space-model/
function get_corpus_index($corpus = array(), $separator = ' ') {
    $dictionary = array();
    $doc_count = array();
    foreach ($corpus as $doc_id => $doc) {
        $terms = explode($separator, $doc);
        // Document length in terms, used later for length normalisation
        $doc_count[$doc_id] = count($terms);
        // tf–idf, short for term frequency–inverse document frequency,
        // according to Wikipedia is a numerical statistic that is intended to
        // reflect how important a word is to a document in a corpus.
        // Collect its inputs: document_frequency = documents containing the term,
        // term_frequency = occurrences of the term within one document.
        foreach ($terms as $term) {
            if (!isset($dictionary[$term])) {
                $dictionary[$term] = array('document_frequency' => 0, 'postings' => array());
            }
            if (!isset($dictionary[$term]['postings'][$doc_id])) {
                $dictionary[$term]['document_frequency']++;
                $dictionary[$term]['postings'][$doc_id] = array('term_frequency' => 0);
            }
            $dictionary[$term]['postings'][$doc_id]['term_frequency']++;
        }
    }
    return array('doc_count' => $doc_count, 'dictionary' => $dictionary);
}

function get_similar_documents($query = '', $corpus = array(), $separator = ' ') {
    $similar_documents = array();
    if ($query != '' && !empty($corpus)) {
        $words = explode($separator, $query);
        // Index the corpus with the same separator used for the query
        $corpus = get_corpus_index($corpus, $separator);
        $doc_count = count($corpus['doc_count']); // N: number of documents
        foreach ($words as $word) {
            // Skip query words that occur in no document (avoids an undefined-index warning)
            if (!isset($corpus['dictionary'][$word])) {
                continue;
            }
            $entry = $corpus['dictionary'][$word];
            foreach ($entry['postings'] as $doc_id => $posting) {
                // term frequency × inverse document frequency weight.
                // By operator precedence this is log2(N + 1/df + 1),
                // not the textbook idf log(N/df).
                $score = $posting['term_frequency'] * log($doc_count + (1 / $entry['document_frequency']) + 1, 2);
                if (isset($similar_documents[$doc_id])) {
                    $similar_documents[$doc_id] += $score;
                } else {
                    $similar_documents[$doc_id] = $score;
                }
            }
        }
        // length normalise: divide by the number of terms in each document
        foreach ($similar_documents as $doc_id => $score) {
            $similar_documents[$doc_id] = $score / $corpus['doc_count'][$doc_id];
        }
        // sort from high to low, keeping document ids as keys
        arsort($similar_documents);
    }
    return $similar_documents;
}
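Written out, the score that get_similar_documents() assigns to a document d for a query q is the following, where N is the number of documents, df(t) the number of documents containing term t, tf(t, d) the number of times t occurs in d, and |d| the number of separator-delimited terms in d. The sum runs over the words of the query (a repeated query word is counted again), and documents that share no word with the query are left out of the result rather than scored 0. Because of PHP operator precedence, the logarithm's argument is N + 1/df(t) + 1, not the textbook idf ratio N/df(t).

```latex
\mathrm{score}(d, q) = \frac{1}{|d|} \sum_{t \in q} \mathrm{tf}(t, d)\,\log_2\!\left(N + \frac{1}{\mathrm{df}(t)} + 1\right)
```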
In your case:
$query = 'world';
$corpus = array(
1 => 'hello world',
);
$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
print_r($match_results);
echo '</pre>';
Result:
Array
(
[1] => 0.79248125036058
)
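As a check against the scoring line in get_similar_documents(): here N = 1, "world" occurs once in the only document (tf = df = 1), and "hello world" has |d| = 2 terms, so the printed value works out as

```latex
\mathrm{score} = \frac{1}{2}\cdot 1 \cdot \log_2\!\left(1 + \frac{1}{1} + 1\right) = \frac{\log_2 3}{2} \approx 0.79248
```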
Matching multiple words against multiple phrases:
$query = 'hello world';
$corpus = array(
1 => 'hello world how are you today?',
2 => 'how do you do world',
3 => 'hello, here you are! how are you? Are we done yet?'
);
$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
print_r($match_results);
echo '</pre>';
Result:
Array
(
[1] => 0.74864218272161
[2] => 0.43398500028846
)
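Document 3 is missing from this result because explode(' ', ...) keeps punctuation attached: that document contains the term "hello," rather than "hello", and it has no "world" at all. Matching is also case-sensitive ("Are" and "are" are different terms). One way to address this, sketched below under the assumption that ASCII text is enough, is a hypothetical tokenize() helper that lowercases the text and splits on anything that is not a letter or digit. It is not part of the original answer.

```php
<?php
// Hypothetical helper (not part of the original answer): lowercase the text
// and split on every run of characters that are not ASCII letters or digits.
function tokenize(string $text): array
{
    $terms = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $terms === false ? array() : $terms;
}

$doc = 'hello, here you are! how are you? Are we done yet?';
print_r(explode(' ', $doc)); // keeps "hello,", "are!", "you?", "Are", ...
print_r(tokenize($doc));     // hello, here, you, are, how, are, you, are, we, done, yet
```

Using it would mean replacing the explode() calls in get_corpus_index() and get_similar_documents() with tokenize(); document 3 would then match "hello", and the scores above would change accordingly.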
Views: 91747