Blu*_*irl 15 java lucene search-engine
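I want to use Lucene to calculate Precision and Recall.
I did these steps:
Made some index files. For this I used the indexer code and indexed the .txt files in the path C:/inn (this folder contains 4 text files), and wrote the index to the "outt" folder by setting the index path to C:/outt in the indexer code.
Created a package named lia.benchmark with a class inside it called "PrecisionRecall", and added the external jars (right click -> Java Build Path -> Add External JARs) Lucene-benchmark-.3.2.0jar and Lucene-core-3.3.0jar.
In the code, set the topicsFile path to C:/lia2e/src/lia/benchmark/topics.txt, the qrelsFile path to C:/lia2e/src/lia/benchmark/qrels.txt, and dir to "C:/outt".
Here is the code: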
package lia.benchmark;

import java.io.File;
import java.io.PrintWriter;
import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.lucene.search.*;
import org.apache.lucene.store.*;
import org.apache.lucene.benchmark.quality.*;
import org.apache.lucene.benchmark.quality.utils.*;
import org.apache.lucene.benchmark.quality.trec.*;

public class PrecisionRecall {
    public static void main(String[] args) throws Throwable {
        File topicsFile = new File("C:/lia2e/src/lia/benchmark/topics.txt");
        File qrelsFile = new File("C:/lia2e/src/lia/benchmark/qrels.txt");
        Directory dir = FSDirectory.open(new File("C:/outt"));
        IndexSearcher searcher = new IndexSearcher(dir, true);
        String docNameField = "filename";
        PrintWriter logger = new PrintWriter(System.out, true);
        TrecTopicsReader qReader = new TrecTopicsReader();
        QualityQuery qqs[] = qReader.readQueries(
                new BufferedReader(new FileReader(topicsFile)));
        Judge judge = new TrecJudge(new BufferedReader(
                new FileReader(qrelsFile)));
        judge.validateData(qqs, logger);
        QualityQueryParser qqParser = new SimpleQQParser("title", "contents");
        QualityBenchmark qrun = new QualityBenchmark(qqs, qqParser, searcher, docNameField);
        SubmissionReport submitLog = null;
        QualityStats stats[] = qrun.execute(judge,
                submitLog, logger);
        QualityStats avg = QualityStats.average(stats);
        avg.log("SUMMARY", 2, logger, " ");
        dir.close();
    }
}
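Initialized the qrels and topics files. In the documents folder (C:\inn) I have 4 txt files, 2 of which are relevant to my query (the query is apple), so I filled in qrels and topics accordingly.
The qrels file looks like this: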
<top>
<num> Number: 0
<title> apple
<desc> Description:
<narr> Narrative:
</top>
and the topics file looks like this:
0 0 789.txt 1
0 0 101.txt 1
I also tried the full path format, e.g. "C:\inn\789.txt" instead of "789.txt", but the results are still zero:
0 - contents:apple
0 Stats:
Search Seconds: 0.016
DocName Seconds: 0.000
Num Points: 2.000
Num Good Points: 0.000
Max Good Points: 2.000
Average Precision: 0.000
MRR: 0.000
Recall: 0.000
Precision At 1: 0.000
SUMMARY
Search Seconds: 0.016
DocName Seconds: 0.000
Num Points: 2.000
Num Good Points: 0.000
Max Good Points: 2.000
Average Precision: 0.000
MRR: 0.000
Recall: 0.000
Precision At 1: 0.000
Could you tell me what I am doing wrong?
I really need to know why the results are zero.
I'm afraid the qrels.txt format is wrong. The javadoc suggests the following:
Expected input format:
qnum 0 doc-name is-relevant
Two sample lines:
19 0 doc303 1
19 0 doc7295 0
(I know this is the 2.3.0 javadoc, but the format did not change in 3.0.)
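As a quick way to see which lines fit that format, here is a minimal sketch. QrelsLineCheck and looksLikeQrels are hypothetical names, not part of Lucene, and the check takes the javadoc line literally (four whitespace-separated fields, a literal 0 in second position, 0 or 1 last), which may be stricter than what TrecJudge itself enforces.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Lucene): tests lines against "qnum 0 doc-name is-relevant". */
final class QrelsLineCheck {

    /**
     * Returns true if the line has exactly four whitespace-separated fields,
     * with "0" as the second field and "0" or "1" as the last one.
     * This is an assumed, literal reading of the javadoc format.
     */
    static boolean looksLikeQrels(String line) {
        String[] fields = line.trim().split("\\s+");
        return fields.length == 4
                && fields[1].equals("0")
                && (fields[3].equals("0") || fields[3].equals("1"));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "19 0 doc303 1",     // sample line from the javadoc
                "0 0 789.txt 1",     // line from the question
                "<num> Number: 0");  // topic-style line from the question
        for (String line : lines) {
            System.out.println(looksLikeQrels(line) + "  " + line);
        }
    }
}
```

Lines such as "0 0 789.txt 1" pass this check, while the tagged lines between the top tags do not.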
So it seems you have swapped the files: TrecTopicsReader expects what you have in qrels.txt, and TrecJudge expects what you have in topics.txt.
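If you want to confirm the swap before rerunning the benchmark, looking at the first non-blank line of each file is probably enough. A rough sketch, assuming a topics file starts with a top tag and a qrels file starts with a four-field line; TrecFileSniffer, Kind and sniff are made-up names for illustration, not Lucene API, and the file contents are inlined from the question instead of being read from disk.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sanity check (not part of Lucene): guesses which TREC-style file a text is. */
final class TrecFileSniffer {

    enum Kind { TOPICS, QRELS, UNKNOWN }

    /** Classifies the content by its first non-blank line only. */
    static Kind sniff(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip leading blank lines
            }
            if (line.startsWith("<top>")) {
                return Kind.TOPICS; // topic records open with a top tag
            }
            if (line.split("\\s+").length == 4) {
                return Kind.QRELS; // "qnum 0 doc-name is-relevant" has four fields
            }
            return Kind.UNKNOWN;
        }
        return Kind.UNKNOWN; // empty input
    }

    public static void main(String[] args) {
        // Contents as posted in the question, under the file names used there.
        List<String> qrelsTxt = Arrays.asList(
                "<top>", "<num> Number: 0", "<title> apple",
                "<desc> Description:", "<narr> Narrative:", "</top>");
        List<String> topicsTxt = Arrays.asList("0 0 789.txt 1", "0 0 101.txt 1");

        System.out.println("qrels.txt looks like: " + sniff(qrelsTxt));
        System.out.println("topics.txt looks like: " + sniff(topicsTxt));
    }
}
```

With the contents from the question, qrels.txt is classified as TOPICS and topics.txt as QRELS, which matches the swap described above. Exchanging the contents of the two files should let TrecTopicsReader and TrecJudge each read the format they expect.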