Calculating precision and recall with Lucene in Java

Blu*_*irl 15 java lucene search-engine

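I want to use Lucene to calculate precision and recall.

I did the following:

  1. Built an index. I used the indexer code to index the .txt files in C:/inn (this folder contains 4 text files) and wrote the index to the "outt" folder by setting the index path to C:/outt in the indexer code.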

  2. Created a package named lia.benchmark with a class named "PrecisionRecall" inside it, and added the external jars (right click -> Java Build Path -> Add External JARs): Lucene-benchmark-.3.2.0jar and Lucene-core-3.3.0jar.

  3. In the code, set the topicsFile path to C:/lia2e/src/lia/benchmark/topics.txt,
    the qrelsFile path to C:/lia2e/src/lia/benchmark/qrels.txt, and dir to "C:/outt".

    Here is the code:

    package lia.benchmark;

    import java.io.File;
    import java.io.PrintWriter;
    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.*;
    import org.apache.lucene.benchmark.quality.*;
    import org.apache.lucene.benchmark.quality.utils.*;
    import org.apache.lucene.benchmark.quality.trec.*;

    public class PrecisionRecall {

        public static void main(String[] args) throws Throwable {

            File topicsFile = new File("C:/lia2e/src/lia/benchmark/topics.txt");
            File qrelsFile = new File("C:/lia2e/src/lia/benchmark/qrels.txt");
            Directory dir = FSDirectory.open(new File("C:/outt"));
            IndexSearcher searcher = new IndexSearcher(dir, true);

            // Index field holding each document's name; it is compared with the doc names in qrels
            String docNameField = "filename";

            PrintWriter logger = new PrintWriter(System.out, true);

            // Read the queries from the topics file
            TrecTopicsReader qReader = new TrecTopicsReader();
            QualityQuery qqs[] = qReader.readQueries(
                    new BufferedReader(new FileReader(topicsFile)));

            // Read the relevance judgments from the qrels file
            Judge judge = new TrecJudge(new BufferedReader(
                    new FileReader(qrelsFile)));

            // Check that the judgments and the queries refer to each other
            judge.validateData(qqs, logger);

            // Build each Lucene query from the topic's "title", searching the "contents" field
            QualityQueryParser qqParser = new SimpleQQParser("title", "contents");

            QualityBenchmark qrun = new QualityBenchmark(qqs, qqParser, searcher, docNameField);
            SubmissionReport submitLog = null; // no submission report is written
            QualityStats stats[] = qrun.execute(judge,
                    submitLog, logger);

            // Average the per-query statistics and print the summary
            QualityStats avg = QualityStats.average(stats);
            avg.log("SUMMARY", 2, logger, "  ");
            dir.close();
        }
    }
    
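  4. Prepared the qrels and topics files. The documents folder (C:\inn) contains 4 txt files, 2 of which are relevant to my query (the query is "apple"), so I filled in qrels and topics accordingly.

    The qrels file looks like this: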

    <top>  
        <num> Number: 0 
        <title> apple
        <desc> Description:  
        <narr> Narrative:  
    </top>  
    

    and the topics file looks like this:

    0    0      789.txt           1
    0    0      101.txt           1
    

    I also tried the full path format, e.g. "C:\inn\789.txt" instead of "789.txt", but the results are still zero:

    0 - contents:apple
    0 Stats:
    Search Seconds: 0.016
    DocName Seconds: 0.000
    Num Points: 2.000
    Num Good Points: 0.000
    Max Good Points: 2.000
    Average Precision: 0.000
    MRR: 0.000
    Recall: 0.000
    Precision At 1: 0.000
    SUMMARY
    Search Seconds: 0.016
    DocName Seconds: 0.000
    Num Points: 2.000
    Num Good Points: 0.000
    Max Good Points: 2.000
    Average Precision: 0.000
    MRR: 0.000
    Recall: 0.000
    Precision At 1: 0.000
    

Can you tell me what I am doing wrong?

I really need to know why the results are zero.

alf*_*alf 3

I'm afraid the format of your qrels.txt is wrong. The javadoc suggests the following:

Expected input format:

 qnum  0   doc-name     is-relevant

Two sample lines:

 19    0   doc303       1
 19    0   doc7295      0

(I know this is the 2.3.0 javadoc, but the format did not change in 3.0)
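To make the four-column layout concrete, here is a minimal sketch of a line checker. `QrelsLineCheck` and `isValid` are hypothetical names and not part of Lucene, and the check follows the quoted layout literally (a `0` in the second column, a `0` or `1` relevance flag), which may be stricter than what `TrecJudge` itself enforces.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: checks one line against the
// "qnum 0 doc-name is-relevant" layout quoted from the javadoc.
class QrelsLineCheck {

    /** Returns true if the line has four whitespace-separated columns of the expected kinds. */
    static boolean isValid(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length != 4) {
            return false;
        }
        // Assumption: column 2 is the literal 0 and column 4 is a 0/1 relevance flag.
        return !cols[0].isEmpty()
                && cols[1].equals("0")
                && (cols[3].equals("0") || cols[3].equals("1"));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "19    0   doc303       1",          // sample line from the javadoc
                "0    0      789.txt           1",   // line from the question
                "<num> Number: 0");                  // topic-style line, not a judgment
        for (String line : lines) {
            System.out.println(isValid(line) + "  " + line);
        }
    }
}
```

The two four-column lines pass, while a topic-style line such as `<num> Number: 0` does not.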

So it looks like you have swapped the files: TrecTopicsReader expects what you have in qrels.txt, and TrecJudge expects what you have in topics.txt.
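If you want to catch this kind of mix-up before running the benchmark, something along these lines could work. It is only a rough sketch: `TrecFileSniffer` and `guessKind` are made-up names, not Lucene API, and the heuristic only looks at the first non-blank line (a `<top>` tag versus four whitespace-separated columns).

```java
// Hypothetical sanity check, not part of Lucene: guesses whether some file
// content looks like a TREC topics file or a qrels file.
class TrecFileSniffer {

    /** Classifies content by its first non-blank line: "topics", "qrels" or "unknown". */
    static String guessKind(String content) {
        for (String line : content.split("\\r?\\n")) {
            String t = line.trim();
            if (t.isEmpty()) {
                continue; // skip leading blank lines
            }
            if (t.startsWith("<top>")) {
                return "topics"; // topic blocks open with <top>
            }
            if (t.split("\\s+").length == 4) {
                return "qrels"; // four columns: qnum 0 doc-name is-relevant
            }
            return "unknown";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Contents copied from the question, under the file names the asker used.
        String qrelsTxt = "<top>\n<num> Number: 0\n<title> apple\n"
                + "<desc> Description:\n<narr> Narrative:\n</top>\n";
        String topicsTxt = "0    0      789.txt           1\n"
                + "0    0      101.txt           1\n";
        System.out.println("qrels.txt looks like: " + guessKind(qrelsTxt));
        System.out.println("topics.txt looks like: " + guessKind(topicsTxt));
    }
}
```

With the contents from the question, the file named qrels.txt is reported as topics and the file named topics.txt as qrels, so swapping the two contents (or the two paths in the code) should line things up.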