Lucene 2.9.2: How to show results in random order?

bas*_*ero 5 java lucene random algorithm shuffle

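By default, Lucene returns query results in order of relevance (score). You can pass one or more sort fields, and the results are then sorted by those fields.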

I am now looking for a good solution for getting the search results in random order.

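The bad approach:
Of course I could fetch all the results and then shuffle the collection, but with 5 million search results this does not perform well.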

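The elegant, paged approach:
With this approach you would tell Lucene the following:
a) Give me results 1 to 10 of the 5 million results, in random order.
b) Then give me results 11 to 20 (based on the same random sequence that was used in a).
c) Just to clarify: if you call a) twice, you get the same random elements.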

How can you implement this approach?
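To make requirements a) to c) concrete, here is a minimal sketch in plain Java, independent of Lucene. It assumes every hit has a stable integer document ID and that the caller keeps one seed per "random sequence" (for example per user session). The names `SeededPaging`, `key`, `order` and `page` are hypothetical and not part of any Lucene API. Each ID is mapped to a pseudo-random key derived from the seed, and a bounded heap keeps only the first `offset + size` IDs of that order, so the full result set is scanned but never shuffled or held in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SeededPaging {

    // Hypothetical helper: mixes the seed and a document ID into a
    // pseudo-random sort key (a SplitMix64-style bit mixer).
    public static long key(long seed, int docId) {
        long z = seed + 0x9E3779B97F4A7C15L * (docId + 1L);
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Orders IDs by their seeded key; ties are broken by ID so the order is total.
    public static Comparator<Integer> order(long seed) {
        return (a, b) -> {
            int c = Long.compare(key(seed, a), key(seed, b));
            return c != 0 ? c : Integer.compare(a, b);
        };
    }

    // Returns the IDs at positions [offset, offset + size) of the seeded order,
    // keeping at most offset + size IDs in memory at any time.
    public static List<Integer> page(Iterable<Integer> docIds, long seed, int offset, int size) {
        Comparator<Integer> cmp = order(seed);
        int limit = offset + size;
        // Max-heap on the seeded order: the head is the ID that currently sorts last.
        PriorityQueue<Integer> heap = new PriorityQueue<>(cmp.reversed());
        for (int id : docIds) {
            heap.add(id);
            if (heap.size() > limit) {
                heap.poll(); // drop the ID that sorts last
            }
        }
        List<Integer> kept = new ArrayList<>(heap);
        kept.sort(cmp);
        return kept.subList(Math.min(offset, kept.size()), kept.size());
    }

    public static void main(String[] args) {
        List<Integer> docIds = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docIds.add(i);
        }
        long seed = 42L; // same seed => same sequence on every call
        System.out.println("page 1: " + page(docIds, seed, 0, 10));
        System.out.println("page 2: " + page(docIds, seed, 10, 10));
    }
}
```

Calling `page` twice with the same seed and offset returns the same IDs, and consecutive pages never overlap because they are slices of one fixed order. The trade-off in this sketch is that deeper pages keep more IDs in the heap, and every call still visits all hits once.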


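Update, July 27, 2012: Be aware that the solution described here for Lucene 2.9.x does not work properly. Using RandomOrderScoreDocComparator leads to some results appearing twice in the result list.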

Ada*_*ter 7

You can write a custom FieldComparator:

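import java.io.IOException;
import java.util.Random;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldComparator;

public class RandomOrderFieldComparator extends FieldComparator<Integer> {

    private final Random random = new Random();

    // Every comparison returns a fresh random number; no document values are read.
    @Override
    public int compare(int slot1, int slot2) {
        return random.nextInt();
    }

    @Override
    public int compareBottom(int doc) throws IOException {
        return random.nextInt();
    }

    // Nothing is stored per slot.
    @Override
    public void copy(int slot, int doc) throws IOException {
    }

    @Override
    public void setBottom(int bottom) {
    }

    // Nothing is loaded per index segment.
    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
    }

    // The reported sort value is just another random number.
    @Override
    public Integer value(int slot) {
        return random.nextInt();
    }

}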

Shuffling the results this way does not consume any I/O. Here is my sample program, which demonstrates how to use it:

public static void main(String... args) throws Exception {
    RAMDirectory directory = new RAMDirectory();

    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_33);

    IndexWriter writer = new IndexWriter(
            directory,
            new IndexWriterConfig(Version.LUCENE_33, analyzer).setOpenMode(OpenMode.CREATE_OR_APPEND)
        );

    Document alice = new Document();
    alice.add( new Field("name", "Alice", Field.Store.YES, Field.Index.ANALYZED) );
    writer.addDocument( alice );

    Document bob = new Document();
    bob.add( new Field("name", "Bob", Field.Store.YES, Field.Index.ANALYZED) );
    writer.addDocument( bob );

    Document chris = new Document();
    chris.add( new Field("name", "Chris", Field.Store.YES, Field.Index.ANALYZED) );
    writer.addDocument( chris );

    writer.close();


    IndexSearcher searcher = new IndexSearcher( directory );



    for (int pass = 1; pass <= 10; pass++) {
        Query query = new MatchAllDocsQuery();

        Sort sort = new Sort(
                new SortField(
                        "",
                        new FieldComparatorSource() {

                            @Override
                            public FieldComparator<Integer> newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
                                return new RandomOrderFieldComparator();
                            }

                        }
                    )
            );

        TopFieldDocs topFieldDocs = searcher.search( query, 10, sort );

        System.out.print("Pass #" + pass + ":");
        // Iterate over the hits actually returned; totalHits can exceed the requested 10.
        for (int i = 0; i < topFieldDocs.scoreDocs.length; i++) {
            System.out.print( " " + topFieldDocs.scoreDocs[i].doc );
        }
        System.out.println();
    }
}

It produced this output:

Pass #1: 1 0 2
Pass #2: 1 0 2
Pass #3: 0 1 2
Pass #4: 0 1 2
Pass #5: 0 1 2
Pass #6: 1 0 2
Pass #7: 0 2 1
Pass #8: 1 2 0
Pass #9: 2 0 1
Pass #10: 0 2 1

Bonus! For those stuck on Lucene 2:

public class RandomOrderScoreDocComparator implements ScoreDocComparator {

    private final Random random = new Random();

    public int compare(ScoreDoc i, ScoreDoc j) {
        return random.nextInt();
    }

    public Comparable<?> sortValue(ScoreDoc i) {
        return Integer.valueOf( random.nextInt() );
    }

    public int sortType() {
        return SortField.CUSTOM;
    }

}

All you have to change is the Sort object:

Sort sort = new Sort(
    new SortField(
        "",
        new SortComparatorSource() {
            public ScoreDocComparator newComparator(IndexReader reader, String fieldName) throws IOException {
                return new RandomOrderScoreDocComparator();
            }
        }
    )
);
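One caveat about both comparators: every call to `compare` returns a fresh random number, so `compare(a, b)` and `compare(b, a)` can have the same sign. General-purpose sorting code usually assumes this does not happen. Whether that is the cause of the duplicate results reported for Lucene 2.9.x is an assumption and has not been verified here. The following plain-Java sketch (no Lucene involved; `ComparatorConsistency`, `unstable`, `stable` and `violations` are hypothetical names) counts such inconsistencies and contrasts them with a comparator that draws one random value per document and then remembers it.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class ComparatorConsistency {

    // Mirrors the comparators above: every call returns a fresh random number.
    public static Comparator<Integer> unstable(Random random) {
        return (a, b) -> random.nextInt();
    }

    // Hypothetical alternative: draw one random value per document the first
    // time it is seen, then always compare the remembered values.
    public static Comparator<Integer> stable(Random random) {
        Map<Integer, Integer> drawn = new HashMap<>();
        return (a, b) -> {
            int va = drawn.computeIfAbsent(a, id -> random.nextInt());
            int vb = drawn.computeIfAbsent(b, id -> random.nextInt());
            int c = Integer.compare(va, vb);
            return c != 0 ? c : Integer.compare(a, b); // tie-break on the ID
        };
    }

    // Counts pairs (a, b) for which compare(a, b) and compare(b, a)
    // do not have opposite signs.
    public static int violations(Comparator<Integer> cmp, int n) {
        int count = 0;
        for (int a = 0; a < n; a++) {
            for (int b = a + 1; b < n; b++) {
                int ab = Integer.signum(cmp.compare(a, b));
                int ba = Integer.signum(cmp.compare(b, a));
                if (ab != -ba) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("unstable violations: " + violations(unstable(new Random(1)), 50));
        System.out.println("stable violations:   " + violations(stable(new Random(1)), 50));
    }
}
```

The remembered-value variant needs one map entry per document it has compared, which may be too much memory for millions of hits. Deriving the value from a hash of the document ID and a seed would give the same consistency without the map.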