bas*_*ero 5 java lucene random algorithm shuffle
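By default, Lucene returns query results in order of relevance (score). You can pass one or more sort fields, and the results are then sorted by those fields instead.

I am now looking for a good way to get the search results in random order.

The bad approach:

Of course I could fetch all results and then shuffle the collection, but with 5 million search results that does not perform well.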
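To make the cost concrete, here is a minimal sketch of that naive approach in plain Java, with no Lucene involved. The class name `NaiveShufflePaging`, the `page` helper, and the use of plain integer doc IDs are assumptions made for illustration only. A fixed seed makes the order repeatable, but every call still copies and shuffles the complete hit list just to return ten entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: shuffle everything, then cut out one page.
class NaiveShufflePaging {

    /** Returns hits [from, from + size) of the seeded shuffle of allDocIds. */
    public static List<Integer> page(List<Integer> allDocIds, long seed, int from, int size) {
        // O(n) copy and O(n) shuffle on every call, regardless of the page size.
        List<Integer> shuffled = new ArrayList<>(allDocIds);
        // Same seed and same input order give the same permutation.
        Collections.shuffle(shuffled, new Random(seed));
        int start = Math.min(from, shuffled.size());
        int end = Math.min(from + size, shuffled.size());
        return new ArrayList<>(shuffled.subList(start, end));
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            hits.add(i);
        }
        System.out.println("Results 1-10:  " + page(hits, 42L, 0, 10));
        System.out.println("Results 11-20: " + page(hits, 42L, 10, 10));
    }
}
```

With 5 million hits, both the copy and the shuffle touch every element for each page request, which is exactly the overhead a paginated solution should avoid.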
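The elegant, paginated approach:

With this approach you would be able to tell Lucene the following:

a) Give me results 1 to 10 of the 5 million results, in random order.
b) Then give me results 11 to 20, based on the same random sequence used in a).
c) Just to clarify: if you call a) twice, you get the same random elements both times.

How can this approach be implemented?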
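Update, July 27th 2012: Please note that the solution described here for Lucene 2.9.x does not work properly. Using the RandomOrderScoreDocComparator leads to some results appearing twice in the result list.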
You can write a custom FieldComparator:

    import java.io.IOException;
    import java.util.Random;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldComparator;

    // Ignores document contents: every comparison is decided by a fresh random number.
    public class RandomOrderFieldComparator extends FieldComparator<Integer> {

        private final Random random = new Random();

        @Override
        public int compare(int slot1, int slot2) {
            return random.nextInt();
        }

        @Override
        public int compareBottom(int doc) throws IOException {
            return random.nextInt();
        }

        @Override
        public void copy(int slot, int doc) throws IOException {
            // No per-slot value to store.
        }

        @Override
        public void setBottom(int bottom) {
            // Nothing to remember; compareBottom() is random.
        }

        @Override
        public void setNextReader(IndexReader reader, int docBase) throws IOException {
            // No per-segment state needed.
        }

        @Override
        public Integer value(int slot) {
            return random.nextInt();
        }
    }
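This does not consume any I/O while shuffling the results. Here is my sample program that demonstrates how to use it:

    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.search.FieldComparator;
    import org.apache.lucene.search.FieldComparatorSource;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopFieldDocs;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class RandomOrderDemo { // class name is arbitrary
        public static void main(String... args) throws Exception {
            // Build a tiny in-memory index with three documents.
            RAMDirectory directory = new RAMDirectory();
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_33);
            IndexWriter writer = new IndexWriter(
                directory,
                new IndexWriterConfig(Version.LUCENE_33, analyzer).setOpenMode(OpenMode.CREATE_OR_APPEND)
            );
            Document alice = new Document();
            alice.add( new Field("name", "Alice", Field.Store.YES, Field.Index.ANALYZED) );
            writer.addDocument( alice );
            Document bob = new Document();
            bob.add( new Field("name", "Bob", Field.Store.YES, Field.Index.ANALYZED) );
            writer.addDocument( bob );
            Document chris = new Document();
            chris.add( new Field("name", "Chris", Field.Store.YES, Field.Index.ANALYZED) );
            writer.addDocument( chris );
            writer.close();

            IndexSearcher searcher = new IndexSearcher( directory );
            for (int pass = 1; pass <= 10; pass++) {
                Query query = new MatchAllDocsQuery();
                Sort sort = new Sort(
                    new SortField(
                        "",
                        new FieldComparatorSource() {
                            @Override
                            public FieldComparator<Integer> newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
                                return new RandomOrderFieldComparator();
                            }
                        }
                    )
                );
                TopFieldDocs topFieldDocs = searcher.search( query, 10, sort );
                System.out.print("Pass #" + pass + ":");
                // Iterate over the returned hits; totalHits may exceed the requested 10.
                for (int i = 0; i < topFieldDocs.scoreDocs.length; i++) {
                    System.out.print( " " + topFieldDocs.scoreDocs[i].doc );
                }
                System.out.println();
            }
            searcher.close();
        }
    }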
It yielded this output:

    Pass #1: 1 0 2
    Pass #2: 1 0 2
    Pass #3: 0 1 2
    Pass #4: 0 1 2
    Pass #5: 0 1 2
    Pass #6: 1 0 2
    Pass #7: 0 2 1
    Pass #8: 1 2 0
    Pass #9: 2 0 1
    Pass #10: 0 2 1
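The order changes from pass to pass, whereas the question asks for a sequence that stays the same across calls so that results 11 to 20 continue where 1 to 10 left off. One possible way to get that, sketched here in plain Java without any Lucene classes, is to derive a pseudo-random but stable sort key from a seed and the doc ID, and to keep only the smallest `from + size` keys in a bounded heap. Everything in this block (the class `SeededRandomOrder`, the `key` and `page` methods, the SplitMix64-style bit mixing) is an assumption for illustration, not part of the answer above and not a Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: repeatable "random" paging driven by a seed.
class SeededRandomOrder {

    /** Mixes seed and doc ID into a 64-bit key (SplitMix64-style finalizer). */
    public static long key(long seed, int docId) {
        long z = seed + (docId + 1L) * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Returns hits [from, from + size) of the seeded order, in one pass over docIds. */
    public static List<Integer> page(Iterable<Integer> docIds, long seed, int from, int size) {
        int limit = from + size;
        // Order by key; the doc ID is only a tie-breaker.
        Comparator<Integer> byKey = Comparator
            .<Integer>comparingLong(d -> key(seed, d))
            .thenComparingInt(d -> d);
        // Max-heap on the key, so the head is the entry to evict first.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Math.max(1, limit), byKey.reversed());
        for (int doc : docIds) {
            heap.add(doc);
            if (heap.size() > limit) {
                heap.poll(); // drop the largest key, keeping the `limit` smallest
            }
        }
        List<Integer> top = new ArrayList<>(heap);
        top.sort(byKey);
        return new ArrayList<>(top.subList(Math.min(from, top.size()), top.size()));
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            hits.add(i);
        }
        System.out.println("Results 1-10:  " + page(hits, 7L, 0, 10));
        System.out.println("Results 11-20: " + page(hits, 7L, 10, 10));
        System.out.println("Results 1-10 again: " + page(hits, 7L, 0, 10));
    }
}
```

Because the key depends only on the seed and the doc ID, the same seed yields the same sequence on every call, and memory stays at `from + size` entries instead of the full hit count. The trade-off is that deep pages become more expensive, and the sequence is only stable as long as the doc IDs themselves do not change.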
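For Lucene 2.9.x, the corresponding comparator is a ScoreDocComparator:

    import java.util.Random;

    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.ScoreDocComparator;
    import org.apache.lucene.search.SortField;

    public class RandomOrderScoreDocComparator implements ScoreDocComparator {

        private final Random random = new Random();

        public int compare(ScoreDoc i, ScoreDoc j) {
            return random.nextInt();
        }

        public Comparable<?> sortValue(ScoreDoc i) {
            return Integer.valueOf( random.nextInt() );
        }

        public int sortType() {
            return SortField.CUSTOM;
        }
    }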
All you have to change is the Sort object:

    Sort sort = new Sort(
        new SortField(
            "",
            new SortComparatorSource() {
                public ScoreDocComparator newComparator(IndexReader reader, String fieldName) throws IOException {
                    return new RandomOrderScoreDocComparator();
                }
            }
        )
    );