use*_*977 10 java lucene autocomplete search-suggestion
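I am new to Lucene and I want to implement auto-suggest like Google's: when I type a character such as 'G', it gives me a list of suggestions (you can try it yourself).
I have searched all over the web. Nobody seems to have done this, and all I found is that Lucene now ships some new tools in its suggest package.
What I need is an example that shows me how to use them.
Can anyone help?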
Joh*_*man 44
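I'll give you a fairly complete example that shows how to use AnalyzingInfixSuggester. In this example we'll pretend that we're Amazon and that we want to autocomplete a product search field. We'll take advantage of features of the Lucene suggest module to rank suggestions by popularity, restrict them to the regions where a product is sold, and store extra product data such as an image URL alongside each suggestion.
First, I'll define a simple class to hold information about a product, in Product.java: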
    class Product implements java.io.Serializable
    {
        String name;
        String image;
        String[] regions;
        int numberSold;

        public Product(String name, String image, String[] regions,
                       int numberSold) {
            this.name = name;
            this.image = image;
            this.regions = regions;
            this.numberSold = numberSold;
        }
    }
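To index records using AnalyzingInfixSuggester's build method, you need to pass it an object that implements the org.apache.lucene.search.suggest.InputIterator interface. An InputIterator gives access to the key, contexts, payload and weight of each record.
The key is the text you actually want to search on and autocomplete against. In our example, it is the name of the product.
The contexts are a set of additional, arbitrary data that you can use to filter records. In our example, the contexts are the ISO codes of the countries we ship a particular product to.
The payload is additional arbitrary data that you want to store in the index for the record. In this example, we serialize each Product instance and store the resulting bytes as the payload. When we later do lookups, we can deserialize the payload and access information in the product instance, such as the image URL.
The weight is used to order suggestion results; results with a higher weight are returned first. We'll use the number of units sold of a given product as its weight.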
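The payload idea can be tried out without Lucene at all. The following is a minimal sketch using only the JDK; `PayloadSketch` and `Item` are hypothetical names made up for illustration (`Item` stands in for the `Product` class), and the byte array stands in for what would be wrapped in a `BytesRef`. It reads the payload back from an explicit offset and length, on the assumption that the stored bytes may be a slice of a larger array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene or of the original answer.
final class PayloadSketch {

    // Stand-in for the Product class so that this sketch compiles on its own.
    static final class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String image;
        final int numberSold;

        Item(String name, String image, int numberSold) {
            this.name = name;
            this.image = image;
            this.numberSold = numberSold;
        }
    }

    // Serialize an object into the bytes that would be stored as the payload.
    static byte[] toPayload(Serializable value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Rebuild the object from a slice of a byte array. Offset and length are
    // explicit because the payload need not start at index 0 of the array.
    static Object fromPayload(byte[] bytes, int offset, int length) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes, offset, length))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not decode payload", e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = toPayload(new Item("Electric Guitar",
                "http://images.example/electric-guitar.jpg", 100));
        Item copy = (Item) fromPayload(payload, 0, payload.length);
        System.out.println(copy.name + " -> " + copy.image);
    }
}
```

Java serialization is convenient for a demo like this one; any other encoding that round-trips through a byte array would serve equally well as a payload.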
Here are the contents of ProductIterator.java:
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.UnsupportedEncodingException;
    import java.util.Comparator;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;

    import org.apache.lucene.search.suggest.InputIterator;
    import org.apache.lucene.util.BytesRef;

    class ProductIterator implements InputIterator
    {
        private Iterator<Product> productIterator;
        private Product currentProduct;

        ProductIterator(Iterator<Product> productIterator) {
            this.productIterator = productIterator;
        }

        public boolean hasContexts() {
            return true;
        }

        public boolean hasPayloads() {
            return true;
        }

        public Comparator<BytesRef> getComparator() {
            return null;
        }

        // This method needs to return the key for the record; this is the
        // text we'll be autocompleting against.
        public BytesRef next() {
            if (productIterator.hasNext()) {
                currentProduct = productIterator.next();
                try {
                    return new BytesRef(currentProduct.name.getBytes("UTF8"));
                } catch (UnsupportedEncodingException e) {
                    throw new Error("Couldn't convert to UTF-8");
                }
            } else {
                return null;
            }
        }

        // This method returns the payload for the record, which is
        // additional data that can be associated with a record and
        // returned when we do suggestion lookups. In this example the
        // payload is a serialized Java object representing our product.
        public BytesRef payload() {
            try {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                ObjectOutputStream out = new ObjectOutputStream(bos);
                out.writeObject(currentProduct);
                out.close();
                return new BytesRef(bos.toByteArray());
            } catch (IOException e) {
                throw new Error("Well that's unfortunate.");
            }
        }

        // This method returns the contexts for the record, which we can
        // use to restrict suggestions. In this example we use the
        // regions in which a product is sold.
        public Set<BytesRef> contexts() {
            try {
                Set<BytesRef> regions = new HashSet<BytesRef>();
                for (String region : currentProduct.regions) {
                    regions.add(new BytesRef(region.getBytes("UTF8")));
                }
                return regions;
            } catch (UnsupportedEncodingException e) {
                throw new Error("Couldn't convert to UTF-8");
            }
        }

        // This method helps us order our suggestions. In this example we
        // use the number of products of this type that we've sold.
        public long weight() {
            return currentProduct.numberSold;
        }
    }
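In our driver program, we will create an index directory in RAM, create a StandardAnalyzer, create an AnalyzingInfixSuggester using the RAM directory and the analyzer, index a number of products using ProductIterator, and print the results of some sample lookups.
Here's the driver program, SuggestProducts.java: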
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
    import org.apache.lucene.search.suggest.Lookup;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.Version;

    public class SuggestProducts
    {
        // Get suggestions given a prefix and a region.
        private static void lookup(AnalyzingInfixSuggester suggester, String name,
                                   String region) {
            try {
                List<Lookup.LookupResult> results;
                HashSet<BytesRef> contexts = new HashSet<BytesRef>();
                contexts.add(new BytesRef(region.getBytes("UTF8")));
                // Do the actual lookup. We ask for the top 2 results.
                results = suggester.lookup(name, contexts, 2, true, false);
                System.out.println("-- \"" + name + "\" (" + region + "):");
                for (Lookup.LookupResult result : results) {
                    System.out.println(result.key);
                    Product p = getProduct(result);
                    if (p != null) {
                        System.out.println(" image: " + p.image);
                        System.out.println(" # sold: " + p.numberSold);
                    }
                }
            } catch (IOException e) {
                System.err.println("Error");
            }
        }

        // Deserialize a Product from a LookupResult payload.
        private static Product getProduct(Lookup.LookupResult result)
        {
            try {
                BytesRef payload = result.payload;
                if (payload != null) {
                    // A BytesRef is a slice of a byte array, so honor its
                    // offset and length rather than reading the whole array.
                    ByteArrayInputStream bis = new ByteArrayInputStream(
                        payload.bytes, payload.offset, payload.length);
                    ObjectInputStream in = new ObjectInputStream(bis);
                    Product p = (Product) in.readObject();
                    return p;
                } else {
                    return null;
                }
            } catch (IOException|ClassNotFoundException e) {
                throw new Error("Could not decode payload :(");
            }
        }

        public static void main(String[] args) {
            try {
                RAMDirectory index_dir = new RAMDirectory();
                StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
                AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                    Version.LUCENE_48, index_dir, analyzer);

                // Create our list of products.
                ArrayList<Product> products = new ArrayList<Product>();
                products.add(
                    new Product(
                        "Electric Guitar",
                        "http://images.example/electric-guitar.jpg",
                        new String[]{"US", "CA"},
                        100));
                products.add(
                    new Product(
                        "Electric Train",
                        "http://images.example/train.jpg",
                        new String[]{"US", "CA"},
                        100));
                products.add(
                    new Product(
                        "Acoustic Guitar",
                        "http://images.example/acoustic-guitar.jpg",
                        new String[]{"US", "ZA"},
                        80));
                products.add(
                    new Product(
                        "Guarana Soda",
                        "http://images.example/soda.jpg",
                        new String[]{"ZA", "IE"},
                        130));

                // Index the products with the suggester.
                suggester.build(new ProductIterator(products.iterator()));

                // Do some example lookups.
                lookup(suggester, "Gu", "US");
                lookup(suggester, "Gu", "ZA");
                lookup(suggester, "Gui", "CA");
                lookup(suggester, "Electric guit", "US");
            } catch (IOException e) {
                System.err.println("Error!");
            }
        }
    }
And here is the output from the driver program:
    -- "Gu" (US):
    Electric Guitar
     image: http://images.example/electric-guitar.jpg
     # sold: 100
    Acoustic Guitar
     image: http://images.example/acoustic-guitar.jpg
     # sold: 80
    -- "Gu" (ZA):
    Guarana Soda
     image: http://images.example/soda.jpg
     # sold: 130
    Acoustic Guitar
     image: http://images.example/acoustic-guitar.jpg
     # sold: 80
    -- "Gui" (CA):
    Electric Guitar
     image: http://images.example/electric-guitar.jpg
     # sold: 100
    -- "Electric guit" (US):
    Electric Guitar
     image: http://images.example/electric-guitar.jpg
     # sold: 100
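To make the ordering in that output easier to follow, here is a rough plain-Java model of what the lookups appear to do: keep records whose contexts contain the requested region, keep those whose key matches the query, sort by weight in descending order, and return the top results. This is not how Lucene implements it. `LookupModel`, `Entry` and `sampleEntries` are hypothetical names, and the matching rule (earlier query terms must equal a whole word, the last term only needs to be a prefix of a word) is an assumption that happens to reproduce the output shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of the lookup semantics; it does not use Lucene.
final class LookupModel {

    static final class Entry {
        final String key;
        final long weight;
        final Set<String> contexts;

        Entry(String key, long weight, String... contexts) {
            this.key = key;
            this.weight = weight;
            this.contexts = new HashSet<>(Arrays.asList(contexts));
        }
    }

    // Assumed rule: every query term must match a word of the key; the last
    // term matches as a prefix, earlier terms must match a whole word.
    static boolean matches(String key, String query) {
        List<String> words = Arrays.asList(key.toLowerCase(Locale.ROOT).split("\\s+"));
        String[] terms = query.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < terms.length; i++) {
            final String term = terms[i];
            boolean isLast = (i == terms.length - 1);
            boolean hit = isLast
                ? words.stream().anyMatch(w -> w.startsWith(term))
                : words.contains(term);
            if (!hit) {
                return false;
            }
        }
        return true;
    }

    // Filter by context, filter by match, order by weight (highest first),
    // and keep at most num keys.
    static List<String> lookup(List<Entry> entries, String query, String context, int num) {
        return entries.stream()
            .filter(e -> e.contexts.contains(context))
            .filter(e -> matches(e.key, query))
            .sorted(Comparator.comparingLong((Entry e) -> e.weight).reversed())
            .limit(num)
            .map(e -> e.key)
            .collect(Collectors.toList());
    }

    // The same four products as in the driver program.
    static List<Entry> sampleEntries() {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("Electric Guitar", 100, "US", "CA"));
        entries.add(new Entry("Electric Train", 100, "US", "CA"));
        entries.add(new Entry("Acoustic Guitar", 80, "US", "ZA"));
        entries.add(new Entry("Guarana Soda", 130, "ZA", "IE"));
        return entries;
    }

    public static void main(String[] args) {
        List<Entry> entries = sampleEntries();
        System.out.println(lookup(entries, "Gu", "US", 2));  // [Electric Guitar, Acoustic Guitar]
        System.out.println(lookup(entries, "Gu", "ZA", 2));  // [Guarana Soda, Acoustic Guitar]
        System.out.println(lookup(entries, "Electric guit", "US", 2));  // [Electric Guitar]
    }
}
```

Note that "Electric Train" never shows up for "Gu" even though it ties for the highest weight in the US: it is dropped by the match step before weights are compared.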
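There's a way to avoid writing a full InputIterator that you might find easier. You can write a stub InputIterator that returns null from its next, payload and contexts methods. Pass an instance of it to AnalyzingInfixSuggester's build method:
    suggester.build(new ProductIterator(new ArrayList<Product>().iterator()));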
Then, for each item you want to index, call the AnalyzingInfixSuggester add method:
    suggester.add(text, contexts, weight, payload);
After you've indexed everything, make a call to refresh:
    suggester.refresh();
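If you're indexing large amounts of data, you can significantly speed up indexing by using this approach with multiple threads: call build, then use multiple threads to add items, and finally call refresh.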
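The threading pattern described above could look roughly like the following sketch. It is deliberately independent of Lucene so that it runs on its own: `ParallelIndexing` and `indexAll` are hypothetical names, `addOne` stands in for a call to the suggester's add method, and `refresh` stands in for the final refresh call. It assumes, as the paragraph above does, that add may safely be called from several threads at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper illustrating "add from many threads, refresh once".
final class ParallelIndexing {

    // Runs addOne for every item on a fixed-size thread pool, waits for all
    // of the calls to finish, and only then runs refresh exactly once.
    static <T> void indexAll(List<T> items, int threads,
                             Consumer<T> addOne, Runnable refresh) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (T item : items) {
                pending.add(pool.submit(() -> addOne.accept(item)));
            }
            for (Future<?> f : pending) {
                f.get();  // blocks until done and surfaces any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("An add call failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        refresh.run();
    }

    public static void main(String[] args) {
        // Thread-safe stand-in for the suggester's index.
        Queue<String> indexed = new ConcurrentLinkedQueue<>();
        indexAll(Arrays.asList("Electric Guitar", "Electric Train",
                               "Acoustic Guitar", "Guarana Soda"),
                 4,
                 indexed::add,
                 () -> System.out.println("refresh after " + indexed.size() + " adds"));
    }
}
```

With a real suggester, `addOne` would presumably build the text, contexts, weight and payload for one product and pass them to add. Since the driver above catches IOException around suggester calls, the callbacks would need to wrap that exception in an unchecked one.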
[Edited 2015-04-23 to demonstrate deserializing information from the LookupResult payload.]