How can I implement auto-suggest using Lucene's new AnalyzingInfixSuggester API?

use*_*977 10 java lucene autocomplete search-suggestion

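I am new to Lucene, and I want to implement auto-suggest like Google's: when I type a character such as 'G', it gives me a list of suggestions. You can try it yourself.

I have searched the whole net. Nobody seems to have done this, although Lucene now ships some new tools in its suggest package.

But I need an example that shows me how to do it.

Can anyone help?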

Joh*_*man 44

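I'll give you a fairly complete example that shows how to use AnalyzingInfixSuggester. In this example we'll pretend that we're Amazon and we want to autocomplete a product search field. We'll take advantage of features of the Lucene suggestion system to implement the following:

  1. Ranked results: we will suggest the most popular matching products first.
  2. Region-restricted results: we will only suggest products that are sold in the customer's country.
  3. Product photos: we will store product photo URLs in the suggestion index so that we can display them in the search results without an additional database lookup.

First, I'll define a simple class to hold information about a product, in Product.java: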

import java.util.Set;

class Product implements java.io.Serializable
{
    String name;
    String image;
    String[] regions;
    int numberSold;

    public Product(String name, String image, String[] regions,
                   int numberSold) {
        this.name = name;
        this.image = image;
        this.regions = regions;
        this.numberSold = numberSold;
    }
}

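To index records with AnalyzingInfixSuggester's build method, you need to pass it an object that implements the org.apache.lucene.search.suggest.InputIterator interface. An InputIterator gives access to the key, contexts, payload and weight of each record.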

The key is the text you actually want to search on and autocomplete against. In our example, it will be the name of the product.

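The contexts are a set of additional, arbitrary data that you can use to filter records. In our example, the contexts are the set of ISO codes for the countries we ship a particular product to.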

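The payload is additional arbitrary data that you want to store in the index for the record. In this example, we will actually serialize each Product instance and store the resulting bytes as the payload. Then, when we do lookups later, we can deserialize the payload and access information in the product instance, such as the image URL.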
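The serialize-then-deserialize round trip described above can be tried without Lucene at all. The following is a minimal sketch using only the JDK; `PayloadCodec` and `Item` are hypothetical names invented for this illustration (`Item` stands in for Product), and the plain `byte[]` stands in for the bytes a BytesRef would carry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper, not part of Lucene: turns a Serializable object into
// bytes and back, the same round trip the payload goes through.
final class PayloadCodec {
    // Stand-in for the Product class, reduced to two fields.
    static final class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String image;

        Item(String name, String image) {
            this.name = name;
            this.image = image;
        }
    }

    static byte[] encode(Serializable value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes everything into bos.
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        }
        return bos.toByteArray();
    }

    // offset and length matter when the bytes sit inside a larger buffer,
    // which can be the case for the array behind a BytesRef.
    static Object decode(byte[] bytes, int offset, int length)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes, offset, length))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = encode(new Item("Electric Guitar",
                "http://images.example/electric-guitar.jpg"));
        Item copy = (Item) decode(payload, 0, payload.length);
        System.out.println(copy.name + " -> " + copy.image);
    }
}
```

Java serialization keeps the example short; for a real index a more compact or more stable encoding might be a better fit, but that is a design choice outside the scope of this answer.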

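The weight is used to order suggestion results; results with a higher weight are returned first. We'll use the number of sales for a given product as its weight.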
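To make the roles of key, contexts and weight concrete, here is a small plain-Java model of a lookup. It is only a conceptual sketch and not how Lucene implements the suggester: `SuggestModel`, `Entry` and the simple "some word starts with the prefix" matching rule are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual model only; this is not Lucene's implementation.
final class SuggestModel {
    static final class Entry {
        final String key;            // text to autocomplete against
        final Set<String> contexts;  // e.g. country codes
        final long weight;           // e.g. number sold

        Entry(String key, long weight, String... contexts) {
            this.key = key;
            this.weight = weight;
            this.contexts = new HashSet<>(Arrays.asList(contexts));
        }
    }

    // Keep entries that carry the context and contain a word starting with
    // the prefix, then return the keys of the `limit` heaviest ones.
    static List<String> lookup(List<Entry> entries, String prefix,
                               String context, int limit) {
        String needle = prefix.toLowerCase(Locale.ROOT);
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.contexts.contains(context) && anyWordStartsWith(e.key, needle)) {
                hits.add(e);
            }
        }
        // Highest weight first.
        hits.sort(Comparator.comparingLong((Entry e) -> e.weight).reversed());
        List<String> keys = new ArrayList<>();
        for (Entry e : hits.subList(0, Math.min(limit, hits.size()))) {
            keys.add(e.key);
        }
        return keys;
    }

    private static boolean anyWordStartsWith(String key, String needle) {
        for (String word : key.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.startsWith(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry("Electric Guitar", 100, "US", "CA"),
            new Entry("Acoustic Guitar", 80, "US", "ZA"),
            new Entry("Guarana Soda", 130, "ZA", "IE"));
        System.out.println(lookup(entries, "Gu", "US", 2));
        System.out.println(lookup(entries, "Gu", "ZA", 2));
    }
}
```

The context acts as a filter before ranking, so a heavily weighted record that is not sold in the requested region never appears, no matter how popular it is.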

Here are the contents of ProductIterator.java:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import org.apache.lucene.search.suggest.InputIterator;
import org.apache.lucene.util.BytesRef;


class ProductIterator implements InputIterator
{
    private Iterator<Product> productIterator;
    private Product currentProduct;

    ProductIterator(Iterator<Product> productIterator) {
        this.productIterator = productIterator;
    }

    public boolean hasContexts() {
        return true;
    }

    public boolean hasPayloads() {
        return true;
    }

    public Comparator<BytesRef> getComparator() {
        return null;
    }

    // This method needs to return the key for the record; this is the
    // text we'll be autocompleting against.
    public BytesRef next() {
        if (productIterator.hasNext()) {
            currentProduct = productIterator.next();
            try {
                return new BytesRef(currentProduct.name.getBytes("UTF8"));
            } catch (UnsupportedEncodingException e) {
                throw new Error("Couldn't convert to UTF-8");
            }
        } else {
            return null;
        }
    }

    // This method returns the payload for the record, which is
    // additional data that can be associated with a record and
    // returned when we do suggestion lookups.  In this example the
    // payload is a serialized Java object representing our product.
    public BytesRef payload() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(currentProduct);
            out.close();
            return new BytesRef(bos.toByteArray());
        } catch (IOException e) {
            throw new Error("Well that's unfortunate.");
        }
    }

    // This method returns the contexts for the record, which we can
    // use to restrict suggestions.  In this example we use the
    // regions in which a product is sold.
    public Set<BytesRef> contexts() {
        try {
            Set<BytesRef> regions = new HashSet<BytesRef>();
            for (String region : currentProduct.regions) {
                regions.add(new BytesRef(region.getBytes("UTF8")));
            }
            return regions;
        } catch (UnsupportedEncodingException e) {
            throw new Error("Couldn't convert to UTF-8");
        }
    }

    // This method helps us order our suggestions.  In this example we
    // use the number of products of this type that we've sold.
    public long weight() {
        return currentProduct.numberSold;
    }
}

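In our driver program, we will do the following:

  1. Create an index directory in RAM.
  2. Create a StandardAnalyzer.
  3. Create an AnalyzingInfixSuggester using the RAM directory and the analyzer.
  4. Index a number of products using ProductIterator.
  5. Print the results of some sample lookups.

Here is the driver program, SuggestProducts.java: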

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class SuggestProducts
{
    // Get suggestions given a prefix and a region.
    private static void lookup(AnalyzingInfixSuggester suggester, String name,
                               String region) {
        try {
            List<Lookup.LookupResult> results;
            HashSet<BytesRef> contexts = new HashSet<BytesRef>();
            contexts.add(new BytesRef(region.getBytes("UTF8")));
            // Do the actual lookup.  We ask for the top 2 results.
            results = suggester.lookup(name, contexts, 2, true, false);
            System.out.println("-- \"" + name + "\" (" + region + "):");
            for (Lookup.LookupResult result : results) {
                System.out.println(result.key);
                Product p = getProduct(result);
                if (p != null) {
                    System.out.println("  image: " + p.image);
                    System.out.println("  # sold: " + p.numberSold);
                }
            }
        } catch (IOException e) {
            System.err.println("Error");
        }
    }

    // Deserialize a Product from a LookupResult payload.
    private static Product getProduct(Lookup.LookupResult result)
    {
        try {
            BytesRef payload = result.payload;
            if (payload != null) {
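                // A BytesRef may be a slice of a larger array, so honor
                // its offset and length rather than reading the whole array.
                ByteArrayInputStream bis = new ByteArrayInputStream(
                    payload.bytes, payload.offset, payload.length);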
                ObjectInputStream in = new ObjectInputStream(bis);
                Product p = (Product) in.readObject();
                return p;
            } else {
                return null;
            }
        } catch (IOException|ClassNotFoundException e) {
            throw new Error("Could not decode payload :(");
        }
    }

    public static void main(String[] args) {
        try {
            RAMDirectory index_dir = new RAMDirectory();
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
            AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                Version.LUCENE_48, index_dir, analyzer);

            // Create our list of products.
            ArrayList<Product> products = new ArrayList<Product>();
            products.add(
                new Product(
                    "Electric Guitar",
                    "http://images.example/electric-guitar.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Electric Train",
                    "http://images.example/train.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Acoustic Guitar",
                    "http://images.example/acoustic-guitar.jpg",
                    new String[]{"US", "ZA"},
                    80));
            products.add(
                new Product(
                    "Guarana Soda",
                    "http://images.example/soda.jpg",
                    new String[]{"ZA", "IE"},
                    130));

            // Index the products with the suggester.
            suggester.build(new ProductIterator(products.iterator()));

            // Do some example lookups.
            lookup(suggester, "Gu", "US");
            lookup(suggester, "Gu", "ZA");
            lookup(suggester, "Gui", "CA");
            lookup(suggester, "Electric guit", "US");
        } catch (IOException e) {
            System.err.println("Error!");
        }
    }
}

Here is the output from the driver program:

-- "Gu" (US):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
Acoustic Guitar
  image: http://images.example/acoustic-guitar.jpg
  # sold: 80
-- "Gu" (ZA):
Guarana Soda
  image: http://images.example/soda.jpg
  # sold: 130
Acoustic Guitar
  image: http://images.example/acoustic-guitar.jpg
  # sold: 80
-- "Gui" (CA):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
-- "Electric guit" (US):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100

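Addendum

There is a way to avoid writing a full InputIterator that you might find easier. You can write a stub InputIterator that returns null from its next, payload and contexts methods. Pass an instance of it to AnalyzingInfixSuggester's build method: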

suggester.build(new ProductIterator(new ArrayList<Product>().iterator()));

Then, for each item that you want to index, call AnalyzingInfixSuggester's add method:

suggester.add(text, contexts, weight, payload)

After you've indexed everything, make sure to call refresh:

suggester.refresh();

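If you're indexing large amounts of data, you can speed up indexing significantly by using this approach with multiple threads: call build, then use multiple threads to add items, then finally call refresh.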
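The threading pattern might look roughly like the sketch below. To keep it runnable without Lucene on the classpath, a hypothetical `Sink` interface stands in for the suggester: its `add` plays the role of AnalyzingInfixSuggester's add and its `refresh` the role of refresh. `ParallelIndexing`, `Sink` and `indexAll` are names invented for this illustration, and the sketch assumes, as the paragraph above states, that add can be called from several threads at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelIndexing {
    // Hypothetical stand-in for the suggester's add and refresh methods.
    interface Sink {
        void add(String text) throws Exception;
        void refresh() throws Exception;
    }

    // Submit one add task per item, wait for all of them, then refresh once.
    static void indexAll(Sink sink, List<String> items, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        for (String item : items) {
            // Returning a value makes this a Callable, so add may throw.
            pending.add(pool.submit(() -> {
                sink.add(item);
                return null;
            }));
        }
        pool.shutdown();
        for (Future<?> f : pending) {
            f.get(); // blocks until done; rethrows a failure from add
        }
        sink.refresh();
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger added = new AtomicInteger();
        Sink counting = new Sink() {
            public void add(String text) {
                added.incrementAndGet();
            }
            public void refresh() {
                System.out.println("refreshed after " + added.get() + " adds");
            }
        };
        indexAll(counting,
                 Arrays.asList("Electric Guitar", "Electric Train", "Acoustic Guitar"),
                 4);
    }
}
```

With the real suggester, each task would call suggester.add(text, contexts, weight, payload) and the final step would be suggester.refresh(); waiting on every Future before refreshing ensures no add is still in flight.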

[Edited 2015-04-23 to demonstrate deserializing information from the LookupResult payload.]