And*_*ndy 11 lucene performance
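When searching against the master-shard implementation in a distributed environment, I am seeing long search times (on the order of 10 seconds). The same query run through Luke, however, returns in milliseconds.
The application is a distributed system. All nodes share a common NFS mount where the indexes reside. For simplicity, consider two nodes, Node1 and Node2. The /etc/fstab entry is as follows.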
nfs:/vol/indexes /opt/indexes nfs rw,suid,nodev,rsize=32768,wsize=32768,soft,intr,tcp 0 0
Multiple feeds (say Feed1 and Feed2) hit the system. Each feed has one shard per node and one master. The indexes look like
Feed1-master
Feed1-shard-Node1.com
Feed1-shard-Node1.com0
Feed1-shard-Node1.com1
The code that performs the search is
FeedIndexManager fim = getManager(feedCode);
searcher = fim.getSearcher();
TopDocs docs = searcher.search(q, filter, start + max, sort);

private FeedIndexManager getManager(String feedCode) throws IOException {
    if (!_managers.containsKey(feedCode)) {
        synchronized(_managers) {
            if (!_managers.containsKey(feedCode)) {
                File shard = getShardIndexFile(feedCode);
                File master = getMasterIndexFile(feedCode);
                _managers.put(feedCode, new FeedIndexManager(shard, master));
            }
        }
    }
    return _managers.get(feedCode);
}
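As a side note on the lazy per-feed lookup above: the post does not show how `_managers` is declared, so the following is only a sketch under the assumption that it could be a `ConcurrentHashMap`. With a concurrent map, the unsynchronized read on the fast path is safe, and the re-check under the lock still ensures a single manager per feed. `StubManager` and `ManagerCache` are hypothetical stand-ins, not classes from the question.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for FeedIndexManager; it only counts constructions.
class StubManager {
    static final AtomicInteger CREATED = new AtomicInteger();
    final String feedCode;

    StubManager(String feedCode) {
        this.feedCode = feedCode;
        CREATED.incrementAndGet();
    }
}

// Hypothetical cache illustrating the lookup; written without Java 7+ syntax.
class ManagerCache {
    private final ConcurrentMap<String, StubManager> managers =
            new ConcurrentHashMap<String, StubManager>();

    StubManager getManager(String feedCode) {
        // Fast path: a read without locking, which is safe on a concurrent map.
        StubManager manager = managers.get(feedCode);
        if (manager == null) {
            synchronized (managers) {
                // Re-check under the lock so only one manager is built per feed.
                manager = managers.get(feedCode);
                if (manager == null) {
                    manager = new StubManager(feedCode);
                    managers.put(feedCode, manager);
                }
            }
        }
        return manager;
    }

    public static void main(String[] args) {
        ManagerCache cache = new ManagerCache();
        StubManager first = cache.getManager("Feed1");
        StubManager second = cache.getManager("Feed1");
        System.out.println("same instance: " + (first == second));
    }
}
```

This does not address the search latency itself; it only removes one possible source of surprises around manager creation.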
FeedIndexManager is as follows.
public class FeedIndexManager implements Closeable {
    private static final Analyzer WRITE_ANALYZER = makeWriterAnalyzer();
    private final Directory _master;
    private SearcherManager _searcherManager;
    private final IndexPair _pair;
    private int _numFailedMerges = 0;
    private DateTime _lastMergeTime = new DateTime();

    public FeedIndexManager(File shard, File master) throws IOException {
        _master = NIOFSDirectory.open(master, new SimpleFSLockFactory(master));
        IndexWriter writer = null;
        try {
            writer = new IndexWriter(_master,
                                     WRITE_ANALYZER,
                                     MaxFieldLength.LIMITED);
        } finally {
            if (null != writer) {
                writer.close();
            }
            writer = null;
        }
        _searcherManager = new SearcherManager(_master);
        _pair = new IndexPair(_master,
                              shard,
                              new IndexWriterBuilder(WRITE_ANALYZER));
    }

    public IndexPair getIndexWriter() {
        return _pair;
    }

    public IndexSearcher getSearcher() {
        try {
            return _searcherManager.get();
        }
        catch (IOException ioe) {
            throw new DatastoreRuntimeException(
                "When trying to get an IndexSearcher for " + _master, ioe);
        }
    }

    public void releaseSearcher(IndexSearcher searcher) {
        try {
            _searcherManager.release(searcher);
        }
        catch (IOException ioe) {
            throw new DatastoreRuntimeException(
                "When trying to release the IndexSearcher " + searcher
                + " for " + _master, ioe);
        }
    }

    /**
     * Merges the changes from the shard into the master.
     */
    public boolean tryFlush() throws IOException {
        LOG.debug("Trying to flush index manager at " + _master
                  + " after " + _numFailedMerges + " failed merges.");
        if (_pair.tryFlush()) {
            LOG.debug("I successfully flushed " + _master);
            _numFailedMerges = 0;
            _lastMergeTime = new DateTime();
            return true;
        }
        LOG.warn("I couldn't flush " + _master + " after " + _numFailedMerges
                 + " failed merges.");
        _numFailedMerges++;
        return false;
    }

    public long getMillisSinceMerge() {
        return new DateTime().getMillis() - _lastMergeTime.getMillis();
    }

    public long getNumFailedMerges() {
        return _numFailedMerges;
    }

    public void close() throws IOException {
        _pair.close();
    }

    /**
     * Return the Analyzer used for writing to indexes.
     */
    private static Analyzer makeWriterAnalyzer() {
        PerFieldAnalyzerWrapper analyzer =
            new PerFieldAnalyzerWrapper(new LowerCaseAnalyzer());
        analyzer.addAnalyzer(SingleFieldTag.ID.toString(), new KeywordAnalyzer());
        // we want tokenizing on the CITY_STATE field
        analyzer.addAnalyzer(AddressFieldTag.CITY_STATE.toString(),
                             new StandardAnalyzer(Version.LUCENE_CURRENT));
        return analyzer;
    }
}
The killer, which accounts for about 95-98% of the latency, is this call. The search takes around 20 seconds, whereas the same index opened through Luke responds in milliseconds.
TopDocs docs = searcher.search(q, filter, start + max, sort);
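A latency breakdown like the 95-98% figure above can be obtained by timing each stage separately. The following is a minimal sketch using only the JDK; `StageTimer` and `Timed` are hypothetical names, and the two stages in `main` are placeholders standing in for `fim.getSearcher()` and `searcher.search(...)`, not real Lucene calls.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for timing individual stages of a request.
class StageTimer {
    /** The value a stage produced together with its elapsed wall-clock time. */
    static final class Timed<T> {
        final T value;
        final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    /** Runs one stage and measures it with the monotonic nanoTime clock. */
    static <T> Timed<T> time(Callable<T> stage) throws Exception {
        long start = System.nanoTime();
        T value = stage.call();
        long elapsedNanos = System.nanoTime() - start;
        return new Timed<T>(value, TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder for acquiring a searcher.
        Timed<String> acquire = time(new Callable<String>() {
            public String call() {
                return "searcher";
            }
        });
        // Placeholder for the search call; the sleep simulates work.
        Timed<Integer> search = time(new Callable<Integer>() {
            public Integer call() throws Exception {
                Thread.sleep(50);
                return 10;
            }
        });
        System.out.println("acquire ms=" + acquire.millis
                + ", search ms=" + search.millis);
    }
}
```

Timing the first query after a searcher is opened separately from later queries on the same searcher may also be informative, since the two can behave very differently.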
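I have the following questions
1. Is it reasonable to have a separate master for each feed, or should I reduce this to a single master? The number of elements in the index is about 50 million.
2. Latency is low (sub-second response) on feeds with fewer than a million entities. Feeds with more than 2 million entities take about 20 seconds. Should I keep only one shard per node, rather than one shard per node per feed?
3. A merge from the shard to the master is attempted every 15 seconds. Should this parameter be tuned?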
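The post does not show how the 15-second merge attempt is scheduled, so the following is only a sketch of one way such a timer could be wired with the interval exposed as a parameter. `MergeScheduler` and `Flusher` are hypothetical names; `Flusher` stands in for `FeedIndexManager.tryFlush()`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical scheduler that retries a merge at a configurable interval.
class MergeScheduler {
    /** Hypothetical stand-in for FeedIndexManager.tryFlush(). */
    interface Flusher {
        boolean tryFlush() throws Exception;
    }

    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    /** Schedules repeated merge attempts, waiting `interval` between runs. */
    ScheduledFuture<?> start(final Flusher flusher, long interval, TimeUnit unit) {
        Runnable task = new Runnable() {
            public void run() {
                try {
                    flusher.tryFlush();
                } catch (Exception e) {
                    // An uncaught exception would cancel all later runs, so log and continue.
                    System.err.println("merge attempt failed: " + e);
                }
            }
        };
        // Fixed delay is measured from the end of one run to the start of the next.
        return executor.scheduleWithFixedDelay(task, interval, interval, unit);
    }

    void stop() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        MergeScheduler scheduler = new MergeScheduler();
        scheduler.start(new Flusher() {
            public boolean tryFlush() {
                System.out.println("merge attempted");
                return true;
            }
        }, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        scheduler.stop();
    }
}
```

With `scheduleWithFixedDelay`, a merge that runs longer than the interval delays the next attempt instead of stacking attempts up, which makes it easier to try different intervals safely.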
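I am currently using Lucene 3.1.0 and JDK 1.6. The boxes have two 64-bit cores and 8 GB of RAM. Currently the JVM runs with a maximum heap of 4 GB.
Any suggestions for improving performance are highly appreciated. I have already applied all the standard performance tuning that Lucene usually prescribes. Thanks a lot for reading this lengthy post.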
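Maybe this isn't the answer you are looking for, but take a look at Elastic Search. It is a distributed, clustered service layer around Lucene that can be queried over HTTP or run embedded.
And it's fast, quite ridiculously so. It seems to have tuned Lucene properly under the covers, while still exposing the full Lucene configuration options if you need to use them.
Making Lucene perform in a distributed environment is hard and, as you are discovering, you end up with nasty locking issues. ElasticSearch is meant to solve that particular problem, so you can go solve other problems.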
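To make "queried over HTTP" concrete, here is a rough sketch of building a search request against the `_search` endpoint. The host, the index name `feed1`, the field `city_state`, and the search text are all assumptions for illustration, and the class name `EsSearchRequest` is hypothetical. It uses `java.net.http`, which requires Java 11 or later, so it would not run on the JDK 1.6 mentioned in the question.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that only builds the request; it does not send it.
class EsSearchRequest {
    static HttpRequest build(String baseUrl, String index, String field, String text) {
        // Naive JSON assembly for illustration only; real code should use a
        // JSON library so that field names and values are escaped properly.
        String body = "{\"query\":{\"match\":{\"" + field + "\":\"" + text + "\"}}}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Assumed local node, index, and field names.
        HttpRequest request = build("http://localhost:9200", "feed1", "city_state", "austin");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request needs a running node; with one available, `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` would return the JSON response as a string.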