RavenDb: expected performance when querying millions of documents

Ste*_*ten 4 ravendb

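I was able to load a couple of million documents using the embedded version of RavenDb, which was pretty slick!

Now I'm trying to query these items, and the performance is not what I expected: rather than being close to instantaneous, the query takes upwards of 18 seconds on a fairly beefy machine.

Below you will find my naive code.

Note: I have now resolved this; the final code is at the bottom of the post. The takeaway is that you need indexes, they have to be of the right type, and RavenDB needs to be made aware of them. I'm very satisfied with the performance and quality of the records returned by the query engine.

Thank you, Stephen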

using (var store = new EmbeddableDocumentStore { DataDirectory = @"C:\temp\ravendata" }.Initialize())
{
    using (IDocumentSession session = store.OpenSession())
    {
        var q = session.Query<Product>().Where(x => x.INFO2.StartsWith("SYS")).ToList();
    }
}


[Serializable]
public class Product
{
    public decimal ProductId { get; set; }
    ....
    public string INFO2 { get; set; }
}

EDIT

I added this class:

public class InfoIndex_Search : AbstractIndexCreationTask<Product>
{
    public InfoIndex_Search()
    {
        Map = products => 
            from p in products
                          select new { Info2Index = p.INFO2 };

        Index(x => x.INFO2, FieldIndexing.Analyzed);
    }
}

And changed the calling method to look like this:

        using (var store = new EmbeddableDocumentStore { DataDirectory = @"C:\temp\ravendata" }.Initialize())
        {
            // Tell Raven to create our indexes.
            IndexCreation.CreateIndexes(Assembly.GetExecutingAssembly(), store);

            List<Product> q = null;
            using (IDocumentSession session = store.OpenSession())
            {
                q = session.Query<Product>().Where(x => x.INFO2.StartsWith("SYS")).ToList();
                watch.Stop();
            }
        }

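But I'm still seeing 18 seconds to perform the search. What am I missing? On another note, there are quite a few new files in the C:\temp\ravendata\Indexes\InfoIndex%2fSearch folder. There are not nearly as many as when I inserted the data, and they seem to have all but disappeared after running this code a few times while attempting to query. Should IndexCreation.CreateIndexes(Assembly.GetExecutingAssembly(), store); be called before the insert, and only then?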

EDIT1

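Using this code I was able to get the query to complete almost in an instant, but it seems you can only run it once, which raises the question: where should this run, and what is the proper initialization procedure?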

store.DatabaseCommands.PutIndex("ProdcustByInfo2", new IndexDefinitionBuilder<Product>
{
    Map = products => from product in products
                      select new { product.INFO2 },
    Indexes =
            {
                { x => x.INFO2, FieldIndexing.Analyzed}
            }
});

EDIT2: working example

static void Main()
{
    Stopwatch watch = Stopwatch.StartNew();

    int q = 0;
    using (var store = new EmbeddableDocumentStore { DataDirectory = @"C:\temp\ravendata" }.Initialize())
    {
        if (store.DatabaseCommands.GetIndex("ProdcustByInfo2") == null)
        {
            store.DatabaseCommands.PutIndex("ProdcustByInfo2", new IndexDefinitionBuilder<Product>
            {
                Map = products => from product in products
                                  select new { product.INFO2 },
                Indexes = { { x => x.INFO2, FieldIndexing.Analyzed } }
            });
        }
        watch.Stop();
        Console.WriteLine("Time elapsed to create index {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);

        watch = Stopwatch.StartNew();               
        using (IDocumentSession session = store.OpenSession())
        {
            q = session.Query<Product>().Count();
        }
        watch.Stop();
        Console.WriteLine("Time elapsed to query for products values {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
        Console.WriteLine("Total number of products loaded: {0}{1}", q, System.Environment.NewLine);

        if (q == 0)
        {
            watch = Stopwatch.StartNew();
            var productsList = Parsers.GetProducts().ToList();
            watch.Stop();
            Console.WriteLine("Time elapsed to parse: {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
            Console.WriteLine("Total number of items parsed: {0}{1}", productsList.Count, System.Environment.NewLine);

            watch = Stopwatch.StartNew();
            productsList.RemoveAll(_ => _ == null);
            watch.Stop();
            Console.WriteLine("Time elapsed to remove null values {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
            Console.WriteLine("Total number of items loaded: {0}{1}", productsList.Count, System.Environment.NewLine);

            watch = Stopwatch.StartNew();
            int batch = 0;
            var session = store.OpenSession();
            foreach (var product in productsList)
            {
                batch++;
                session.Store(product);
                if (batch % 128 == 0)
                {
                    session.SaveChanges();
                    session.Dispose();
                    session = store.OpenSession();
                }
            }
            session.SaveChanges();
            session.Dispose();
            watch.Stop();
            Console.WriteLine("Time elapsed to populate db from collection {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
        }

        watch = Stopwatch.StartNew();
        using (IDocumentSession session = store.OpenSession())
        {
            q = session.Query<Product>().Where(x => x.INFO2.StartsWith("SYS")).Count();
        }
        watch.Stop();
        Console.WriteLine("Time elapsed to query for term {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
        Console.WriteLine("Total number of items found: {0}{1}", q, System.Environment.NewLine);
    }
    Console.ReadLine();
}
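The population loop in the working example saves every 128 documents and then opens a fresh session. One way to express the same idea, sketched here under assumptions rather than taken from the original code, is to split the list into fixed-size batches first. `Batching.InBatches` and the `Program` demo are hypothetical names and are not part of RavenDB.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Batching
{
    // Hypothetical helper: lazily yields consecutive lists of at most `size` items.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }

        // Emit the final, partially filled batch if there is one.
        if (batch.Count > 0) yield return batch;
    }
}

public static class Program
{
    public static void Main()
    {
        // 300 items in batches of 128 gives batches of 128, 128 and 44 items.
        foreach (var batch in Batching.InBatches(Enumerable.Range(1, 300), 128))
        {
            Console.WriteLine(batch.Count);
        }
    }
}
```

With a helper like this, each batch could get its own session in a `using` block, calling `session.Store` for every item and `session.SaveChanges()` once per batch, which mirrors the save-and-reopen behaviour of the loop above.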

Bob*_*orn 6

First, do you have an index covering INFO2?

Second, see Daniel Lang's "Searching on string properties in RavenDB" blog post:

http://daniellang.net/searching-on-string-properties-in-ravendb/

If it helps, here's how I create an index:

public class LogMessageCreatedTime : AbstractIndexCreationTask<LogMessage>
{
    public LogMessageCreatedTime()
    {
        Map = messages => from message in messages
                          select new { MessageCreatedTime = message.MessageCreatedTime };
    }
}

And here's how I add it at runtime:

private static DocumentStore GetDatabase()
{            
    DocumentStore documentStore = new DocumentStore();            

    try
    {
        documentStore.ConnectionStringName = "RavenDb";                
        documentStore.Initialize();

        // Tell Raven to create our indexes.
        IndexCreation.CreateIndexes(typeof(DataAccessFactory).Assembly, documentStore);
    }
    catch
    {
        documentStore.Dispose();
        throw;
    }

    return documentStore;
}

In my case, I didn't have to explicitly query the index; it was simply used when I queried normally.
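For comparison, if you ever want to name the index explicitly instead of relying on RavenDB to pick it, the session's two-type-parameter `Query<T, TIndexCreator>()` overload can be used. The following is only a sketch for the older client namespaces used in this thread: the `LogMessage` shape (in particular `MessageCreatedTime` being a `DateTime`) and the `CountSince` helper are assumptions, and `LogMessageCreatedTime` is the index class shown above.

```csharp
using System;
using System.Linq;
using Raven.Client;

// Assumed shape of the document class; the real LogMessage is not shown in this answer.
public class LogMessage
{
    public string Id { get; set; }
    public DateTime MessageCreatedTime { get; set; } // assumption: a DateTime property
}

public static class ExplicitIndexQuery
{
    // Hypothetical helper: counts messages created at or after `cutoff`,
    // querying the LogMessageCreatedTime index by type rather than dynamically.
    public static int CountSince(IDocumentStore store, DateTime cutoff)
    {
        using (IDocumentSession session = store.OpenSession())
        {
            return session.Query<LogMessage, LogMessageCreatedTime>()
                          .Where(m => m.MessageCreatedTime >= cutoff)
                          .Count();
        }
    }
}
```

Naming the index this way makes the dependency visible in the code, at the cost of coupling the query to one specific index class.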