MongoDB Full and Partial Text Search

Leo*_*nel 32 full-text-indexing mongodb mongodb-query aggregation-framework spring-data-mongodb

ENV:

  • MongoDB (3.2.0) with MongoS

Collection:

  • users

Text index creation:

  BasicDBObject keys = new BasicDBObject();
  keys.put("name","text");

  BasicDBObject options = new BasicDBObject();
  options.put("name", "userTextSearch");
  options.put("unique", Boolean.FALSE);
  options.put("background", Boolean.TRUE);

  userCollection.createIndex(keys, options); // using MongoTemplate

Document:

  • { "name" : "LEONEL" }

Queries:

  • db.users.find( { "$text" : { "$search" : "LEONEL" } } ) => FOUND
  • db.users.find( { "$text" : { "$search" : "leonel" } } ) => FOUND (search with caseSensitive false)
  • db.users.find( { "$text" : { "$search" : "LEONÉL" } } ) => FOUND (search with diacriticSensitive false)
  • db.users.find( { "$text" : { "$search" : "LEONE" } } ) => FOUND (partial search)
  • db.users.find( { "$text" : { "$search" : "LEO" } } ) => NOT FOUND (partial search)
  • db.users.find( { "$text" : { "$search" : "L" } } ) => NOT FOUND (partial search)

Any idea why I get 0 results when querying with "LEO" or "L"?

Regex with text index search is not allowed.

db.getCollection('users')
     .find( { "$text" : { "$search" : "/LEO/i", 
                          "$caseSensitive": false, 
                          "$diacriticSensitive": false }} )
     .count() // 0 results

db.getCollection('users')
     .find( { "$text" : { "$search" : "LEO", 
                          "$caseSensitive": false, 
                          "$diacriticSensitive": false }} )
     .count() // 0 results

MongoDB documentation:

Ste*_*nie 48

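As of MongoDB 3.4, the text search feature is designed to support case-insensitive searches on text content, using language-specific rules for stop words and stemming. The stemming rules for supported languages are based on standard algorithms that generally handle common verbs and nouns but are unaware of proper nouns.

There is no explicit support for partial or fuzzy matches, but terms that stem to a similar result may appear to work as such. For example, "taste", "tastes", and "tasteful" all stem to "tast". Try the Snowball Stemming Demo page to experiment with more words and stemming algorithms.

The results that match are all variations of the same word "LEONEL", and differ only by case and diacritics. Unless "LEONEL" can be stemmed to something shorter by the rules of your selected language, these are the only kinds of variation that will match.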
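To illustrate why "LEO" finds nothing, here is a deliberately simplified toy model of whole-token matching. It is a sketch, not MongoDB's implementation: the function names are made up for this example, and stemming and stop words are left out entirely, so it only reproduces the case and diacritic behaviour described above.

```javascript
// Toy model only: MongoDB's real tokenizer and stemmer are more involved.
function normalizeToken(token) {
  return token
    .normalize("NFD")                 // split letters from their combining accents
    .replace(/[\u0300-\u036f]/g, "")  // drop the accents (diacritic-insensitive)
    .toLowerCase();                   // fold case (case-insensitive)
}

function tokenize(text) {
  return text.split(/\s+/).filter(Boolean).map(normalizeToken);
}

// A document matches when a search token EQUALS one of its indexed tokens.
// Prefixes such as "leo" are never compared against part of a token.
function toyTextMatch(indexedText, search) {
  const indexed = new Set(tokenize(indexedText));
  return tokenize(search).some((t) => indexed.has(t));
}

console.log(toyTextMatch("LEONEL", "leonel")); // true
console.log(toyTextMatch("LEONEL", "LEONÉL")); // true
console.log(toyTextMatch("LEONEL", "LEO"));    // false
```

The point of the model is that the comparison is token equality after normalization, so a shorter string only matches if the stemmer happens to reduce both words to the same stem.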

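If you want to do efficient partial matches, you'll need to take a different approach. For some helpful ideas see:

There is a relevant improvement request you can watch/upvote in the MongoDB issue tracker: SERVER-15090: Improve Text Indexes to support partial word match.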


nur*_*diq 19

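Without creating an index, we can simply use:

db.users.find({ name: /<full_or_partial_text>/i }) (case insensitive)

  • `new RegExp(string, 'i')` is handy for anyone who needs to search with a dynamic string (8 upvotes)
  • Note that this is neither efficient nor scalable, because the search does not run against an indexed field; on large collections it will be slow. (4 upvotes)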
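Building on the two comments above, a minimal sketch of constructing such filters from a dynamic string. The helper names are hypothetical. The escaping step is an addition not in the answer: without it, characters such as "." or "(" in user input are interpreted as regex syntax. The prefix variant reflects the MongoDB documentation's note that a case-sensitive regex anchored with "^" can make better use of an ordinary index on the field, whereas case-insensitive regexes generally cannot.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive "contains" filter, equivalent to { name: /text/i }.
function containsFilter(field, input) {
  return { [field]: new RegExp(escapeRegExp(input), "i") };
}

// Case-sensitive prefix filter, equivalent to { name: /^text/ }.
function prefixFilter(field, input) {
  return { [field]: new RegExp("^" + escapeRegExp(input)) };
}

const filter = containsFilter("name", "LEO");
console.log(filter.name.test("Leonel")); // true

// In the mongo shell or a driver, the filter would be passed straight to find():
//   db.users.find(containsFilter("name", "LEO"))
```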

Joh*_*rug 8

If you want to use all the benefits of MongoDB's full-text search AND want partial matches (maybe for auto-complete), the n-gram based approach mentioned by Shrikant Prabhu was the right solution for me. Obviously your mileage may vary, and this might not be practical when indexing huge documents.

In my case I mainly needed the partial matches to work for just the title field (and a few other short fields) of my documents.

I used an edge n-gram approach. What does that mean? In short, you turn a string like "Mississippi River" into a string like "Mis Miss Missi Missis Mississ Mississi Mississip Mississipp Mississippi Riv Rive River".

Inspired by this code by Liu Gen, I came up with this method:

function createEdgeNGrams(str) {
    // Strings of three characters or fewer are returned unchanged.
    if (str && str.length > 3) {
        const minGram = 3

        return str.split(" ").reduce((ngrams, token) => {
            if (token.length > minGram) {
                // Emit every prefix of the token, from minGram characters up to the full token.
                for (let i = minGram; i <= token.length; ++i) {
                    ngrams.push(token.slice(0, i))
                }
            } else {
                // Short tokens are kept whole.
                ngrams.push(token)
            }
            return ngrams
        }, []).join(" ")
    }

    return str
}

let res = createEdgeNGrams("Mississippi River")
console.log(res)

Now to make use of this in Mongo, I add a searchTitle field to my documents and set its value by converting the actual title field into edge n-grams with the above function. I also create a "text" index for the searchTitle field.
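A minimal sketch of that step, assuming documents shaped like `{ title: ... }`. The helper `withSearchTitle` and the compact `edgeNGrams` variant are illustrative names, not part of any library, and the driver calls in the trailing comments assume an already connected Node.js driver `Collection`.

```javascript
// Compact edge n-gram helper (assumed minimum gram length of 3, as above).
function edgeNGrams(str, minGram = 3) {
  return str
    .split(/\s+/)
    .filter(Boolean)
    .flatMap((token) => {
      if (token.length <= minGram) return [token];
      const grams = [];
      for (let i = minGram; i <= token.length; i++) {
        grams.push(token.slice(0, i));
      }
      return grams;
    })
    .join(" ");
}

// Hypothetical helper: returns a copy of the document with searchTitle filled in.
function withSearchTitle(doc) {
  return { ...doc, searchTitle: edgeNGrams(doc.title || "") };
}

const doc = withSearchTitle({ title: "Mississippi River" });
console.log(doc.searchTitle);

// With the Node.js driver, assuming `collection` is a connected Collection:
//   await collection.createIndex({ searchTitle: "text" });
//   await collection.insertOne(doc);
```

Keeping the n-grams in a separate field leaves the original title untouched for display, at the cost of recomputing searchTitle whenever the title changes.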

I then exclude the searchTitle field from my search results by using a projection:

db.collection('my-collection')
  .find({ $text: { $search: mySearchTerm } }, { projection: { searchTitle: 0 } })

  • In my opinion this is by far the best solution; it's a pity mongo doesn't offer n-grams out of the box. (3 upvotes)

Ric*_*las 7

Since Mongo currently does not support partial search by default...

I created a simple static method.

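import mongoose from 'mongoose'

const PostSchema = new mongoose.Schema({
    title: { type: String, default: '', trim: true },
    body: { type: String, default: '', trim: true },
});

PostSchema.index({ title: "text", body: "text" },
    { weights: { title: 5, body: 3 } })

// Escape regex metacharacters so the search term is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

// Note: the callback style below needs a Mongoose version that still accepts
// callbacks (support was removed in Mongoose 7).
PostSchema.statics = {
    // Fallback: case-insensitive substring match on title or body.
    searchPartial: function(q, callback) {
        const pattern = new RegExp(escapeRegExp(q), "i")
        return this.find({
            $or: [
                { "title": pattern },
                { "body": pattern },
            ]
        }, callback);
    },

    // Full-text search through the text index.
    searchFull: function (q, callback) {
        return this.find({
            $text: { $search: q, $caseSensitive: false }
        }, callback)
    },

    // Try the text index first; fall back to the regex search when it finds nothing.
    search: function(q, callback) {
        this.searchFull(q, (err, data) => {
            if (err) return callback(err, data);
            if (data.length) return callback(null, data);
            return this.searchPartial(q, callback);
        });
    },
}

export default mongoose.models.Post || mongoose.model('Post', PostSchema)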

How to use:

import Post from '../models/post'

Post.search('Firs', function(err, data) {
   console.log(data);
})


fla*_*ash 6

I wrapped @Ricardo Canelas' answer in a mongoose plugin on npm.

Two changes were made:

  • It uses promises
  • It searches every field of type String

Here's the important source code:

// mongoose-partial-full-search

module.exports = exports = function addPartialFullSearch(schema, options) {
  schema.statics = {
    ...schema.statics,
    makePartialSearchQueries: function (q) {
      if (!q) return {};
      const $or = Object.entries(this.schema.paths).reduce((queries, [path, val]) => {
        val.instance == "String" &&
          queries.push({
            [path]: new RegExp(q, "gi")
          });
        return queries;
      }, []);
      return { $or }
    },
    searchPartial: function (q, opts) {
      return this.find(this.makePartialSearchQueries(q), opts);
    },

    searchFull: function (q, opts) {
      return this.find({
        $text: {
          $search: q
        }
      }, opts);
    },

    search: function (q, opts) {
      return this.searchFull(q, opts).then(data => {
        return data.length ? data : this.searchPartial(q, opts);
      });
    }
  }
}

exports.version = require('../package').version;

Usage

// PostSchema.js
import addPartialFullSearch from 'mongoose-partial-full-search';
PostSchema.plugin(addPartialFullSearch);

// some other file.js
import Post from '../wherever/models/post'

Post.search('Firs').then(data => console.log(data));