Symbols in an Elasticsearch query string

Arn*_*nob 10 elasticsearch tire ruby-on-rails-3.2

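I have "documents" (ActiveRecord records) with an attribute named deviation. The attribute has values such as "Bin X", "Bin $", "Bin q", "Bin %", and so on.

I am trying to search on this attribute with tire/elasticsearch. I am using the whitespace analyzer to index the deviation attribute. Here is the code that creates the index: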

settings :analysis => {
  :filter => {
    :ngram_filter => {
      :type => "nGram",
      :min_gram => 2,
      :max_gram => 255
    },
    :deviation_filter => {
      :type => "word_delimiter",
      :type_table => ['$ => ALPHA']
    }
  },
  :analyzer => {
    :ngram_analyzer => {
      :type => "custom",
      :tokenizer => "standard",
      :filter => ["lowercase", "ngram_filter"]
    },
    :deviation_analyzer => {
      :type => "custom",
      :tokenizer => "whitespace",
      :filter => ["lowercase"]
    }
  }
} do
  mapping do
    indexes :id, :type => 'integer'
    [:equipment, :step, :recipe, :details, :description].each do |attribute|
      indexes attribute, :type => 'string', :analyzer => 'ngram_analyzer'
    end
    indexes :deviation, :analyzer => 'whitespace'
  end
end

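The search seems to work fine when the query string contains no special characters. For example, `Bin X` returns only the records that contain both the words `Bin` AND `X`. However, searching for something like `Bin $` or `Bin %` returns every result that contains `Bin`, almost ignoring the symbol (results with the symbol do rank higher than results without it).

Here is the search method I created: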

def self.search(params)
    tire.search(load: true) do
      query { string "#{params[:term].downcase}:#{params[:query]}", default_operator: "AND" }
      size 1000
    end
end
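To see what Elasticsearch actually receives, the interpolation in `self.search` can be reproduced on its own. This is only a sketch: `build_query_string` and `build_grouped_query_string` are hypothetical helper names, not part of tire. In Lucene query syntax a field prefix applies only to the term directly after it, so the grouped form is shown as one possible variation; it has not been tested against this index.

```ruby
# Hypothetical helper: mirrors the interpolation used inside self.search.
def build_query_string(term, query)
  "#{term.downcase}:#{query}"
end

# Hypothetical variant: parentheses apply the field to every term in the group.
def build_grouped_query_string(term, query)
  "#{term.downcase}:(#{query})"
end

puts build_query_string('Deviation', 'Bin $')          # prints: deviation:Bin $
puts build_grouped_query_string('Deviation', 'Bin $')  # prints: deviation:(Bin $)
```

In the first string only `Bin` is tied to the `deviation` field; the `$` is matched against the default field.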

Here is how I build the search form:

<div>
    <%= form_tag issues_path, :class=> "formtastic issue", method: :get do %>
        <fieldset class="inputs">
        <ol>
            <li class="string input medium search query optional stringish inline">
                <% opts = ["Description", "Detail","Deviation","Equipment","Recipe", "Step"] %>
                <%= select_tag :term, options_for_select(opts, params[:term]) %>
                <%= text_field_tag :query, params[:query] %>
                <%= submit_tag "Search", name: nil, class: "btn" %>
            </li>
        </ol>
        </fieldset>
    <% end %>
</div>

Rob*_*jic 27

You can sanitize the query string. Here is a sanitizer that has worked for everything I have thrown at it:

def sanitize_string_for_elasticsearch_string_query(str)
  # Escape special characters
  # http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/queryparsersyntax.html#Escaping Special Characters
  escaped_characters = Regexp.escape('\\/+-&|!(){}[]^~*?:')
  str = str.gsub(/([#{escaped_characters}])/, '\\\\\1')

  # AND, OR and NOT are used by lucene as logical operators. We need
  # to escape them
  ['AND', 'OR', 'NOT'].each do |word|
    escaped_word = word.split('').map {|char| "\\#{char}" }.join('')
    str = str.gsub(/\s*\b(#{word.upcase})\b\s*/, " #{escaped_word} ")
  end

  # Escape odd quotes
  quote_count = str.count '"'
  # Only two capture groups exist, so the text after the quote is \2
  str = str.gsub(/(.*)"(.*)/, '\1\"\2') if quote_count % 2 == 1

  str
end

params[:query] = sanitize_string_for_elasticsearch_string_query(params[:query])
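As a quick check of the reserved-character step, here is a reduced sketch that keeps only that part of the sanitizer. The names `LUCENE_RESERVED` and `escape_lucene_reserved` are hypothetical, and the block form of `gsub` is a swapped-in variation used to avoid counting backslashes in the replacement string.

```ruby
# Hypothetical names; same character list as the sanitizer above.
LUCENE_RESERVED = Regexp.escape('\\/+-&|!(){}[]^~*?:')

def escape_lucene_reserved(str)
  # Prefix every reserved character with a single backslash.
  str.gsub(/[#{LUCENE_RESERVED}]/) { |char| "\\#{char}" }
end

puts escape_lucene_reserved('Bin (X)')  # prints: Bin \(X\)
puts escape_lucene_reserved('a/b:c')    # prints: a\/b\:c
puts escape_lucene_reserved('Bin $')    # prints: Bin $
```

The last call comes back unchanged because `$` is not in the character list.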

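  • I also needed to add the forward slash to the `escaped_characters` array: `escaped_characters = Regexp.escape('\\+-&|!(){}[]^~*?:\/')`, because a forward slash was breaking the string. (2 upvotes)
  • Just wanted to note: the forward slash is now a special character and should be escaped. http://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters (2 upvotes)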