Elasticsearch: Highlighting hits from within attachments

Mel*_*emi 7 syntax-highlighting ruby-on-rails elasticsearch tire

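I can't get highlighting to work with Elasticsearch (and Tire) in my Rails application. I can successfully index PDF attachments and query them, but I can't get the highlights to show up.

I'm new to ES, so I'm not sure where to start troubleshooting. I'll begin with the mapping and a curl query, but feel free to ask for more information.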

class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  attr_accessible :title, :content, :published_on, :filename 

  mapping do
    indexes :id, :type =>'integer'
    indexes :title
    indexes :content
    indexes :published_on, :type => 'date'
    indexes :attachment, :type => 'attachment',
                            :fields => {
                            :name       => { :store => 'yes' },
                            :content    => { :store => 'yes' },
                            :title      => { :store => 'yes' },
                            :file       => { :term_vector => 'with_positions_offsets', :store => 'yes' },
                            :date       => { :store => 'yes' }
                          }
  end

  def to_indexed_json
    to_json(:methods => [:attachment])
  end

  def attachment
    if filename.present?
      path_to_pdf = "/Volumes/Calvin/sample_pdfs/#{filename}.pdf"
      Base64.encode64(open(path_to_pdf) { |pdf| pdf.read })
    else
      Base64.encode64("missing")
    end
  end
end

The mapping (via curl):

$ curl -XGET 'http://localhost:9200/_mapping?pretty=true'
{
  "articles" : {
    "article" : {
      "properties" : {
        "attachment" : {
          "type" : "attachment",
          "path" : "full",
          "fields" : {
            "attachment" : {
              "type" : "string"
            },
            "title" : {
              "type" : "string",
              "store" : "yes"
            },
            "name" : {
              "type" : "string",
              "store" : "yes"
            },
            "date" : {
              "type" : "date",
              "ignore_malformed" : false,
              "store" : "yes",
              "format" : "dateOptionalTime"
            },
            "content_type" : {
              "type" : "string"
            }
          }
        },
        "content" : {
          "type" : "string"
        },
        "created_at" : {
          "type" : "date",
          "ignore_malformed" : false,
          "format" : "dateOptionalTime"
        },
        "filename" : {
          "type" : "string"
        },
        "id" : {
          "type" : "integer",
          "ignore_malformed" : false
        },
        "published_on" : {
          "type" : "date",
          "ignore_malformed" : false,
          "format" : "dateOptionalTime"
        },
        "title" : {
          "type" : "string"
        },
        "updated_at" : {
          "type" : "date",
          "ignore_malformed" : false,
          "format" : "dateOptionalTime"
        }
      }
    }
  }
}

A query that has a hit in an indexed 125-page PDF:

$ curl "localhost:9200/_search?pretty=true" -d '{
  "fields" : ["title"],
  "query" : {
    "query_string" : {
      "query" : "xerox"
    }
  },
  "highlight" : {
    "fields" : {
      "attachment" : {}
    }
  }
}'

{
  "took" : 1077,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.036417194,
    "hits" : [ {
      "_index" : "articles",
      "_type" : "article",
      "_id" : "13",
      "_score" : 0.036417194,
      "fields" : {
        "title" : "F-E12"
      }
    } ]
  }
}

I was expecting the response to include a section like this:

"highlight" : {
        "attachment" : [ "\nLast Year <em>Xerox</em> moved their facilities" ]
  }

Thanks for your help!

Edit2: Adjusting the query (changing attachment to attachment.file) did not help:

$ curl "localhost:9200/_search?pretty=true" -d '{
  "fields" : ["title","attachment"],
  "query" : {"query_string" : {"query" : "xerox"}},
  "highlight" : {"fields" : {"attachment.file" : {}}}
}'

{
  "took" : 221,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.036417194,
    "hits" : [ {
      "_index" : "articles",
      "_type" : "article",
      "_id" : "13",
      "_score" : 0.036417194,
      "fields" : {
        "title" : "F-E12",
        "attachment" : "JVBERi0xLjYNJeLjz9MNCjk4NSAwIG9iag08PC9MaW5lYXJpemVkIDEvTCA...\n"
      }
    } ]
  }
}

Edit3 (removing "fields"):

$ curl "localhost:9200/_search?pretty=true" -d '{
  "query" : {"query_string" : {"query" : "xerox"}},
  "highlight" : {"fields" : {"attachment" : {}}}
}'

{
  "took" : 1078,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.036417194,
    "hits" : [ {
      "_index" : "articles",
      "_type" : "article",
      "_id" : "13",
      "_score" : 0.036417194, "_source" : {"content":"Real report","created_at":"2012-08-28T22:44:08Z","filename":"F-E12","id":13,"published_on":"2007-12-28","title":"F-E12","updated_at":"2012-08-28T22:44:08Z","attachment":"JVBERi0xLjYNJeLjz9MNCjk4NSAwIG9iag08PC9MaW5lYXJpemVkID...\n"
      }
    } ]
  }
}

Edit4 (the mapping from the Attachment Type in Action tutorial):

$ curl -XGET 'http://localhost:9200/test/_mapping?pretty=true'
{
  "test" : {
    "attachment" : {
      "properties" : {
        "file" : {
          "type" : "attachment",
          "path" : "full",
          "fields" : {
            "file" : {                #<== This appears to be missing 
              "type" : "string",      #<== from my Articles mapping
              "store" : "yes",        #<==
              "term_vector" : "with_positions_offsets"  #<==
            },
            "author" : {
              "type" : "string"
            },
            "title" : {
              "type" : "string",
              "store" : "yes"
            },
            "name" : {
              "type" : "string"
            },
            "date" : {
              "type" : "date",
              "ignore_malformed" : false,
              "format" : "dateOptionalTime"
            },
            "keywords" : {
              "type" : "string"
            },
            "content_type" : {
              "type" : "string"
            }
          }
        }
      }
    }
  }
}
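The difference flagged in Edit4 can also be checked mechanically. Below is a minimal sketch, assuming (as the tutorial mapping suggests, not as documented fact) that the sub-field named after the attachment property is the one that needs "store" and "term_vector". The helper name `attachment_fields_lacking_settings` and the `REQUIRED_SETTINGS` constant are hypothetical, and the sample hash is a trimmed copy of the Articles mapping above.

```ruby
require 'json'

# Settings the tutorial mapping carries on the same-named sub-field.
# Treating them as required is an assumption made for this sketch.
REQUIRED_SETTINGS = {
  'store'       => 'yes',
  'term_vector' => 'with_positions_offsets'
}.freeze

# Takes a parsed _mapping response (index => type => definition) and returns
# { property_name => [missing setting names] } for every attachment property
# whose same-named sub-field is absent or lacks one of the settings.
def attachment_fields_lacking_settings(mapping, required = REQUIRED_SETTINGS)
  problems = {}
  mapping.each_value do |types|
    types.each_value do |type_def|
      (type_def['properties'] || {}).each do |name, prop|
        next unless prop['type'] == 'attachment'
        text_field = (prop['fields'] || {})[name] || {}
        missing = required.reject { |key, value| text_field[key] == value }
        problems[name] = missing.keys unless missing.empty?
      end
    end
  end
  problems
end

# Trimmed copy of the Articles mapping from the question.
articles_mapping = {
  'articles' => { 'article' => { 'properties' => {
    'attachment' => {
      'type'   => 'attachment',
      'fields' => {
        'attachment' => { 'type' => 'string' },
        'title'      => { 'type' => 'string', 'store' => 'yes' }
      }
    },
    'content' => { 'type' => 'string' }
  } } }
}

puts JSON.pretty_generate(attachment_fields_lacking_settings(articles_mapping))
```

For the Articles mapping this reports both settings as missing on "attachment", while a mapping shaped like the tutorial's yields an empty result. With live data, the hash would come from `JSON.parse` applied to the output of the `_mapping` request.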

Mel*_*emi 8

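I figured it out! Finally...

The problem was the mapping syntax in my Article class. The ":file" sub-field needed to be renamed to ":attachment", so that it matches the name of the attachment field itself.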

  tire.mapping do
    indexes :id, :type =>'integer'
    indexes :title
    indexes :content
    indexes :published_on, :type => 'date'
    indexes :attachment, :type => 'attachment', #:null_value => 'missing_file',
                            :fields => {
                            :name       => { :store => 'yes' },  # exists?!?
                            :content    => { :store => 'yes' },
                            :title      => { :store => 'yes' },
  # WRONG! see next line => :file       => { :term_vector => 'with_positions_offsets', :store => 'yes' },
                            :attachment => { :term_vector => 'with_positions_offsets', :store => 'yes' },
                            :date       => { :store => 'yes' }
                          }
  end
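To verify the fix from Ruby rather than curl, something like the following sketch could be used. It only builds the request body from the question and pulls the "highlight" fragments out of a response. The helper names `highlight_search_body` and `highlights_by_id` are hypothetical, and the sample response is hand-written in the shape the question expected, not real output. The index presumably has to be recreated and the documents reindexed before the new mapping takes effect.

```ruby
require 'json'

# Builds the same search body as the curl query in the question,
# asking for highlights on a single field.
def highlight_search_body(query, field)
  {
    'fields'    => ['title'],
    'query'     => { 'query_string' => { 'query' => query } },
    'highlight' => { 'fields' => { field => {} } }
  }
end

# Maps each hit's _id to its highlight fragments for the given field.
# Hits without a "highlight" section are left out.
def highlights_by_id(response_json, field)
  hits = JSON.parse(response_json).fetch('hits', {}).fetch('hits', [])
  hits.each_with_object({}) do |hit, result|
    fragments = hit.fetch('highlight', {})[field]
    result[hit['_id']] = fragments if fragments
  end
end

# Request body, ready to pass to curl -d or any HTTP client.
puts JSON.generate(highlight_search_body('xerox', 'attachment'))

# Hand-written sample response (illustrative only).
sample_response = JSON.generate(
  'hits' => {
    'total' => 1,
    'hits'  => [
      {
        '_id'       => '13',
        'fields'    => { 'title' => 'F-E12' },
        'highlight' => {
          'attachment' => ["\nLast Year <em>Xerox</em> moved their facilities"]
        }
      }
    ]
  }
)

highlights_by_id(sample_response, 'attachment').each do |id, fragments|
  puts "#{id}: #{fragments.join(' ... ')}"
end
```

An empty result from `highlights_by_id` on a response that does contain hits is the symptom described in the question: the documents match, but no "highlight" section comes back.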