How do I update Elasticsearch documents using Python?

7 python elasticsearch

I have the following code that adds data to Elasticsearch:

from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
r = [{'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
 {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
 {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
es.indices.create(index='my-index_1', ignore=400)

for e in enumerate(r):
    #es.indices.update(index="my-index_1", body=e[1])
    es.index(index="my-index_1", body=e[1])

#Retrieve the data
es.search(index = 'my-index_1')['hits']['hits']

Requirement: how do I update the documents with the records below?

r = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

Here Dr. Messi and Dr. Christiano have to be added to the index. Dr. Bernard M. Aaron should not be updated, since it already exists in the index.

big*_*nty 5

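In Elasticsearch, when you index data without providing a custom ID, Elasticsearch creates a new ID for every document you index.

So, since you are not providing an ID, Elasticsearch generates one automatically, which means indexing the same record twice creates two separate documents.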

But you also want to check whether a Name already exists. There are two ways to do that:

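  1. Index the data without passing an _id for each document. Afterwards, you have to search on the Name field to see whether the document exists.
  2. Index the data with your own _id for each document. Then look the document up by its _id.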
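For comparison, the first approach could look roughly like the sketch below. It assumes the index uses the default dynamic mapping, so an exact-match `Name.keyword` sub-field exists, and that `es` is an `Elasticsearch()` client like the one in the question. The helper name `name_exists` is made up for illustration.

```python
def name_exists(es, index_name, name):
    """Return True if a document whose Name equals `name` is already searchable."""
    # A term query on the keyword sub-field matches the whole string exactly.
    # "Name.keyword" is an assumption based on the default dynamic mapping.
    query = {"query": {"term": {"Name.keyword": name}}, "size": 0}
    response = es.search(index=index_name, body=query)
    total = response["hits"]["total"]
    # Elasticsearch 7+ reports the total as {"value": n, "relation": ...};
    # older versions return a plain integer.
    count = total["value"] if isinstance(total, dict) else total
    return count > 0
```

Keep in mind that search is near-real-time: a document indexed a moment ago may not show up in search results until the index has been refreshed, which is one reason the _id lookup in the second approach tends to be more dependable.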

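I'm going to demonstrate the second approach, creating our own IDs. Since you are searching on the Name field, I'll hash it with MD5 to generate the _id. (Any hash function would work.)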
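The property that matters here is that the same Name always hashes to the same _id. A small sketch of that idea, using a hypothetical helper named `make_doc_id`:

```python
import hashlib


def make_doc_id(name):
    """Derive a stable _id from a Name by hashing its UTF-8 bytes with MD5."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


first = make_doc_id("Dr. Bernard M. Aaron")
second = make_doc_id("Dr. Bernard M. Aaron")
print(first == second)  # the same Name yields the same _id on every run
print(len(first))       # an MD5 hex digest is 32 characters long
```

Because the _id depends only on the Name, indexing a record with an existing Name addresses the existing document instead of creating a new one.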

First, index the data:

import hashlib
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
records = [
    {'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}
]

index_name="my-index_1"
es.indices.create(index=index_name, ignore=400)

for record in records:
    #es.indices.update(index="my-index_1", body=record)
    # Use the MD5 hash of the Name as the document _id
    doc_id = hashlib.md5(record['Name'].encode()).hexdigest()
    es.index(index=index_name, body=record, id=doc_id)

Output:

[{'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '1164c423bc4e2fcb75697c3031af9ef1',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christopher DeSimone',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '672ae14197a135c39eab759be8b0597f',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '85702447f9e9ea010054eaf0555ce79c',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Bernard M. Aaron',
   'Specialised and Location': 'Health'}}]

Next, index the new data:
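A minimal sketch of this step, not necessarily the answer's original code: it checks each record's hashed _id with the client's `exists` call and indexes only the records that are missing. The function name `index_if_missing` is hypothetical, `es` is assumed to be an `Elasticsearch()` client, and `body=` follows the usage earlier in this answer (newer client versions prefer `document=`).

```python
import hashlib


def index_if_missing(es, index_name, records):
    """Index only the records whose Name-derived _id is not in the index yet."""
    added = []
    for record in records:
        doc_id = hashlib.md5(record["Name"].encode()).hexdigest()
        if es.exists(index=index_name, id=doc_id):
            continue  # already indexed, leave the existing document untouched
        es.index(index=index_name, body=record, id=doc_id)
        added.append(record["Name"])
    return added


new_records = [
    {"Name": "Dr. Messi", "Specialised and Location": "Health"},
    {"Name": "Dr. Christiano", "Specialised and Location": "Health"},
    {"Name": "Dr. Bernard M. Aaron", "Specialised and Location": "Health"},
]

# Usage, with `es` and `index_name` from the first step:
# index_if_missing(es, index_name, new_records)
```

The returned list names the records that were actually written, which makes it easy to confirm that existing entries were skipped.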


Output:

[{'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '1164c423bc4e2fcb75697c3031af9ef1',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christopher DeSimone',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '672ae14197a135c39eab759be8b0597f',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '85702447f9e9ea010054eaf0555ce79c',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Bernard M. Aaron',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': 'e2e0f463145568471097ff027b18b40d',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '23bb4f1a3a41efe7f4cab8a80d766708',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'}}]

As you can see, the Dr. Bernard M. Aaron record was not indexed again because it already exists.