Elasticsearch bulk index JSON data

Ami*_*t P 25 json elasticsearch

I am trying to bulk index a JSON file into a new Elasticsearch index, but I am unable to do so. I have the following sample data in JSON:

[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"},
{"Amount": "2115", "Quantity": "2", "Id": "975463798", "Client_Store_sk": "1109"},
{"Amount": "2116", "Quantity": "1", "Id": "975463827", "Client_Store_sk": "1109"},
{"Amount": "648", "Quantity": "3", "Id": "975464139", "Client_Store_sk": "1109"},
{"Amount": "2126", "Quantity": "2", "Id": "975464805", "Client_Store_sk": "1109"},
{"Amount": "2133", "Quantity": "1", "Id": "975464061", "Client_Store_sk": "1109"},
{"Amount": "1339", "Quantity": "4", "Id": "974919458", "Client_Store_sk": "1109"},
{"Amount": "1196", "Quantity": "5", "Id": "974920538", "Client_Store_sk": "1109"},
{"Amount": "1198", "Quantity": "4", "Id": "975463638", "Client_Store_sk": "1109"},
{"Amount": "1345", "Quantity": "4", "Id": "974919522", "Client_Store_sk": "1109"},
{"Amount": "1347", "Quantity": "2", "Id": "974919563", "Client_Store_sk": "1109"},
{"Amount": "673", "Quantity": "2", "Id": "975464359", "Client_Store_sk": "1109"},
{"Amount": "2153", "Quantity": "1", "Id": "975464511", "Client_Store_sk": "1109"},
{"Amount": "3896", "Quantity": "4", "Id": "977289342", "Client_Store_sk": "1109"},
{"Amount": "3897", "Quantity": "4", "Id": "974920602", "Client_Store_sk": "1109"}]

When I try to use the standard bulk index API from Elasticsearch, I get this error: {"message":"ActionRequestValidationException[Validation Failed: 1: no requests added;]"}

Can anyone help with indexing this type of JSON?

Val*_*Val 46

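What you need to do is read that JSON file and then build a bulk request in the format expected by the _bulk endpoint, i.e. one line for the command and one line for the document, separated by a newline character. Rinse and repeat for each document: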

curl -XPOST localhost:9200/your_index/_bulk -d '
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
... etc for all your documents
'

Just make sure to replace your_index and your_type with the actual index and type names you are using.
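
The "no requests added" error is consistent with sending the JSON array as-is, since _bulk expects newline-delimited command/document pairs rather than an array. The conversion described above could be sketched in Python roughly as follows. This is only an illustration: the helper name build_bulk_body and the choice of the "Id" field as the document _id are assumptions, not part of the Elasticsearch API.

```python
import json

def build_bulk_body(docs, index, doc_type, id_field="Id"):
    """Return a bulk body: one command line plus one document line per document.

    build_bulk_body and id_field are hypothetical names used for illustration.
    """
    lines = []
    for doc in docs:
        command = {"index": {"_index": index, "_type": doc_type, "_id": doc[id_field]}}
        lines.append(json.dumps(command))  # the command line
        lines.append(json.dumps(doc))      # the document line
    # Every line, including the last one, must be terminated by a newline.
    return "\n".join(lines) + "\n"

# Two documents taken from the sample data in the question.
docs = json.loads("""
[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
 {"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}]
""")

body = build_bulk_body(docs, "your_index", "your_type")
print(body, end="")
```

The resulting string can be saved to a file and posted to the _bulk endpoint in place of the hand-written request body.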

UPDATE

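Note that the command line (the action line that precedes each document) can be shortened by removing _index and _type if those are specified in the URL. It is also possible to remove _id if you specify the path to your id field in your mapping (note that this feature will be deprecated in ES 2.0). At the very least, your command line can look like {"index":{}} for all documents, but it is always mandatory, because it specifies which kind of operation you want to perform (in this case, index the document).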
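
As a rough sketch of the shortened form, the following Python snippet writes a file in which every document is preceded by the minimal {"index":{}} line. The function name write_minimal_bulk_file and the temporary output path are assumptions made for illustration. Because no _id is given, Elasticsearch would generate the document ids itself.

```python
import json
import os
import tempfile

def write_minimal_bulk_file(docs, path):
    """Write docs to path, each preceded by the minimal command line {"index":{}}.

    write_minimal_bulk_file is a hypothetical helper, not an Elasticsearch API.
    """
    command_line = json.dumps({"index": {}}, separators=(",", ":"))
    with open(path, "w", encoding="utf-8") as out:
        for doc in docs:
            out.write(command_line + "\n")
            out.write(json.dumps(doc) + "\n")  # one document per line, newline-terminated

# Two documents taken from the sample data in the question.
docs = [
    {"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
    {"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
]

# Hypothetical output location; any writable path would do.
bulk_path = os.path.join(tempfile.mkdtemp(), "bulk.json")
write_minimal_bulk_file(docs, bulk_path)
```

A file written this way only works when the index (and, on older versions, the type) is given in the URL of the _bulk request.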

UPDATE 2

curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary  @/home/data1.json

/home/data1.json should look like this:

{"index":{}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"}

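  • The command line is always required for each document. If you add the index and type names to the URL (i.e. "localhost:9200/your_index/your_type/_bulk"), you can remove "_index" and "_type" from the command line to shorten it. There is also a way to avoid specifying `_id`, but at the very least you always need to specify the operation to perform on the document, i.e. the shortest command line you can use is `{"index":{}}` (2 upvotes)
  • @Val Fair suggestion - created a new question: /sf/ask/3192094111/ (2 upvotes)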

Tho*_*mas 8

As of now, 6.1.2 is the latest version of Elasticsearch, and the curl command that works for me on Windows (x64) is

curl -s -XPOST localhost:9200/my_index/my_index_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @D:\data\mydata.json

The data in mydata.json should be in the same format as in @val's answer.
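
For readers who prefer a script over curl, a roughly equivalent request can be prepared with Python's standard library. This is a minimal sketch under stated assumptions: the helper name build_bulk_request is hypothetical, the URL simply mirrors the curl command above, and the actual send is left commented out because it requires a running Elasticsearch node.

```python
import urllib.request

def build_bulk_request(url, ndjson_body):
    """Prepare a POST request carrying a newline-delimited bulk body.

    build_bulk_request is a hypothetical helper used for illustration.
    """
    return urllib.request.Request(
        url,
        data=ndjson_body.encode("utf-8"),  # send the body unmodified, newlines intact
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )

# A one-document body in the format shown in the earlier answer.
body = (
    '{"index":{}}\n'
    '{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}\n'
)

req = build_bulk_request("http://localhost:9200/my_index/my_index_type/_bulk", body)

# To actually send it (assumes Elasticsearch is listening on localhost:9200):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

As with --data-binary in curl, the point is that the body reaches the server byte for byte, including the final newline.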