I have to store some messages in Elasticsearch as part of my Python program. What I currently do to store a message is:
d={"message":"this is message"}
for index_nr in range(1,5):
ElasticSearchAPI.addToIndex(index_nr, d)
print d
That means if I have 10 messages, I have to repeat my code 10 times. So what I want to do instead is make a script file or batch file. I have checked the Elasticsearch Guide, and the BULK API can be used. The format should be something like this:
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", …Run Code Online (Sandbox Code Playgroud) 我有一个JSON文件,我需要在ElasticSearch服务器上对其进行索引.
I have a JSON file that I need to index on an Elasticsearch server. The JSON file looks like this:
{
"sku": "1",
"vbid": "1",
"created": "Sun, 05 Oct 2014 03:35:58 +0000",
"updated": "Sun, 06 Mar 2016 12:44:48 +0000",
"type": "Single",
"downloadable-duration": "perpetual",
"online-duration": "365 days",
"book-format": "ePub",
"build-status": "In Inventory",
"description": "On 7 August 1914, a week before the Battle of Tannenburg and two weeks before the Battle of the Marne, the French army attacked the Germans at Mulhouse in Alsace. Their objective was to recapture territory which had been lost after the Franco-Prussian War of 1870-71, …Run Code Online (Sandbox Code Playgroud) 我正在尝试使用NEST替换ES上的文档.我看到以下选项可用.
I am trying to replace documents in ES using NEST. I see the following options available.

Option 1:
var documents = new List<dynamic>();
var blkOperations = documents.Select(doc => new BulkIndexOperation<T>(doc)).Cast<IBulkOperation>().ToList();
var blkRequest = new BulkRequest()
{
Refresh = true,
Index = indexName,
Type = typeName,
Consistency = Consistency.One,
Operations = blkOperations
};
var response1 = _client.Raw.BulkAsync<T>(blkRequest);
Option 2:
var descriptor = new BulkDescriptor();
foreach (var eachDoc in documents)
{
var doc = eachDoc;
descriptor.Index<T>(i => i
.Index(indexName)
.Type(typeName)
.Document(doc));
}
var response = await _client.Raw.BulkAsync<T>(descriptor);
So can someone tell me which of these is better, or suggest any other option for doing bulk updates or deletes with NEST?
I believe there should be a formula to calculate bulk indexing size in Elasticsearch. Probably the following are the variables of such a formula.

I would like to know if anyone knows of, or uses, a mathematical formula. If not, how do people decide on their bulk size? By trial and error?
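For what it's worth, the usual approach is empirical rather than a closed formula: cap each request by both action count and payload bytes, then tune the caps against the cluster. A sketch of such a batching helper (the 1000-action and 5 MB caps are arbitrary starting points, not from any source):

import json

def chunk_actions(actions, max_actions=1000, max_bytes=5 * 1024 * 1024):
    # Yield batches that stay under both an action-count cap and a byte cap,
    # so a few very large documents cannot blow up a single bulk request.
    batch, size = [], 0
    for action in actions:
        action_size = len(json.dumps(action).encode("utf-8"))
        if batch and (len(batch) >= max_actions or size + action_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(action)
        size += action_size
    if batch:
        yield batch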
I am trying to re-index my Elasticsearch setup, and am currently looking at the Elasticsearch documentation and an example using the Python API.

I'm a little confused as to how this all works, though. I was able to obtain the scroll ID from the Python API:
es = Elasticsearch("myhost")
index = "myindex"
query = {"query":{"match_all":{}}}
response = es.search(index= index, doc_type= "my-doc-type", body= query, search_type= "scan", scroll= "10m")
scroll_id = response["_scroll_id"]
Now my question is, of what use is this to me? What does knowing the scroll ID even give me? The documentation says to use the "Bulk API", but I have no idea how the scroll_id factors into that, and it's been a little confusing.

Could anyone give a brief example showing how to re-index from this point, considering that I've got the correct scroll_id already?
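Not an authoritative answer, but a minimal sketch of the scroll-then-bulk loop this enables, continuing from the es and scroll_id defined in the code above; the target index name "myindex-v2" is a placeholder (recent versions of the Python client also ship a helpers.reindex convenience that wraps this whole pattern):

from elasticsearch.helpers import bulk

# Each es.scroll() call returns the next page of hits for the search;
# re-target every hit at the new index and push the page back via bulk.
while True:
    page = es.scroll(scroll_id=scroll_id, scroll="10m")
    scroll_id = page["_scroll_id"]  # the scroll ID may change between pages
    hits = page["hits"]["hits"]
    if not hits:
        break
    bulk(es, (
        {"_index": "myindex-v2", "_type": hit["_type"],
         "_id": hit["_id"], "_source": hit["_source"]}
        for hit in hits
    ))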
python indexing elasticsearch reindex elasticsearch-bulk-api
Suppose I have a type tag in an Elasticsearch index, with the following mapping:
{
"tag": {
"properties": {
"tag": {"type": "string", "store": "yes"},
"aliases": {"type": "string"}
}
}
}
Each entry is a tag word, plus an array of aliases for that tag. Here is a sample item:
{
"word": "weak",
"aliases": ["anemic", "anaemic", "faint", "flimsy"]
}
From time to time, I want to add new tag words with their aliases, and add new aliases to existing tag words.

Adding a new tag word with aliases is easy, it's just a new document. But how do I add new aliases to an existing tag word in a sane way?

I know I could just search for the tag word, fetch its document, check whether the alias already exists in the aliases array, add it if it doesn't, and save. However, that doesn't sound like a good solution.

Is there a way to do bulk updates?
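One possibility, sketched below under the assumption that each document's _id is the tag word itself (the index name "tags" and the alias data are also made up): the bulk API accepts update actions carrying a script, and the Python client's bulk helper forwards update-action fields such as script when _op_type is "update". The script syntax here is the ES 5+ Painless form; older versions spell the script block differently.

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()

# One scripted update per new alias; the script appends the alias
# only if it is not already present in the aliases array.
new_aliases = {"weak": ["puny"], "strong": ["sturdy"]}
actions = [
    {
        "_op_type": "update",
        "_index": "tags",
        "_type": "tag",
        "_id": tag,
        "script": {
            "source": "if (!ctx._source.aliases.contains(params.alias)) "
                      "{ ctx._source.aliases.add(params.alias) }",
            "params": {"alias": alias},
        },
    }
    for tag, aliases in new_aliases.items()
    for alias in aliases
]
bulk(es, actions)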
Using Elasticsearch 5.5, I am getting the following error when issuing this bulk request, and I am unable to figure out what is wrong with the request.
"type": "illegal_argument_exception",
"reason": "Malformed action/metadata line [3], expected START_OBJECT but found [VALUE_STRING]"
POST http://localhost:9200/access_log_index/access_log/_bulk
{ "index":{ "_id":11} }
{
"id":11,
"tenant_id":682,
"tenant_name":"kcc",
"user.user_name":"k0772251",
"access_date":"20170821",
"access_time":"02:41:44.123+01:30",
"operation_type":"launch_service",
"remote_host":"qlsso.quicklaunchsso.com",
"user_agent":"Mozilla/5.0 (Linux; Android 7.0; LGLS775 Build/NRD90U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Mobile Safari/537.36",
"browser":"",
"device":"",
"application.application_id":1846,
"application.application_name":"Desire2Learn",
"geoip.ip":"192.95.18.163",
"geoip.country_code":"US",
"geoip.country_name":"United States",
"geoip.region_code":"NJ",
"geoip.region_name":"New Jersey",
"geoip.city":"Newark",
"geoip.zip_code":7102,
"geoip.time_zone":"America/New_York",
"geoip.latitude":40.7355,
"geoip.longitude":-74.1741,
"geoip.metro_code":501
}
{ "index":{"_id":12} }
{
"id":12,
"tenant_id":682,
"tenant_name":"kcc",
"user.user_name":"k0772251",
"access_date":"20170821",
"access_time":"02:50:44.123+01:30",
"operation_type":"launch_service",
"remote_host":"qlsso.quicklaunchsso.com",
"user_agent":"Mozilla/5.0 (Linux; Android 7.0; LGLS775 Build/NRD90U) AppleWebKit/537.36 …Run Code Online (Sandbox Code Playgroud) 我想从我的 MySQL 表运行批量导入到 ES - 对于我的模型 Wine - 在我的生产服务器中。有 1.5M 条记录。
I want to run a bulk import from my MySQL table into ES - for my model Wine - on my production server. There are 1.5M records.

The code in my model for the ES gem:
include Elasticsearch::Model
include Elasticsearch::Model::Callbacks

def as_indexed_json(options={})
  as_json(only: [:id, :searchable])
end

mapping do
  indexes :id, index: :not_analyzed
  indexes :searchable
end
In development, I ran this successfully:
bundle exec rake environment elasticsearch:import:model CLASS='Wine' BATCH='100'
But I only have 1,000 records there...

Can I run a similar command in prod without problems? Is there another way to do it?
I have noticed that I need to update my model with the code above, otherwise it won't work. The problem is that if a user wants to update an object before the bulk import runs but after my model change, there will be an ES error (DocumentNotFound) - which is logical. Is it possible to use the callbacks to create the document in ES if it hasn't been created yet, instead of getting an ES exception?

What is the right way to do this? Does "elasticsearch:import:model" work in the background?
I am confused about how the py-elasticsearch bulk solution by @Diolor works (/sf/ask/1420213931/), but I would like to use plain es.bulk().

My code:
from elasticsearch import Elasticsearch
es = Elasticsearch()
doc = '''\n {"host":"logsqa","path":"/logs","message":"test test","@timestamp":"2014-10-02T10:11:25.980256","tags":["multiline","mydate_0.005"]} \n'''
result = es.bulk(index="logstash-test", doc_type="test", body=doc)
The error is:
No handlers could be found for logger "elasticsearch"
Traceback (most recent call last):
File "./log-parser-perf.py", line 55, in <module>
insertToES()
File "./log-parser-perf.py", line 46, in insertToES
res = es.bulk(index="logstash-test", doc_type="test", body=doc)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch-1.0.0-py2.7.egg/elasticsearch/client/utils.py", line 70, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch-1.0.0-py2.7.egg/elasticsearch/client/__init__.py", line 570, in bulk
params=params, body=self._bulk_body(body))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch-1.0.0-py2.7.egg/elasticsearch/transport.py", …Run Code Online (Sandbox Code Playgroud) 我想在Elasticsearch批量上传中将请求时间设置为20秒或更长时间.默认时间设置为10秒,我的警告信息天数设置为10.006秒.并且,在显示警告之后,执行正在抛出错误
I want to set the request timeout to 20 seconds or more for Elasticsearch bulk uploads. The default timeout is set to 10 seconds, and my warning message shows the request taking 10.006 seconds. And, after showing the warning, the execution throws an error.

Now, I want to set the request timeout for every request, either from user input or to whatever value is set as the default.
The error message:
WARNING:elasticsearch:HEAD /opportunityci/predictionsci [status:404 request:0.080s]
validated the index and mapping...!
WARNING:elasticsearch:POST http://192.168.204.154:9200/_bulk [status:N/A request:10.003s]
Traceback (most recent call last):
File "/Users/adaggula/anaconda/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 94, in perform_request
response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw)
File "/Users/adaggula/anaconda/lib/python2.7/site-packages/urllib3/connectionpool.py", line 640, in urlopen
_stacktrace=sys.exc_info()[2])
File "/Users/adaggula/anaconda/lib/python2.7/site-packages/urllib3/util/retry.py", line 238, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/Users/adaggula/anaconda/lib/python2.7/site-packages/urllib3/connectionpool.py", line 595, in urlopen
chunked=chunked)
File "/Users/adaggula/anaconda/lib/python2.7/site-packages/urllib3/connectionpool.py", line 395, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/Users/adaggula/anaconda/lib/python2.7/site-packages/urllib3/connectionpool.py", line 315, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. …

python request-timed-out elasticsearch elasticsearch-bulk-api
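For what it's worth, a sketch of the two places elasticsearch-py lets you raise this limit, with illustrative values; the host is taken from the log above, and the body is a placeholder:

from elasticsearch import Elasticsearch

# Option 1: a client-wide default timeout, applied to every request.
es = Elasticsearch("http://192.168.204.154:9200", timeout=20)

# Option 2: a per-call override; request_timeout is accepted by every
# API method, including bulk.
body = '{"index": {}}\n{"field": "value"}\n'
es.bulk(index="opportunityci", doc_type="predictionsci", body=body,
        request_timeout=30)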