WoJ*_*WoJ 7 python elasticsearch
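I am using the scroll method to fetch a large number of events in batches. I do not know how to stop the scroll properly.
What I am doing now (and it works) is checking for a TransportError, which signals that a scroll attempt has failed: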
scanResp = es.search(
    index="nessus_all",
    doc_type="marker",
    body={"query": {"match_all": {}}},
    search_type="scan",
    scroll="10m"
)
scrollId = scanResp['_scroll_id']
while True:
    try:
        response = es.scroll(scroll_id=scrollId, scroll="10m")
        # process results
    except Exception as e:
        log.debug("ended scroll: {e}".format(e=e))
        break
# we are done with the search
This produces an error in /var/log/elasticsearch/security.log:
[2015-02-16 09:36:07,110][DEBUG][action.search.type ] [eu4] [2791] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException: [eu5][inet[/10.81.147.186:9300]][indices:data/read/search[phase/scan/scroll]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [2791]
at org.elasticsearch.search.SearchService.findContext(SearchService.java:502)
at org.elasticsearch.search.SearchService.executeScan(SearchService.java:236)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:939)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:930)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
It also does not look like the right approach in general. Is there a proper way to do this?
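According to the Elasticsearch Scroll documentation (as of version 5.1):
"Each call to the scroll API returns the next batch of results until there are no more results left to return, i.e. the hits array is empty."
So I think the best approach is to check len(response['hits']['hits']).
A more concrete example: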
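response = es.search(
    index='index_name',
    body={"query": {"match_all": {}}},  # placeholder: put your query here
    scroll='10m'
)
scroll_id = response['_scroll_id']
# Without search_type="scan", the initial search already returns the first batch,
# so process the current batch before requesting the next one.
while len(response['hits']['hits']):
    # process results
    response = es.scroll(scroll_id=scroll_id, scroll='10m')
    scroll_id = response['_scroll_id']  # the scroll ID may change between calls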
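The same stopping rule (an empty hits array ends the scroll) can be wrapped in a generator. The following is only a minimal sketch: `iter_scroll_hits` and `FakeClient` are hypothetical names introduced here, and `FakeClient` is an in-memory stand-in so that the example runs without a cluster. It assumes a client whose `search`, `scroll` and `clear_scroll` methods accept the keyword arguments shown, as the elasticsearch-py client of that era does.

```python
def iter_scroll_hits(es, index, query, scroll="10m"):
    """Yield every hit for `query`, stopping when a batch comes back empty."""
    response = es.search(index=index, body=query, scroll=scroll)
    scroll_id = response["_scroll_id"]
    try:
        while response["hits"]["hits"]:
            for hit in response["hits"]["hits"]:
                yield hit
            response = es.scroll(scroll_id=scroll_id, scroll=scroll)
            # Always use the most recently returned scroll ID.
            scroll_id = response["_scroll_id"]
    finally:
        # Release the server-side search context rather than waiting for it to expire.
        es.clear_scroll(scroll_id=scroll_id)


class FakeClient:
    """Hypothetical in-memory stand-in for an Elasticsearch client (demo only)."""

    def __init__(self, batches):
        # The final scroll call returns an empty batch, as the real API does.
        self._batches = list(batches) + [[]]
        self.cleared = False

    def _next(self):
        hits = self._batches.pop(0) if self._batches else []
        return {"_scroll_id": "demo-id", "hits": {"hits": hits}}

    def search(self, index, body, scroll):
        return self._next()

    def scroll(self, scroll_id, scroll):
        return self._next()

    def clear_scroll(self, scroll_id):
        self.cleared = True


client = FakeClient([[{"_id": 1}, {"_id": 2}], [{"_id": 3}]])
query = {"query": {"match_all": {}}}
ids = [hit["_id"] for hit in iter_scroll_hits(client, "index_name", query)]
print(ids)  # prints [1, 2, 3]
```

With a real client, the only change is to pass the `es` object in place of `FakeClient`. The elasticsearch-py package also ships a ready-made `elasticsearch.helpers.scan` helper that wraps this kind of loop.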