hae*_*ney 2 python elasticsearch
I want to extract data from Elasticsearch.
My function looks like this:
## Use a regex to get the image names.
# Fetching them one by one with doc['hits']['hits'][n]['_source']['docker_image_short_name']
# is inefficient, because thousands of documents are stored per image.
regex = "docker_image_short_name': u'(.+?)'"
pattern = re.compile(regex)
query = {
    "query": {
        "bool": {"must": [{"range": {"@timestamp": {"gt": vulTime}}}]}
    }
}
page = es.search(index='crawledframe-*', body=query, scroll='1m', size=1000)
sid = page['_scroll_id']
num_page = page['hits']['total']
imglist = []
while num_page > 0:
    print num_page
    print vulTime
    imgs = re.findall(pattern, str(page))
    imglist += imgs
    page = es.scroll(scroll_id=sid, scroll='1m')
    num_page = len(page['hits']['hits'])
imglist = list(set(imglist))  # remove duplicates
I only want to extract "docker_image_short_name".
However, I get an error (printed output):
num_page: 2327261
vulTime : 0001-01-01
Traceback (most recent call last):
File "test.py", line 68, in <module>
worker_main()
File "test.py", line 63, in worker_main
imgnames = recent_crawl_index(es, vulTime)
File "test.py", line 45, in recent_crawl_index
page = es.scroll(scroll_id = sid, scroll = '1m')
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 1024, in scroll
params=params, body=body)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 312, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 125, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: <exception str() failed>
I don't know why this error occurs, because I use the same logic in other code,
and es.search() itself did not raise an error...
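For comparison with the loop above, here is a minimal sketch of a scroll loop that reads the field from each hit's `_source` instead of running a regex over `str(page)`, and that re-reads `_scroll_id` from every response. It assumes a client that accepts the same `search(index=..., body=..., scroll=..., size=...)` and `scroll(scroll_id=..., scroll=...)` keyword arguments used in the question. The helper names `build_query`, `iter_scroll_hits` and `unique_field_values` are hypothetical, and nothing here is claimed to explain the `RequestError` itself.

```python
def build_query(since):
    """Range query from the question, limited to the one field that is needed."""
    return {
        # Source filtering in the request body: return only this field per hit.
        "_source": ["docker_image_short_name"],
        "query": {"bool": {"must": [{"range": {"@timestamp": {"gt": since}}}]}},
    }


def iter_scroll_hits(es, index, query, scroll="1m", size=1000):
    """Yield every hit for `query`, following the scroll cursor page by page."""
    page = es.search(index=index, body=query, scroll=scroll, size=size)
    # Stop on the first empty page rather than relying on hits.total.
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        # The scroll id can change between pages, so take it from the latest response.
        page = es.scroll(scroll_id=page["_scroll_id"], scroll=scroll)


def unique_field_values(hits, field):
    """Return the distinct values of one `_source` field, sorted."""
    values = set()
    for hit in hits:
        source = hit.get("_source", {})
        if field in source:
            values.add(source[field])
    return sorted(values)
```

With a real client this would be called as `unique_field_values(iter_scroll_hits(es, 'crawledframe-*', build_query(vulTime)), 'docker_image_short_name')`. Using a set while iterating keeps memory proportional to the number of distinct image names rather than the number of documents.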
小智 6
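It looks like you are using the wrong version of Elasticsearch DSL.
What you need to do is check your server version:
curl -XGET 'localhost:9200'
Then match your Elasticsearch version to a compatible version of Elasticsearch DSL. For example, if your Elasticsearch version is 1.x, do the following:
pip uninstall elasticsearch-dsl
pip install "elasticsearch-dsl<2.0.0"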
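The same comparison can be made from Python. This is a minimal sketch, assuming the `elasticsearch` client package shown in the traceback is installed and that a node answers at `http://localhost:9200` (the address from the curl command). The helper names are hypothetical, and "major versions should match" is the rule of thumb being checked, not a guarantee that the error goes away.

```python
def parse_major(version):
    """Return the major version from a string like '1.7.3' or a tuple like (1, 7, 3)."""
    if isinstance(version, (tuple, list)):
        return int(version[0])
    return int(str(version).split(".")[0])


def majors_match(server_version, client_version):
    """True when server and client share the same major version."""
    return parse_major(server_version) == parse_major(client_version)


def check_installed_client(host="http://localhost:9200"):
    """Compare the running server's version with the installed Python client.

    Assumes a reachable node at `host`; imports lazily so the helpers above
    can be used without the client installed.
    """
    import elasticsearch

    es = elasticsearch.Elasticsearch([host])
    server = es.info()["version"]["number"]  # same value the curl call reports
    client = elasticsearch.__version__       # version of the installed client
    return server, client, majors_match(server, client)


if __name__ == "__main__":
    server, client, ok = check_installed_client()
    print("server=%s client=%s compatible_major=%s" % (server, client, ok))
```

If the majors differ, installing a client from the matching major series (as in the pip commands above) is the usual next step.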