Python - elasticsearch.exceptions.RequestError

hae*_*ney 2 python elasticsearch

I want to extract data from Elasticsearch.

My function looks like this:

# Use a regex to get the image names.
# Fetching them one by one via doc['hits']['hits'][n]['_source']['docker_image_short_name']
# is inefficient because thousands of documents are stored per image.
regex = "docker_image_short_name': u'(.+?)'"
pattern=re.compile(regex)
query={
        "query":{
            "bool":{ "must":[{"range":{"@timestamp":{"gt":vulTime}}}] }
        }
    }
page = es.search(index='crawledframe-*', body = query, scroll='1m', size=1000)
sid = page['_scroll_id']
num_page = page['hits']['total']

imglist=[]
while num_page > 0:
    print num_page
    print vulTime
    imgs = re.findall(pattern, str(page))
    imglist += imgs

    page = es.scroll(scroll_id = sid, scroll = '1m')
    num_page = len(page['hits']['hits'])

imglist = list(set(imglist))  # remove duplicates

I only want to extract "docker_image_short_name".

However, I get the following error (printed output):

num_page: 2327261
vulTime : 0001-01-01
Traceback (most recent call last):
  File "test.py", line 68, in <module>
    worker_main()
  File "test.py", line 63, in worker_main
    imgnames = recent_crawl_index(es, vulTime)
  File "test.py", line 45, in recent_crawl_index
    page = es.scroll(scroll_id = sid, scroll = '1m')
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 1024, in scroll
    params=params, body=body)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 312, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: <exception str() failed>

I don't know why this error occurs, because I use the same logic in other code.

Also, es.search() did not raise any error...
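
As a side note on the loop above: a minimal sketch of the same scroll pattern that reads the field from each hit's `_source` rather than running a regex over `str(page)`, and that passes the most recent `_scroll_id` to each `es.scroll()` call. The function name `collect_image_names` and its parameters are hypothetical, chosen for illustration. This only illustrates the loop structure and is not claimed to resolve the RequestError.

```python
def collect_image_names(es, index, query, field="docker_image_short_name"):
    """Return the distinct values of `field` over all hits, using the scroll API.

    `es` is assumed to be an elasticsearch.Elasticsearch client (or any object
    exposing compatible search() and scroll() methods).
    """
    names = set()
    page = es.search(index=index, body=query, scroll="1m", size=1000)
    # Keep scrolling until a page comes back with no hits.
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            value = hit.get("_source", {}).get(field)
            if value is not None:
                names.add(value)
        # Every response carries a scroll id; use the latest one for the next call.
        page = es.scroll(scroll_id=page["_scroll_id"], scroll="1m")
    return sorted(names)
```

With the variables from the question, the call would be `collect_image_names(es, 'crawledframe-*', query)`. Collecting into a set avoids building a large list with duplicates before deduplicating.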

小智 6

It looks like you are using the wrong version of Elasticsearch DSL.

What you need to do:

  • Check your Elasticsearch version: curl -XGET 'localhost:9200'
  • Then match your Elasticsearch version to a compatible version of Elasticsearch DSL. For example, if your Elasticsearch version is 1.x, run the following:

    pip uninstall elasticsearch-dsl

    pip install "elasticsearch-dsl<2.0.0"
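
The version-matching step can be sketched as a small helper. The server version is the `version.number` field in the JSON returned by the `curl` call. The helper name `dsl_requirement` is hypothetical, and the rule it applies for 2.x and later (pin the library to the same major version as the server) is an assumption based on the convention the elasticsearch-dsl project documents; check the project's compatibility notes for your exact version.

```python
def dsl_requirement(es_version):
    """Map an Elasticsearch version string (e.g. "1.7.3") to a pip
    requirement specifier for elasticsearch-dsl.

    Assumption: from 2.x onwards the library's major version should equal
    the server's major version.
    """
    major = int(es_version.split(".")[0])
    if major < 2:
        # Same specifier as the 1.x example in the answer.
        return "elasticsearch-dsl<2.0.0"
    return "elasticsearch-dsl>={0}.0.0,<{1}.0.0".format(major, major + 1)
```

For example, `dsl_requirement("1.7.3")` returns `"elasticsearch-dsl<2.0.0"`, which is the string to pass to `pip install`.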