How to enable scrolling in Elasticsearch

5 python dsl elasticsearch

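I have a web URL API served by Elasticsearch.

  • My URL is https://data.emp.com/employees
  • My index contains 50 employees (documents)
  • On each scroll, 7 more employees should be added: 7, 14, 21, ..., 49, 50
  • That is, the first scroll should show 7 employees, then 14, ..., 49, and finally 50
  • The code below, which backs my API URL, returns all 50 employees at once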
    def elastic_search():
        """
         Return full search using match_all
        """
        try:
     
            full_search= es.search(index="employees",scroll = '2m',size = 10,body={ "query": {"match_all": {}}})
            hits_search = full_search['hits']['hits']
            return hits_search 
        except Exception as e:
            logger.exception("Error" + str(e))
            raise

I modified the code above as follows:

        sid =  search["_scroll_id"]
        scroll_size = search['hits']['total']
        scroll_size = scroll_size['value']
        # Start scrolling
        while (scroll_size > 0):

            #print("Scrolling...")
            page = es.scroll(scroll_id = sid, scroll = '1m')

            #print("Hits : ",len(page["hits"]["hits"]))
            
            # Update the scroll ID
            sid = page['_scroll_id']
        
            # Get the number of results that we returned in the last scroll
            scroll_size = len(page['hits']['hits'])
            search_text = page['hits']['hits']
            print (search_text)

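My API returns [] because the last value of search_text is empty. In the logs, each batch prints 7 employees, but my web URL API keeps loading and finally shows a blank page.

Please help me update the "hits_search" value returned by the elastic_search function.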

Alw*_*nny 2

I think if you have ≤ 10k documents, Elasticsearch's from and size will work for you. But if you want to use the scroll API, this is what you need:

    from elasticsearch import Elasticsearch

    # assumption: adjust the URL to point at your own cluster
    client = Elasticsearch("http://localhost:9200")

    # declare a filter query dict object
    match_all = {
        "size": 7,
        "query": {
            "match_all": {}
        }
    }

    # make a search() request to get the first page and open a scroll context
    resp = client.search(
        index = 'employees',
        body = match_all,
        scroll = '2s' # length of time to keep search context
    )

    # running total of documents seen so far
    doc_count = 0

    # process the first 7 documents here from resp
    for doc in resp['hits']['hits']:
        print ("\n", doc['_id'], doc['_source'])
        doc_count += 1
        print ("DOC COUNT:", doc_count)

    # keep track of the latest scroll _id
    scroll_id = resp['_scroll_id']

    # keep scrolling until a page comes back empty
    while len(resp['hits']['hits']):

        # make a request using the Scroll API
        # (the page size is fixed by the initial search, so no size is passed here)
        resp = client.scroll(
            scroll_id = scroll_id,
            scroll = '2s' # length of time to keep search context
        )

        # the scroll _id can change between requests, so always use the latest one
        scroll_id = resp['_scroll_id']

        # iterate over the document hits for each 'scroll'
        for doc in resp['hits']['hits']:
            print ("\n", doc['_id'], doc['_source'])
            doc_count += 1
            print ("DOC COUNT:", doc_count)
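The question's loop overwrites search_text on every pass, so the last (empty) page is what survives. One way around that, as a rough sketch only, is to accumulate each page into a list and return the list once a page comes back empty. The function name `collect_all_hits` and its parameters `page_size` and `keep_alive` are made up for illustration; `client` is assumed to be an elasticsearch-py client like the `es` object in the question.

```python
def collect_all_hits(client, index, page_size=7, keep_alive="2m"):
    """Scroll through every document in `index` and return all hits as one list.

    `client` is assumed to be an elasticsearch-py client (hypothetical helper,
    not part of the library).
    """
    # The first request returns the first page and opens the scroll context.
    resp = client.search(
        index=index,
        scroll=keep_alive,
        body={"size": page_size, "query": {"match_all": {}}},
    )
    scroll_id = resp["_scroll_id"]
    page = resp["hits"]["hits"]

    all_hits = []
    try:
        # An empty page means the scroll is exhausted.
        while page:
            all_hits.extend(page)  # accumulate instead of overwriting
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]  # always use the latest scroll id
            page = resp["hits"]["hits"]
    finally:
        # Release the search context on the server once we are done.
        client.clear_scroll(scroll_id=scroll_id)

    return all_hits
```

Inside elastic_search you could then do something like `return collect_all_hits(es, "employees")`. Note that this returns everything in one response; it does not by itself give the browser 7 employees per scroll.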

See the reference: https://kb.objectrocket.com/elasticsearch/how-to-use-python-to-make-scroll-queries-to-get-all-documents-in-an-elasticsearch-index-752
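For the from/size alternative mentioned at the start of this answer, a minimal sketch might look like the following. The helper name `fetch_page` and its `page` and `page_size` parameters are hypothetical; `client` is again assumed to be an elasticsearch-py client. The idea is that the web API takes a page number from the browser and returns just that slice of 7 employees.

```python
def fetch_page(client, index, page, page_size=7):
    """Return the hits for one zero-based page using from/size pagination.

    Hypothetical helper; `client` is assumed to be an elasticsearch-py client.
    """
    body = {
        "from": page * page_size,  # number of documents to skip
        "size": page_size,         # number of documents to return
        "query": {"match_all": {}},
    }
    resp = client.search(index=index, body=body)
    return resp["hits"]["hits"]
```

With 50 employees, pages 0 through 6 would each hold 7 hits, page 7 would hold the last one, and any later page would be empty. By default from + size cannot exceed 10,000 (the index.max_result_window setting), which is where the 10k limit comes from. You would probably also want to add a sort to the body so that the order stays consistent between requests.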
