"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes" error when indexing a list of dictionaries

Aiz*_*aac 5 python bulk-load elasticsearch

This question is related to another one: How to read data from a list and index specific values into Elasticsearch, using Python?

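I have written a script that reads a list ("dummy") and indexes it into Elasticsearch. I converted the list into a list of dictionaries and used the "Bulk" API to index it into Elasticsearch. The script used to work (see the related question linked above), but it stopped working after I added "timestamp" and the function "initialize_elasticsearch".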

So, what is wrong? Should I use JSON instead of a list of dictionaries?

I have also tried using only one dictionary from the list. In that case there is no error, but nothing is indexed.

This is the error:

[image]

This is the list (dummy):
[
    "labels: imagenet_labels.txt ",
    "Model: efficientnet-edgetpu-S_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 23.1",
    "Time(ms): 5.7",
    "Inference: corkscrew, bottle screw",
    "Score: 0.03125 ",
    "TPU_temp(°C): 57.05",
    "labels: imagenet_labels.txt ",
    "Model: efficientnet-edgetpu-M_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 29.3",
    "Time(ms): 10.8",
    "Inference: dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk",
    "Score: 0.09375 ",
    "TPU_temp(°C): 56.8",
    "labels: imagenet_labels.txt ",
    "Model: efficientnet-edgetpu-L_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 45.6",
    "Time(ms): 31.0",
    "Inference: pick, plectrum, plectron",
    "Score: 0.09766 ",
    "TPU_temp(°C): 57.55",
    "labels: imagenet_labels.txt ",
    "Model: inception_v3_299_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 68.8",
    "Time(ms): 51.3",
    "Inference: ringlet, ringlet butterfly",
    "Score: 0.48047 ",
    "TPU_temp(°C): 57.3",
    "labels: imagenet_labels.txt ",
    "Model: inception_v4_299_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 121.8",
    "Time(ms): 101.2",
    "Inference: admiral",
    "Score: 0.59375 ",
    "TPU_temp(°C): 57.05",
    "labels: imagenet_labels.txt ",
    "Model: inception_v2_224_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 34.3",
    "Time(ms): 16.6",
    "Inference: lycaenid, lycaenid butterfly",
    "Score: 0.41406 ",
    "TPU_temp(°C): 57.3",
    "labels: imagenet_labels.txt ",
    "Model: mobilenet_v2_1.0_224_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 14.4",
    "Time(ms): 3.3",
    "Inference: leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea",
    "Score: 0.36328 ",
    "TPU_temp(°C): 57.3",
    "labels: imagenet_labels.txt ",
    "Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 14.5",
    "Time(ms): 3.0",
    "Inference: bow tie, bow-tie, bowtie",
    "Score: 0.33984 ",
    "TPU_temp(°C): 57.3",
    "labels: imagenet_labels.txt ",
    "Model: inception_v1_224_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 21.2",
    "Time(ms): 3.6",
    "Inference: pick, plectrum, plectron",
    "Score: 0.17578 ",
    "TPU_temp(°C): 57.3",
]

This is the script:
import elasticsearch6
from elasticsearch6 import Elasticsearch, helpers
import datetime
import re



ES_DEV_HOST = "http://localhost:9200/"
INDEX_NAME = "coral_ia" #name of index
DOC_TYPE = 'coral_edge'  #type of data



##This is the list
dummy = [
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 23.1\n', 'Time(ms): 5.7\n', '\n', '\n',
    'Inference: corkscrew, bottle screw\n', 'Score: 0.03125 \n', '\n',
    'TPU_temp(°C): 57.05\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-M_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 29.3\n', 'Time(ms): 10.8\n', '\n', '\n',
    "Inference: dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk\n",
    'Score: 0.09375 \n', '\n',
    'TPU_temp(°C): 56.8\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-L_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 45.6\n', 'Time(ms): 31.0\n', '\n', '\n',
    'Inference: pick, plectrum, plectron\n', 'Score: 0.09766 \n', '\n',
    'TPU_temp(°C): 57.55\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v3_299_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 68.8\n', 'Time(ms): 51.3\n', '\n', '\n',
    'Inference: ringlet, ringlet butterfly\n', 'Score: 0.48047 \n', '\n',
    'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v4_299_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 121.8\n', 'Time(ms): 101.2\n', '\n', '\n',
    'Inference: admiral\n', 'Score: 0.59375 \n', '\n',
    'TPU_temp(°C): 57.05\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v2_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 34.3\n', 'Time(ms): 16.6\n', '\n', '\n',
    'Inference: lycaenid, lycaenid butterfly\n', 'Score: 0.41406 \n', '\n',
    'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: mobilenet_v2_1.0_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 14.4\n', 'Time(ms): 3.3\n', '\n', '\n',
    'Inference: leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea\n', 'Score: 0.36328 \n', '\n',
    'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 14.5\n', 'Time(ms): 3.0\n', '\n', '\n',
    'Inference: bow tie, bow-tie, bowtie\n', 'Score: 0.33984 \n', '\n',
    'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v1_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 21.2\n', 'Time(ms): 3.6\n', '\n', '\n',
    'Inference: pick, plectrum, plectron\n', 'Score: 0.17578 \n', '\n',
    'TPU_temp(°C): 57.3\n', '##################################### \n', '\n',
]

#This is to clean data and filter some values
regex = re.compile(r'(\w+)\((.+)\):\s(.*)|(\w+:)\s(.*)')
match_regex = list(filter(regex.match, dummy))
match = [line.strip('\n') for line in match_regex]
print("match list", match, "\n")


##Converts the list into a list of dictionaries
groups = [{}]

for line in match:
    key, value = line.split(": ", 1)
    if key == "labels":
        if groups[-1]:
            groups.append({})
    groups[-1][key] = value



"""
Initialize Elasticsearch by server's IP'
"""
def initialize_elasticsearch():
    n = 0
    while n <= 10:
        try:
            es = Elasticsearch(ES_DEV_HOST)
            print("Initializing Elasticsearch...")
            return es
        except elasticsearch6.exceptions.ConnectionTimeout as e:  ###elasticsearch
            print(e)
            n += 1
            continue
    raise Exception



"""
Create an index in Elasticsearch if one isn't already there
"""
def initialize_mapping(es):
    mapping_classification = {
        'properties': {
            'timestamp': {'type': 'date'},
            #'type': {'type':'keyword'}, <--- I have removed this
            'labels': {'type': 'keyword'},
            'Model': {'type': 'keyword'},
            'Image': {'type': 'keyword'},
            'Time(ms)': {'type': 'short'},
            'Inference': {'type': 'text'},
            'Score': {'type': 'short'},
            'TPU_temp(°C)': {'type': 'short'}
        }
    }
    print("Initializing the mapping ...")
    if not es.indices.exists(INDEX_NAME):
        es.indices.create(INDEX_NAME)
        es.indices.put_mapping(body=mapping_classification, doc_type=DOC_TYPE, index=INDEX_NAME)



def generate_actions():
    actions = {
        '_index': INDEX_NAME,
        'timestamp': str(datetime.datetime.utcnow().strftime("%Y-%m-%d"'T'"%H:%M:%S")),
        '_type': DOC_TYPE,
        '_source': groups
        }

    yield actions
    print("Generating actions ...")
    #print("actions:", actions)
    #print(type(actions), "\n")



def main():
    es=initialize_elasticsearch()
    initialize_mapping(es)

    try:
        res=helpers.bulk(client=es, index = INDEX_NAME, actions = generate_actions())
        print ("\nhelpers.bulk() RESPONSE:", res)
        print ("RESPONSE TYPE:", type(res))

    except Exception as err:
        print("\nhelpers.bulk() ERROR:", err)


if __name__ == "__main__":
    main()

This is the code I used when testing with only one dictionary:
regex = re.compile(r'(\w+)\((.+)\):\s(.*)|(\w+:)\s(.*)')
match_regex = list(filter(regex.match, dummy))
match = [line.rstrip('\n') for line in match_regex]   # strips the newlines
#print("match list", match, "\n")


features_wanted='ModelImageTime(ms)InferenceScoreTPU_temp(°C)'
match_out = {i.replace(' ','').split(':')[0]:i.replace(' ','').split(':')[1] for i in match if i.replace(' ','').split(':')[0] in features_wanted}

- - - - - - - - - - EDIT - - - - - - - - - - - - -

There is no error, but "Generating actions ..." is not printed.


This is the mapping:

[image]

This is what I see when I check whether the data has been indexed:

[image]

The data seems to have been indexed...

- - - - - - - - - - - EDIT - - - - - - - - - - - -

I modified generate_actions:
def generate_actions():
    return [{
        '_index': INDEX_NAME,
        '_type': DOC_TYPE,
        '_source': {
            "any": doc,
            "@timestamp": str(datetime.datetime.utcnow().strftime("%Y-%m-%d"'T'"%H:%M:%S")),}
        }
        for doc in groups]


Joe*_*ook 5

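This somewhat cryptic error message is telling you that you need to pass individual objects (rather than an array of them) to the bulk helper.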
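To make that concrete, here is a rough sketch of the newline-delimited body a bulk request carries: one metadata line, then one source line per document, where each source line is expected to be a JSON object. The helper name `to_bulk_lines` and the two shortened documents are made up for illustration; the sketch only builds text and never contacts Elasticsearch.

```python
import json


def to_bulk_lines(index, doc_type, sources):
    """Build the metadata/source line pairs of a bulk request body."""
    lines = []
    for source in sources:
        # metadata line: which index and type the next line belongs to
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # source line: the document itself
        lines.append(json.dumps(source))
    return lines


# Two shortened documents, for illustration only.
groups = [
    {"Model": "inception_v1_224_quant_edgetpu.tflite", "Score": "0.17578"},
    {"Model": "inception_v4_299_quant_edgetpu.tflite", "Score": "0.59375"},
]

# One source per dictionary: every source line is a JSON object.
good = to_bulk_lines("coral_ia", "coral_edge", groups)

# The whole list as a single source: the source line is a JSON array.
bad = to_bulk_lines("coral_ia", "coral_edge", [groups])

print("\n".join(good))
print(bad[1])
```

In the second case the source line starts with `[` instead of `{`, which is presumably what the server is complaining about.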

So you need to rewrite your generate_actions fn like so:

def generate_actions():
    return [{
        'timestamp': str(datetime.datetime.utcnow().strftime("%Y-%m-%d"'T'"%H:%M:%S")),
        '_index': INDEX_NAME,
        '_type': DOC_TYPE,
        '_source': doc
    } for doc in groups]      # <----- note the for loop here. `_source` needs
                              # to be the doc, not the whole groups list

    print("Generating actions ...")  # unreachable: this line comes after the return
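Two details in that snippet are worth a second look. The `print` sits after the `return`, so it never runs. And, as far as I can tell, the bulk helper sends only `_source` as the document body when that key is present, so a `timestamp` placed next to `_index` would probably not end up in the stored document. A minimal sketch that addresses both, assuming the field should be called `timestamp` as in the mapping; `build_actions` and its `now` parameter are names invented here, and the two sample documents are shortened:

```python
import datetime

INDEX_NAME = "coral_ia"
DOC_TYPE = "coral_edge"


def build_actions(groups, now=None):
    """Return one bulk action per dictionary in `groups`."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%S")
    print("Generating actions ...")  # runs, because it precedes the return
    return [
        {
            "_index": INDEX_NAME,
            "_type": DOC_TYPE,
            # copy the document and add the timestamp as a regular field
            "_source": dict(doc, timestamp=stamp),
        }
        for doc in groups
    ]


# Two shortened documents, for illustration only.
groups = [
    {"Model": "inception_v1_224_quant_edgetpu.tflite", "Score": "0.17578"},
    {"Model": "inception_v4_299_quant_edgetpu.tflite", "Score": "0.59375"},
]
actions = build_actions(groups, now=datetime.datetime(2020, 1, 2, 3, 4, 5))
print(actions[0]["_source"])
```

The result can then be handed to the helper, e.g. `helpers.bulk(es, build_actions(groups))`.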

Also, I'd recommend stripping the trailing whitespace from the key-value pairs when constructing groups:

groups[-1][key] = value.strip()
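For a quick check of what the `strip()` changes, here is the grouping loop from the question wrapped in a function and run on a few of the cleaned lines. The function name `group_lines` and the shortened `sample` list are assumptions made for this sketch.

```python
def group_lines(lines):
    """Split 'key: value' lines into one dictionary per 'labels' entry."""
    groups = [{}]
    for line in lines:
        key, value = line.split(": ", 1)
        # a new 'labels' line starts the next document
        if key == "labels" and groups[-1]:
            groups.append({})
        groups[-1][key] = value.strip()  # drop trailing spaces such as 'insect.jpg '
    return groups


sample = [
    "labels: imagenet_labels.txt ",
    "Model: inception_v1_224_quant_edgetpu.tflite ",
    "Time(ms): 21.2",
    "Time(ms): 3.6",
    "Score: 0.17578 ",
    "labels: imagenet_labels.txt ",
    "Model: inception_v4_299_quant_edgetpu.tflite ",
    "Score: 0.59375 ",
]
groups = group_lines(sample)
print(groups)
```

Note that each group contains two `Time(ms)` lines, so the second value overwrites the first in the dictionary; the loop in the question behaves the same way.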