Aiz*_*aac 5 python bulk-load elasticsearch
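This question is related to another one:
How to read data from a list and index specific values into Elasticsearch, using python?

I wrote a script that reads a list ("dummy") and indexes it into Elasticsearch.
I converted the list into a list of dictionaries and used the "bulk" API to index it into Elasticsearch.
The script used to work (see the link to the related question), but it stopped working after I added "timestamp" and the function "initialize_elasticsearch".

So, what is wrong? Should I use JSON instead of a list of dictionaries?

I also tried using just one dictionary from the list. In that case there is no error, but nothing gets indexed.

This is the error

This is the list (dummy)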
[
    "labels: imagenet_labels.txt ",
    "Model: efficientnet-edgetpu-S_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 23.1",
    "Time(ms): 5.7",
    "Inference: corkscrew, bottle screw",
    "Score: 0.03125 ",
    "TPU_temp(°C): 57.05",
    "labels: imagenet_labels.txt ",
    "Model: efficientnet-edgetpu-M_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 29.3",
    "Time(ms): 10.8",
    "Inference: dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk",
    "Score: 0.09375 ",
    "TPU_temp(°C): 56.8",
    "labels: imagenet_labels.txt ",
    "Model: efficientnet-edgetpu-L_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 45.6",
    "Time(ms): 31.0",
    "Inference: pick, plectrum, plectron",
    "Score: 0.09766 ",
    "TPU_temp(°C): 57.55",
    "labels: imagenet_labels.txt ",
    "Model: inception_v3_299_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 68.8",
    "Time(ms): 51.3",
    "Inference: ringlet, ringlet butterfly",
    "Score: 0.48047 ",
    "TPU_temp(°C): 57.3",
    "labels: imagenet_labels.txt ",
    "Model: inception_v4_299_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 121.8",
    "Time(ms): 101.2",
    "Inference: admiral",
    "Score: 0.59375 ",
    "TPU_temp(°C): 57.05",
    "labels: imagenet_labels.txt ",
    "Model: inception_v2_224_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 34.3",
    "Time(ms): 16.6",
    "Inference: lycaenid, lycaenid butterfly",
    "Score: 0.41406 ",
    "TPU_temp(°C): 57.3",
    "labels: imagenet_labels.txt ",
    "Model: mobilenet_v2_1.0_224_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 14.4",
    "Time(ms): 3.3",
    "Inference: leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea",
    "Score: 0.36328 ",
    "TPU_temp(°C): 57.3",
    "labels: imagenet_labels.txt ",
    "Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 14.5",
    "Time(ms): 3.0",
    "Inference: bow tie, bow-tie, bowtie",
    "Score: 0.33984 ",
    "TPU_temp(°C): 57.3",
    "labels: imagenet_labels.txt ",
    "Model: inception_v1_224_quant_edgetpu.tflite ",
    "Image: insect.jpg ",
    "Time(ms): 21.2",
    "Time(ms): 3.6",
    "Inference: pick, plectrum, plectron",
    "Score: 0.17578 ",
    "TPU_temp(°C): 57.3",
]

This is the script
import elasticsearch6
from elasticsearch6 import Elasticsearch, helpers
import datetime
import re


ES_DEV_HOST = "http://localhost:9200/"
INDEX_NAME = "coral_ia" #name of index
DOC_TYPE = 'coral_edge' #type of data


##This is the list
dummy = [
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 23.1\n', 'Time(ms): 5.7\n', '\n', '\n',
    'Inference: corkscrew, bottle screw\n',
    'Score: 0.03125 \n', '\n',
    'TPU_temp(°C): 57.05\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-M_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 29.3\n', 'Time(ms): 10.8\n', '\n', '\n',
    "Inference: dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk\n",
    'Score: 0.09375 \n', '\n',
    'TPU_temp(°C): 56.8\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-L_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 45.6\n', 'Time(ms): 31.0\n', '\n', '\n',
    'Inference: pick, plectrum, plectron\n',
    'Score: 0.09766 \n', '\n',
    'TPU_temp(°C): 57.55\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v3_299_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 68.8\n', 'Time(ms): 51.3\n', '\n', '\n',
    'Inference: ringlet, ringlet butterfly\n',
    'Score: 0.48047 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v4_299_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 121.8\n', 'Time(ms): 101.2\n', '\n', '\n',
    'Inference: admiral\n',
    'Score: 0.59375 \n', '\n',
    'TPU_temp(°C): 57.05\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v2_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 34.3\n', 'Time(ms): 16.6\n', '\n', '\n',
    'Inference: lycaenid, lycaenid butterfly\n',
    'Score: 0.41406 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: mobilenet_v2_1.0_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 14.4\n', 'Time(ms): 3.3\n', '\n', '\n',
    'Inference: leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea\n',
    'Score: 0.36328 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 14.5\n', 'Time(ms): 3.0\n', '\n', '\n',
    'Inference: bow tie, bow-tie, bowtie\n',
    'Score: 0.33984 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v1_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 21.2\n', 'Time(ms): 3.6\n', '\n', '\n',
    'Inference: pick, plectrum, plectron\n',
    'Score: 0.17578 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
]

#This is to clean data and filter some values
regex = re.compile(r'(\w+)\((.+)\):\s(.*)|(\w+:)\s(.*)')
match_regex = list(filter(regex.match, dummy))
match = [line.strip('\n') for line in match_regex]
print("match list", match, "\n")


##Converts the list into a list of dictionaries
groups = [{}]

for line in match:
    key, value = line.split(": ", 1)
    if key == "labels":
        if groups[-1]:
            groups.append({})
    groups[-1][key] = value


"""
Initialize Elasticsearch by server's IP'
"""
def initialize_elasticsearch():
    n = 0
    while n <= 10:
        try:
            es = Elasticsearch(ES_DEV_HOST)
            print("Initializing Elasticsearch...")
            return es
        except elasticsearch6.exceptions.ConnectionTimeout as e: ###elasticsearch
            print(e)
            n += 1
            continue
    raise Exception


"""
Create an index in Elasticsearch if one isn't already there
"""
def initialize_mapping(es):
    mapping_classification = {
        'properties': {
            'timestamp': {'type': 'date'},
            #'type': {'type':'keyword'}, <--- I have removed this
            'labels': {'type': 'keyword'},
            'Model': {'type': 'keyword'},
            'Image': {'type': 'keyword'},
            'Time(ms)': {'type': 'short'},
            'Inference': {'type': 'text'},
            'Score': {'type': 'short'},
            'TPU_temp(°C)': {'type': 'short'}
        }
    }
    print("Initializing the mapping ...")
    if not es.indices.exists(INDEX_NAME):
        es.indices.create(INDEX_NAME)
        es.indices.put_mapping(body=mapping_classification, doc_type=DOC_TYPE, index=INDEX_NAME)


def generate_actions():
    actions = {
        '_index': INDEX_NAME,
        'timestamp': str(datetime.datetime.utcnow().strftime("%Y-%m-%d"'T'"%H:%M:%S")),
        '_type': DOC_TYPE,
        '_source': groups
    }

    yield actions
    print("Generating actions ...")
    #print("actions:", actions)
    #print(type(actions), "\n")


def main():
    es=initialize_elasticsearch()
    initialize_mapping(es)

    try:
        res=helpers.bulk(client=es, index = INDEX_NAME, actions = generate_actions())
        print ("\nhelpers.bulk() RESPONSE:", res)
        print ("RESPONSE TYPE:", type(res))

    except Exception as err:
        print("\nhelpers.bulk() ERROR:", err)


if __name__ == "__main__":
    main()

This is the code I used when testing with only one dictionary
regex = re.compile(r'(\w+)\((.+)\):\s(.*)|(\w+:)\s(.*)')
match_regex = list(filter(regex.match, dummy))
match = [line.rstrip('\n') for line in match_regex] #strips the line breaks
#print("match list", match, "\n")


features_wanted='ModelImageTime(ms)InferenceScoreTPU_temp(°C)'
match_out = {i.replace(' ','').split(':')[0]:i.replace(' ','').split(':')[1] for i in match if i.replace(' ','').split(':')[0] in features_wanted}

- - - - - - - - - - EDIT - - - - - - - - - - - - -
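There is no error, but "Generating actions ..." is not printed.

This is the mapping

This is what shows up when I check whether the data has been indexed

The data seems to have been indexed...

- - - - - - - - - - - EDIT - - - - - - - - - - - -

I modified generate_actions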
def generate_actions():
    return[{
        '_index': INDEX_NAME,
        '_type': DOC_TYPE,
        '_source': {
            "any": doc,
            "@timestamp": str(datetime.datetime.utcnow().strftime("%Y-%m-%d"'T'"%H:%M:%S")),}
        }
        for doc in groups]

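This somewhat cryptic error message is telling you that you need to pass single objects (not an array of them) to the bulk helper.
So you need to rewrite your generate_actions fn like so: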
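To see why a list in `_source` is a problem, it can help to look at what one action turns into on the wire. The sketch below is only an illustration: `to_bulk_lines` and `META_KEYS` are hypothetical names, a simplified stand-in for the expansion step inside the bulk helper rather than the client's actual code, and the two-document `groups` is made up. It relies on the bulk request format, where each document is sent as a metadata line followed by a source line holding one JSON object.

```python
import json

# Hypothetical, simplified stand-in for the expansion step of the bulk helper;
# the real client handles more metadata keys and other operation types.
META_KEYS = ("_index", "_type", "_id")


def to_bulk_lines(action):
    """Split one action dict into the (metadata, source) JSON lines of a bulk body."""
    data = dict(action)  # copy so the caller's dict is not mutated
    meta = {key: data.pop(key) for key in META_KEYS if key in data}
    source = data.get("_source", data)
    return json.dumps({"index": meta}), json.dumps(source)


groups = [{"Model": "a.tflite"}, {"Model": "b.tflite"}]  # made-up sample data

# One action whose _source is the whole list: the source line is a JSON array.
bad = to_bulk_lines({"_index": "coral_ia", "_type": "coral_edge", "_source": groups})
print(bad[1])

# One action per dictionary: every source line is a single JSON object.
good = [
    to_bulk_lines({"_index": "coral_ia", "_type": "coral_edge", "_source": doc})
    for doc in groups
]
print(good[0][1])  # {"Model": "a.tflite"}
```

The first source line starts with `[`, which is presumably what the server objects to, while the second form yields one object per document.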
def generate_actions():
    # print before returning; a statement placed after `return` never runs
    print("Generating actions ...")
    return [{
        'timestamp': str(datetime.datetime.utcnow().strftime("%Y-%m-%d"'T'"%H:%M:%S")),
        '_index': INDEX_NAME,
        '_type': DOC_TYPE,
        '_source': doc
    } for doc in groups]  # <----- note the for loop here. `_source` needs
                          # to be the doc, not the whole groups list
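One caveat about the `timestamp` key placed next to `_index`: as far as I can tell, when an action carries `_source`, the bulk helper sends only `_source` as the document body, so a top-level `timestamp` may never reach the index. A minimal sketch that stores the timestamp inside each document instead is shown below; `build_actions` and its `now` parameter are names introduced here for illustration, and the sample document is made up.

```python
import datetime

INDEX_NAME = "coral_ia"
DOC_TYPE = "coral_edge"


def build_actions(groups, now=None):
    """Return one bulk action per dictionary, with the timestamp stored in the document."""
    # `now` is an illustrative parameter so the result can be checked deterministically
    now = now or datetime.datetime.now(datetime.timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%S")
    return [
        {
            "_index": INDEX_NAME,
            "_type": DOC_TYPE,
            # copy the parsed fields and add the timestamp next to them
            "_source": {**doc, "timestamp": stamp},
        }
        for doc in groups
    ]


actions = build_actions(
    [{"Model": "a.tflite", "Score": "0.5"}],
    now=datetime.datetime(2020, 1, 2, 3, 4, 5),
)
print(actions[0]["_source"])
```

The resulting list can be passed as `actions` to `helpers.bulk` in the same way as the return value of `generate_actions()`.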
Also, I'd recommend stripping the trailing whitespace from the key/value pairs when constructing groups:
groups[-1][key] = value.strip()
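Put together with the grouping loop from the question, the stripping could look roughly like this. It is a sketch under assumptions: `group_lines` is a hypothetical wrapper name, and the sample lines are a shortened version of the question's data.

```python
def group_lines(lines):
    """Group "key: value" lines into dictionaries, starting a new one at each "labels" key."""
    groups = [{}]
    for line in lines:
        key, value = line.split(": ", 1)
        if key == "labels" and groups[-1]:
            groups.append({})
        # strip both sides so values like "0.17578 " lose their trailing space
        groups[-1][key.strip()] = value.strip()
    return groups


sample = [
    "labels: imagenet_labels.txt ",
    "Model: inception_v1_224_quant_edgetpu.tflite ",
    "Score: 0.17578 ",
    "labels: imagenet_labels.txt ",
    "Model: inception_v4_299_quant_edgetpu.tflite ",
    "Score: 0.59375 ",
]
groups = group_lines(sample)
print(groups[0]["Model"])  # inception_v1_224_quant_edgetpu.tflite
```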