Neo4j use MERGE with null values

Por*_*jaz 3 csv null neo4j cypher

I know this question has been asked several times before, but the answers did not solve my problem. I am trying to run this query:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1
MERGE (a:Address1 {address_name1:line1.address1})

But I get the error: Cannot merge node using null property value for address_name1.

Others have suggested using:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1
MERGE (a:Address1)
 ON CREATE SET a.address_name1=line1.address1
 ON MATCH SET a.address_name1=line1.address1

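However, that solution only works if the node has more than one property. In my case, it has only the address_name1 property.

Is there a way to solve this, such as replacing the null values with a word in the query before the MERGE, or some other solution?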

Fra*_*eau 8

Do you really need to create an Address node when there is no address?

You can filter the rows of the CSV with WITH/WHERE:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1
WITH line1
WHERE NOT line1.address1 IS NULL
MERGE (a:Address1 {address_name1:line1.address1})

Otherwise, if you want to create a node that represents an "unknown" address, you can use coalesce() to substitute a default value:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1
MERGE (a:Address1 {address_name1: coalesce(line1.address1, "Unknown")})
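If the CSV may also contain blank or whitespace-only cells, which arrive as strings rather than null, the two ideas above can be combined into one filter. The following is a minimal sketch under that assumption; the alias address is introduced here for illustration only and is not part of the original query.

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1
// trim() returns null for a null input, so missing cells stay null here
WITH line1, trim(line1.address1) AS address
// skip rows whose address is missing or blank after trimming
WHERE address IS NOT NULL AND address <> ''
MERGE (a:Address1 {address_name1: address})
```

Replacing the WHERE line with coalesce() on the trimmed value would instead route such rows to a single default node.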


Vic*_*art 6

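Hello: I am posting this rather extensive answer because I recently had surprising difficulty handling the NULL (missing) values present in a CSV file while trying to load that data into Neo4j (neo4j 3.3.4).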

I present three solutions.

I am using the Cycli (cycli 0.7.6) CLI, installed via pip in a Python 3.5 venv on an Arch Linux x86_64 system.

My CSV file (glycolysis_metabolites.csv) is:

name,abbreviation,kegg_entry
α-D-glucose,GLC,C00267
glucose 6-phosphate,G6P,C00668
fructose 6-phosphate,F6P,C05345
"fructose 1,6-bisphosphate",FBP,C05378
dihydroxyacetone phosphate,DHAP,C00111
D-glyceraldehyde 3-phosphate,,C00118
"1,3-bisphosphoglycerate","1,3-BPG",C00236
3-phosphoglycerate,3PG,C00197
2-phosphoglycerate,2PG,C00631
phosphoenolpyruvate,PEP,C00074
pyruvate,,C00022

This data was copied from a PostgreSQL table with the psql \COPY ... command; the "name" field has a "UNIQUE NOT NULL" constraint.

After searching Google and other sources, I ran the following three experiments. Experiments 2 and 3 are essentially the same.

I believe the approach shown in Experiment 2 is the best solution, because the COALESCE expressions are contained within the MERGE statement.

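My reasoning is that Experiment 2 uses "local" variables rather than returning "global" variables (Experiment 3), which minimizes unintended consequences from reusing variable names.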

I load the Cypher script as follows:

cat glycolysis_script.cypher | cypher-shell -u victoria -p <your_password>

Experiment 1

Reference: http://markhneedham.com/blog/2014/08/22/neo4j-load-csv-handling-empty-columns/

This solution (Mark Needham's) is quite clever: it creates nodes that contain only the non-NULL properties, for example:

<id>: 0 abbreviation: GLC kegg_entry: C00267 name: α-D-glucose
<id>: 10 kegg_entry: C00022 name: pyruvate

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
MERGE (a:GlycolysisMetabolites {name: row.name})
FOREACH(ignoreMe IN CASE WHEN row.abbreviation <> "" THEN [1] ELSE [] END | SET a.abbreviation = row.abbreviation)
FOREACH(ignoreMe IN CASE WHEN row.kegg_entry <> "" THEN [1] ELSE [] END | SET a.kegg_entry = row.kegg_entry)
// With "USING PERIODIC COMMIT",
// RETURN a;
// throws this error: "Unknown value type: STRUCT"
// ... so, use this:
RETURN a.name, a.abbreviation, a.kegg_entry;

Output:

$ cat glycolysis.cypher | cypher-shell -u victoria -p <your_password>

a.name, a.abbreviation, a.kegg_entry
"α-D-glucose", "GLC", "C00267"
"glucose 6-phosphate", "G6P", "C00668"
"fructose 6-phosphate", "F6P", "C05345"
"fructose 1,6-bisphosphate", "FBP", "C05378"
"dihydroxyacetone phosphate", "DHAP", "C00111"
"D-glyceraldehyde 3-phosphate", NULL, "C00118"
"1,3-bisphosphoglycerate", "1,3-BPG", "C00236"
"3-phosphoglycerate", "3PG", "C00197"
"2-phosphoglycerate", "2PG", "C00631"
"phosphoenolpyruvate", "PEP", "C00074"
"pyruvate", NULL, "C00022"

However, you cannot include a property that contains NULL values (here: "abbreviation") in the MERGE pattern itself, because you cannot MERGE on a NULL property value.

Works:

MERGE (a:GlycolysisMetabolites {name: row.name})

Fails ("Cannot merge node using null property value for abbreviation"):

MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation:row.abbreviation})
MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation:row.abbreviation, kegg_entry:row.kegg_entry})
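Since MERGE succeeds on the non-null key alone, one possible simplification is to MERGE on name and assign the optional columns with a plain SET. This is a sketch, not something the experiments here ran. It relies on the Cypher behaviour that setting a property to null leaves the property absent, so the FOREACH guards are not needed.

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
// identify the node by its guaranteed non-null key only
MERGE (a:GlycolysisMetabolites {name: row.name})
// a null on the right-hand side leaves (or makes) the property absent
SET a.abbreviation = row.abbreviation,
    a.kegg_entry = row.kegg_entry
RETURN a.name, a.abbreviation, a.kegg_entry;
```

The trade-off is that on a re-run a null cell in the CSV would also clear a value already stored on a matched node; using ON CREATE SET in place of SET would avoid that at the cost of never updating existing nodes.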

Experiment 2

Reference: Neo4j use MERGE with null values

Here, I set an empty string ('') as the replacement for NULL values present in the CSV file; you can use anything you like, for example "undefined", "null", ...

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
// MERGE (a:GlycolysisMetabolites {name: row.name})
MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation:COALESCE(row.abbreviation, ''), kegg_entry:COALESCE(row.kegg_entry, '')})
// With "USING PERIODIC COMMIT",
// RETURN a;
// throws this error: "Unknown value type: STRUCT"
// ... so, use this:
RETURN a.name, a.abbreviation, a.kegg_entry;

Output:

$ cat glycolysis.cypher | cypher-shell -u victoria -p <your_password>

a.name, a.abbreviation, a.kegg_entry
"α-D-glucose", "GLC", "C00267"
"glucose 6-phosphate", "G6P", "C00668"
"fructose 6-phosphate", "F6P", "C05345"
"fructose 1,6-bisphosphate", "FBP", "C05378"
"dihydroxyacetone phosphate", "DHAP", "C00111"
"D-glyceraldehyde 3-phosphate", "", "C00118"
"1,3-bisphosphoglycerate", "1,3-BPG", "C00236"
"3-phosphoglycerate", "3PG", "C00197"
"2-phosphoglycerate", "2PG", "C00631"
"phosphoenolpyruvate", "PEP", "C00074"
"pyruvate", "", "C00022"

Experiment 3

References:

Neo4j use MERGE with null values

https://github.com/neo4j/neo4j/issues/2521

This also works, but because the COALESCE expressions sit outside the MERGE statement, I was concerned that the data returned by the RETURN statement could cause problems if those variable names were reused elsewhere. As a workaround, I added a prefix (a_) as a quasi-UID, but I think the solution in Experiment 2 above is the better approach.

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
WITH
  COALESCE(CASE row.name WHEN '' THEN null ELSE row.name END, '') AS a_name,
  COALESCE(CASE row.abbreviation WHEN '' THEN null ELSE row.abbreviation END, '') AS a_abbreviation,
  COALESCE(CASE row.kegg_entry WHEN '' THEN null ELSE row.kegg_entry END, '') AS a_kegg_entry
MERGE (a:GlycolysisMetabolites {name:a_name, abbreviation:a_abbreviation, kegg_entry:a_kegg_entry})
// Note: RETURN can only be used at the end of the query
RETURN a_name, a_abbreviation, a_kegg_entry;

Output:

$ cat glycolysis.cypher | cypher-shell -u victoria -p <your_password>

a_name, a_abbreviation, a_kegg_entry
"α-D-glucose", "GLC", "C00267"
"glucose 6-phosphate", "G6P", "C00668"
"fructose 6-phosphate", "F6P", "C05345"
"fructose 1,6-bisphosphate", "FBP", "C05378"
"dihydroxyacetone phosphate", "DHAP", "C00111"
"D-glyceraldehyde 3-phosphate", "", "C00118"
"1,3-bisphosphoglycerate", "1,3-BPG", "C00236"
"3-phosphoglycerate", "3PG", "C00197"
"2-phosphoglycerate", "2PG", "C00631"
"phosphoenolpyruvate", "PEP", "C00074"
"pyruvate", "", "C00022"

Other StackOverflow discussions on this topic/issue:
https://stackoverflow.com/search?tab=votes&q=Neo4j%20use%20MERGE%20with%20null%20value

Addendum

Reference (for example): Neo4j CSV file load with empty cells

This "works", but it skips creating the node whenever any field contains a NULL value:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
FOREACH (
    x IN CASE WHEN row.abbreviation IS NULL OR row.kegg_entry IS NULL THEN [] ELSE [1] END |
    MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation: row.abbreviation, kegg_entry: row.kegg_entry})
    )
RETURN row.name, row.abbreviation, row.kegg_entry;

Output:

$ cat glycolysis.cypher | cypher-shell -u victoria -p <password>

row.name, row.abbreviation, row.kegg_entry
"α-D-glucose", "GLC", "C00267"
"glucose 6-phosphate", "G6P", "C00668"
"fructose 6-phosphate", "F6P", "C05345"
"fructose 1,6-bisphosphate", "FBP", "C05378"
"dihydroxyacetone phosphate", "DHAP", "C00111"
"D-glyceraldehyde 3-phosphate", NULL, "C00118"
"1,3-bisphosphoglycerate", "1,3-BPG", "C00236"
"3-phosphoglycerate", "3PG", "C00197"
"2-phosphoglycerate", "2PG", "C00631"
"phosphoenolpyruvate", "PEP", "C00074"
"pyruvate", NULL, "C00022"

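Note that in the Neo4j Browser only 9 (not 11) nodes are created: the nodes for "D-glyceraldehyde 3-phosphate" and "pyruvate" are not created, even though the RETURN clause still lists all 11 CSV rows.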
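The same skip-on-NULL behaviour can probably be written with a WITH ... WHERE filter instead of FOREACH, in the style of the first answer. The sketch below is an untested rewrite of the addendum query, followed by a count query for checking the result from cypher-shell rather than the Browser. Going by the node count reported for the addendum, node_count would be 9 if no GlycolysisMetabolites nodes existed before the load; the alias node_count is introduced here for illustration.

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
WITH row
// drop any row that has a missing optional column before it reaches MERGE
WHERE row.abbreviation IS NOT NULL AND row.kegg_entry IS NOT NULL
MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation: row.abbreviation, kegg_entry: row.kegg_entry});

// separate statement: count the nodes that were actually created
MATCH (a:GlycolysisMetabolites)
RETURN count(a) AS node_count;
```

One difference from the FOREACH version is that a RETURN placed after this filter would list only the rows that passed it, not every CSV row.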
