Por*_*jaz 3 csv null neo4j cypher
I know this question has been asked several times before, but the answers did not solve my problem. I am trying to run this query:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1
MERGE (a:Address1 {address_name1:line1.address1})
But I get the error: Cannot merge node using null property value for address_name1.
Others have suggested using:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1
MERGE (a:Address1)
ON CREATE SET a.address_name1=line1.address1
ON MATCH SET a.address_name1=line1.address1
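However, that solution only works if the node has more than one property. In my case, it has only the address_name1 property.
Is there a way to solve this, such as replacing null values with a word in the query before the MERGE, or some other solution?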
If there is no address, do you really need to create an Address node?
You can use WITH / WHERE to filter the rows of the CSV:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1
WITH line1
WHERE NOT line1.address1 IS NULL
MERGE (a:Address1 {address_name1:line1.address1})
Otherwise, if you want to create a node that represents an "unknown" address, you can use coalesce() to substitute a default value:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///C:/Users/Zona5/Documents/Neo4j/checkIntel/import/personaldata.csv' AS line1
MERGE (a:Address1 {address_name1: coalesce(line1.address1, "Unknown")})
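The difference between the two options above (dropping rows that have no address, or substituting a default with coalesce()) can be illustrated outside Neo4j with a minimal Python sketch. This is not Neo4j code: the `coalesce` and `load_rows` helpers and the sample data are hypothetical stand-ins, and the sketch assumes that empty CSV fields are read as missing values, as the error in the question suggests.

```python
import csv
import io


def coalesce(*values):
    """Return the first argument that is not None, similar to Cypher's coalesce()."""
    for value in values:
        if value is not None:
            return value
    return None


def load_rows(csv_text):
    """Parse CSV text with a header row, mapping empty fields to None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {key: (value if value != "" else None) for key, value in row.items()}
        for row in reader
    ]


# Hypothetical sample data: the second row has no address.
SAMPLE = "name,address1\nAnn,12 Main St\nBob,\n"
rows = load_rows(SAMPLE)

# Option 1: skip rows without an address (the WITH / WHERE approach).
filtered = [row["address1"] for row in rows if row["address1"] is not None]

# Option 2: replace missing addresses with a default (the coalesce() approach).
defaulted = [coalesce(row["address1"], "Unknown") for row in rows]

print(filtered)
print(defaulted)
```

With the first option the row without an address produces no node at all; with the second, every such row ends up on the same shared "Unknown" node.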
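Hello: I am posting this rather extensive answer because I recently had a surprisingly difficult time dealing with NULL (missing) values in a CSV file while trying to load that data into Neo4j (neo4j 3.3.4).
I present three solutions.
I am using the Cycli (cycli 0.7.6) CLI, installed via pip in a Python 3.5 venv on an Arch Linux x86_64 system.
My CSV file (glycolysis_metabolites.csv) is: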
name,abbreviation,kegg_entry
α-D-glucose,GLC,C00267
glucose 6-phosphate,G6P,C00668
fructose 6-phosphate,F6P,C05345
"fructose 1,6-bisphosphate",FBP,C05378
dihydroxyacetone phosphate,DHAP,C00111
D-glyceraldehyde 3-phosphate,,C00118
"1,3-bisphosphoglycerate","1,3-BPG",C00236
3-phosphoglycerate,3PG,C00197
2-phosphoglycerate,2PG,C00631
phosphoenolpyruvate,PEP,C00074
pyruvate,,C00022
These data were copied from a PostgreSQL table with the psql \copy ... command; the table has a "UNIQUE NOT NULL" constraint on the "name" field.
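After searching Google and elsewhere, I carried out the following three experiments. Experiments 2 and 3 are essentially the same.
I believe the approach shown in Experiment 2 is the best solution, because the COALESCE expressions are contained within the MERGE statement.
My reason for this conclusion is that Experiment 2 uses "local" variables rather than returning "global" variables (Experiment 3), which minimizes unintended consequences from reused variable names.
I load the Cypher script as follows: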
cat glycolysis_script.cypher | cypher-shell -u victoria -p <your_password>
Experiment 1
Reference: http://markhneedham.com/blog/2014/08/22/neo4j-load-csv-handling-empty-columns/
This solution (Mark Needham's) is quite clever: it creates nodes that carry only the non-NULL properties, e.g.
<id>: 0 abbreviation: GLC kegg_entry: C00267 name: α-D-glucose
<id>: 10 kegg_entry: C00022 name: pyruvate
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
MERGE (a:GlycolysisMetabolites {name: row.name})
FOREACH(ignoreMe IN CASE WHEN row.abbreviation <> "" THEN [1] ELSE [] END | SET a.abbreviation = row.abbreviation)
FOREACH(ignoreMe IN CASE WHEN row.kegg_entry <> "" THEN [1] ELSE [] END | SET a.kegg_entry = row.kegg_entry)
// With "USING PERIODIC COMMIT",
// RETURN a;
// throws this error: "Unknown value type: STRUCT"
// ... so, use this:
RETURN a.name, a.abbreviation, a.kegg_entry;
Output:
$ cat glycolysis.cypher | cypher-shell -u victoria -p <your_password>

a.name, a.abbreviation, a.kegg_entry
"α-D-glucose", "GLC", "C00267"
"glucose 6-phosphate", "G6P", "C00668"
"fructose 6-phosphate", "F6P", "C05345"
"fructose 1,6-bisphosphate", "FBP", "C05378"
"dihydroxyacetone phosphate", "DHAP", "C00111"
"D-glyceraldehyde 3-phosphate", NULL, "C00118"
"1,3-bisphosphoglycerate", "1,3-BPG", "C00236"
"3-phosphoglycerate", "3PG", "C00197"
"2-phosphoglycerate", "2PG", "C00631"
"phosphoenolpyruvate", "PEP", "C00074"
"pyruvate", NULL, "C00022"
However, you cannot include properties that contain NULL values (here: "abbreviation") in your MERGE specification, because you cannot MERGE on a NULL property value.
Works:
MERGE (a:GlycolysisMetabolites {name: row.name})
Fails ("Cannot merge node using null property value for abbreviation"):
MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation:row.abbreviation})
MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation:row.abbreviation, kegg_entry:row.kegg_entry})
Experiment 2
Here, I use an empty string ('') as the replacement for NULL values present in the CSV file; you can use anything you want, e.g. "undefined", "null", ...
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
// MERGE (a:GlycolysisMetabolites {name: row.name})
MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation:COALESCE(row.abbreviation, ''), kegg_entry:COALESCE(row.kegg_entry, '')})
// With "USING PERIODIC COMMIT",
// RETURN a;
// throws this error: "Unknown value type: STRUCT"
// ... so, use this:
RETURN a.name, a.abbreviation, a.kegg_entry;
Output:
$ cat glycolysis.cypher | cypher-shell -u victoria -p <your_password>

a.name, a.abbreviation, a.kegg_entry
"α-D-glucose", "GLC", "C00267"
"glucose 6-phosphate", "G6P", "C00668"
"fructose 6-phosphate", "F6P", "C05345"
"fructose 1,6-bisphosphate", "FBP", "C05378"
"dihydroxyacetone phosphate", "DHAP", "C00111"
"D-glyceraldehyde 3-phosphate", "", "C00118"
"1,3-bisphosphoglycerate", "1,3-BPG", "C00236"
"3-phosphoglycerate", "3PG", "C00197"
"2-phosphoglycerate", "2PG", "C00631"
"phosphoenolpyruvate", "PEP", "C00074"
"pyruvate", "", "C00022"
Experiment 3
Reference: https://github.com/neo4j/neo4j/issues/2521
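This also works, but because the COALESCE expressions sit outside the MERGE statement, I am concerned that the data returned by the RETURN statement could cause problems if those variable names are reused elsewhere. As a workaround, I added a prefix (a_) as a quasi-UID, but I think the solution in Experiment 2 above is the better approach.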
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
WITH
  COALESCE(CASE row.name WHEN '' THEN null ELSE row.name END, '') AS a_name,
  COALESCE(CASE row.abbreviation WHEN '' THEN null ELSE row.abbreviation END, '') AS a_abbreviation,
  COALESCE(CASE row.kegg_entry WHEN '' THEN null ELSE row.kegg_entry END, '') AS a_kegg_entry
MERGE (a:GlycolysisMetabolites {name:a_name, abbreviation:a_abbreviation, kegg_entry:a_kegg_entry})
// Note: RETURN can only be used at the end of the query
RETURN a_name, a_abbreviation, a_kegg_entry;
Output:
$ cat glycolysis.cypher | cypher-shell -u victoria -p <your_password>

a_name, a_abbreviation, a_kegg_entry
"α-D-glucose", "GLC", "C00267"
"glucose 6-phosphate", "G6P", "C00668"
"fructose 6-phosphate", "F6P", "C05345"
"fructose 1,6-bisphosphate", "FBP", "C05378"
"dihydroxyacetone phosphate", "DHAP", "C00111"
"D-glyceraldehyde 3-phosphate", "", "C00118"
"1,3-bisphosphoglycerate", "1,3-BPG", "C00236"
"3-phosphoglycerate", "3PG", "C00197"
"2-phosphoglycerate", "2PG", "C00631"
"phosphoenolpyruvate", "PEP", "C00074"
"pyruvate", "", "C00022"
Other StackOverflow discussions on this topic/issue:
https://stackoverflow.com/search?tab=votes&q=Neo4j%20use%20MERGE%20with%20null%20value
Addendum
Reference (e.g.): Neo4j CSV file load with empty cells
This "works", but it skips creating the node whenever any field contains a NULL value:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/pg2neo4j/glycolysis_metabolites.csv" AS row
FOREACH (
  x IN CASE WHEN row.abbreviation IS NULL OR row.kegg_entry IS NULL THEN [] ELSE [1] END |
  MERGE (a:GlycolysisMetabolites {name: row.name, abbreviation: row.abbreviation, kegg_entry: row.kegg_entry})
)
RETURN row.name, row.abbreviation, row.kegg_entry;
Output:
$ cat glycolysis.cypher | cypher-shell -u victoria -p <password>

row.name, row.abbreviation, row.kegg_entry
"α-D-glucose", "GLC", "C00267"
"glucose 6-phosphate", "G6P", "C00668"
"fructose 6-phosphate", "F6P", "C05345"
"fructose 1,6-bisphosphate", "FBP", "C05378"
"dihydroxyacetone phosphate", "DHAP", "C00111"
"D-glyceraldehyde 3-phosphate", NULL, "C00118"
"1,3-bisphosphoglycerate", "1,3-BPG", "C00236"
"3-phosphoglycerate", "3PG", "C00197"
"2-phosphoglycerate", "2PG", "C00631"
"phosphoenolpyruvate", "PEP", "C00074"
"pyruvate", NULL, "C00022"
Note that, in the Neo4j Browser, only 9 (not 11) nodes are created: no nodes are created for "D-glyceraldehyde 3-phosphate" and "pyruvate".
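To see why only 9 of the 11 rows yield a node, the guard in the FOREACH can be mimicked in plain Python against the same CSV data. This is only a sketch, not Neo4j code: it assumes that empty CSV fields are what Neo4j reads as NULL, and the names `CSV_TEXT` and `is_complete` are made up for illustration.

```python
import csv
import io

# The CSV data from the start of this answer.
CSV_TEXT = """name,abbreviation,kegg_entry
α-D-glucose,GLC,C00267
glucose 6-phosphate,G6P,C00668
fructose 6-phosphate,F6P,C05345
"fructose 1,6-bisphosphate",FBP,C05378
dihydroxyacetone phosphate,DHAP,C00111
D-glyceraldehyde 3-phosphate,,C00118
"1,3-bisphosphoglycerate","1,3-BPG",C00236
3-phosphoglycerate,3PG,C00197
2-phosphoglycerate,2PG,C00631
phosphoenolpyruvate,PEP,C00074
pyruvate,,C00022
"""


def is_complete(row):
    # Mirrors the FOREACH guard: abbreviation and kegg_entry must both be present.
    return row["abbreviation"] != "" and row["kegg_entry"] != ""


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
created = [row["name"] for row in rows if is_complete(row)]
skipped = [row["name"] for row in rows if not is_complete(row)]

print(len(rows), len(created))
print(skipped)
```

The sketch reports 11 rows, 9 of them complete, and lists the two rows without an abbreviation as skipped, which matches the node count seen in the Neo4j Browser.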