What's a good way to import a directory/file structure into Neo4j from a CSV file?

Question

Rem*_*len 6 csv import directory-structure neo4j cypher

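I want to import a large number of file names into a graph database using Neo4j. The data comes from an external source and is available as a CSV file. I'd like to build a tree structure from the data so that I can easily "navigate" it in later queries (i.e. find all files below a certain directory, all files that occur in multiple directories, etc.).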

So, given the example input:

/foo/bar/example.txt
/bar/baz/another.csv
/example.txt
/foo/bar/onemore.txt

I want to create the following graph:

( / ) <-[:in]- ( foo ) <-[:in]- ( bar ) <-[:in]- ( example.txt )
                                        <-[:in]- ( onemore.txt )
      <-[:in]- ( bar ) <-[:in]- ( baz ) <-[:in]- ( another.csv )
      <-[:in]- ( example.txt )

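(where each node label shown is actually a property, e.g. path:).

I've been able to get the desired result when using a fixed number of directory levels; for example, when every file is three levels deep, I can create a CSV file with 4 columns: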

dir_a,dir_b,dir_c,file
foo,bar,baz,example.txt
foo,bar,ban,example.csv
foo,bar,baz,another.txt

and import it using a Cypher query:

LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS row
  // each child points at its parent with :in, matching the diagram above
  MERGE (dir_a:Path {name: row.dir_a})
  MERGE (dir_a) <-[:in]- (dir_b:Path {name: row.dir_b})
  MERGE (dir_b) <-[:in]- (dir_c:Path {name: row.dir_c})
  MERGE (dir_c) <-[:in]- (:Path {name: row.file})

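But I'd like a general solution that works for any level of subdirectories (and for a mix of levels within one dataset). Note that I can preprocess the input if necessary, so I can create whatever structure is needed in the input CSV file.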

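I've looked at gists and plugins but can't seem to find anything that works. I think/hope I should be able to do something with the split() function, i.e. use split(row.path, '/') to get a list of path elements, but I don't know how to turn that list into a chain of MERGE operations.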

Answer 1

Dav*_*ett 1

Here is a first cut at something more generalized.

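The premise is that you can split a fully qualified path into its components, and then split the path again on each component so that you can construct a fully qualified path for every individual component of the larger path. Use that fully qualified path as the key when merging, and set the individual component name after the merge. If a component is not at the root level, find its parent and create the relationship to it. This breaks down if a component name is repeated within a fully qualified path: the split takes the text before the first occurrence of the name, so for /foo/bar/foo the last component would resolve to /foo instead of /foo/bar/foo. The same happens when a component name appears as a substring earlier in the path.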
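Since the question says preprocessing is acceptable, the decomposition described above can also be sketched outside Cypher. The following is a minimal Python sketch, not the method used in this answer: the function name expand_path and the row layout (fq_path, name, parent) are assumptions made for illustration. It builds each fully qualified path by joining the leading components instead of splitting on the component name, so repeated names do not confuse it.

```python
from typing import List, Optional, Tuple

# (fq_path, name, parent_fq_path); the root row has no parent
Row = Tuple[str, str, Optional[str]]


def expand_path(path: str) -> List[Row]:
    """Return one row per component of an absolute path, root first.

    Hypothetical helper: the name and row layout are illustrative only.
    """
    # Drop the empty string that split() yields before the leading slash.
    parts = [p for p in path.split("/") if p]
    rows: List[Row] = [("/", "/", None)]
    parent = "/"
    for i, name in enumerate(parts):
        # Join the first i+1 components to get this component's full path.
        fq_path = "/" + "/".join(parts[: i + 1])
        rows.append((fq_path, name, parent))
        parent = fq_path
    return rows


if __name__ == "__main__":
    for row in expand_path("/foo/bar/example.txt"):
        print(row)
```

Each row carries the same three values the load statement derives per component: the merge key, the display name, and the parent's key.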

First, I create a uniqueness constraint on fq_path:

create constraint on (c:Component) assert c.fq_path is unique;

Here is the load statement.

load csv from 'file:///path.csv' as line
// split each path on '/'; element 0 is the empty string before the leading slash
with line[0] as line, split(line[0],'/') as path_components
unwind range(0, size(path_components)-1) as idx
with case
       when idx = 0 then '/'
       else path_components[idx]
     end as component
     // fully qualified path: text before the component's first occurrence, plus the component
   , case
       when idx = 0 then '/'
       else split(line, path_components[idx])[0] + path_components[idx]
     end as fq_path
     // parent path: the same prefix without its trailing slash; the root has no parent
   , case
       when idx = 0 then null
       when idx = 1 then '/'
       else substring(split(line, path_components[idx])[0],0,size(split(line, path_components[idx])[0])-1)
     end as parent
     // one-element list for non-root components so the foreach below runs once
   , case
       when idx = 0 then []
       else [1]
     end as find_parent
merge (new_comp:Component {fq_path: fq_path})
set new_comp.name = component
foreach ( y in find_parent |
  merge (theparent:Component {fq_path: parent} )
  merge (theparent)<-[:IN]-(new_comp)
)
return *
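
As an alternative to doing the string handling inside the load statement, the input could be expanded beforehand into one row per unique component, which the question notes is an option. The sketch below is an assumption-laden illustration rather than part of this answer: the function names component_rows and to_csv and the column names fq_path, name, and parent are all hypothetical, and the root's parent column is simply left empty.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, str]  # (fq_path, name, parent); parent is "" for the root


def component_rows(paths: Iterable[str]) -> List[Row]:
    """Expand absolute paths into unique component rows, parents before children."""
    seen: Dict[str, Tuple[str, str]] = {"/": ("/", "")}
    for path in paths:
        parts = [p for p in path.strip().split("/") if p]
        parent = "/"
        for i, name in enumerate(parts):
            fq_path = "/" + "/".join(parts[: i + 1])
            # setdefault keeps the first entry, so shared directories appear once.
            seen.setdefault(fq_path, (name, parent))
            parent = fq_path
    return [(fq, name, parent) for fq, (name, parent) in seen.items()]


def to_csv(rows: Iterable[Row]) -> str:
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["fq_path", "name", "parent"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sample = [
        "/foo/bar/example.txt",
        "/bar/baz/another.csv",
        "/example.txt",
        "/foo/bar/onemore.txt",
    ]
    print(to_csv(component_rows(sample)), end="")
```

With a file in this shape, each row already holds the merge key, the name, and the parent key, so the import would only need to merge the component and, where a parent is present, the relationship to it.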

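If you want to distinguish between files and folders, here are a couple of queries you can run afterwards to set an additional label on the respective nodes.

Find the files and label them File: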

// find the last Components in a tree (no inbound IN)
// and set them as Files
match (c:Component)
where not (c)<-[:IN]-(:Component)
set c:File
return c

Find the folders and label them Folder:

// find all Components with an inbound IN
// and set them as Folders
match (c:Component)
where  (c)<-[:IN]-(:Component)
set c:Folder
return c
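
The same rule (a component that contains something is a folder, anything else is a file) can be applied while preprocessing the paths. This is a minimal Python sketch under that assumption; the function name classify is hypothetical. Like the two queries, it treats an empty directory that appears as the last component of a path as a file, because nothing points into it.

```python
from typing import Dict, Iterable, Set


def classify(paths: Iterable[str]) -> Dict[str, str]:
    """Map each fully qualified component path to 'Folder' or 'File'."""
    nodes: Set[str] = set()
    folders: Set[str] = set()
    for path in paths:
        parts = [p for p in path.split("/") if p]
        parent = "/"
        nodes.add(parent)
        for i in range(len(parts)):
            fq_path = "/" + "/".join(parts[: i + 1])
            nodes.add(fq_path)
            folders.add(parent)  # the parent has at least one component in it
            parent = fq_path
    return {fq: ("Folder" if fq in folders else "File") for fq in sorted(nodes)}


if __name__ == "__main__":
    for fq, label in classify(["/foo/bar/example.txt", "/example.txt"]).items():
        print(label, fq)
```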
