Fir*_*ger 6 indexing alias neo4j cypher
I have a reasonable number of nodes (about 60,000):
(:Document {title:"A title"})
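Given a title, I want to find the matching node, if one exists. The problem is that the titles I am given are inconsistent: sometimes each new word starts with a capital letter, sometimes everything is lowercase. Sometimes Key-Words are joined in kebab case, sometimes they are written normally as keywords.
To compensate, I use apoc to compute the Levenshtein distance between the given title and every node, and accept a node as a match only if the distance is below some threshold: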
MATCH (a:Document)
WHERE apoc.text.distance(a.title, "A title") < 10
RETURN a
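To make the threshold concrete, here is a minimal Python sketch of the classic Levenshtein distance. It is a plain re-implementation for illustration only (the function name is made up, and this is not APOC's code), but it shows why capitalization and hyphens each cost an edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep on a match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("Key-Words", "keywords"))  # two case changes plus one removed hyphen
    print(levenshtein("A title", "a title"))     # one case change
```

Because the distance can never exceed the length of the longer string, a fixed threshold of 10 also accepts any pair of titles that are both shorter than 10 characters, however unrelated they are.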
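This does not scale well. A single lookup currently takes about 700 ms, which is too slow, especially since the data may grow to about 150,000 nodes.
I was thinking about storing/caching occurrences of alternative titles in an alias: [...] property on the node and building an index over all aliases, but I don't know whether this is possible in Neo4j.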
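A related idea, sketched below in Python, is to store one normalized key per title instead of many aliases. The helper name normalize_title and the idea of keeping its result in a separate indexed property (say, titleKey) are assumptions made for illustration, not something the question specifies. This only covers the capitalization and kebab-case variations described above, not typos.

```python
def normalize_title(title: str) -> str:
    """Reduce a title to a canonical lookup key (hypothetical helper).

    Case is folded and every non-alphanumeric character (spaces, hyphens,
    punctuation) is dropped, so "Key-Words", "Key Words" and "keywords"
    all map to the same key.
    """
    return "".join(ch for ch in title.casefold() if ch.isalnum())


if __name__ == "__main__":
    for raw in ("Key-Words", "Key Words", "keywords", "A title"):
        print(repr(raw), "->", repr(normalize_title(raw)))
```

With such a key computed both at write time and at query time, the lookup becomes an exact match on an indexed property rather than a distance computation against every node.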
What is the fastest way to "fuzzy find" a title given a large database of nodes?
Chr*_*sen 17
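In Neo4j 3.5 (currently at beta03), there is an FTS (full-text search) feature.
Edit: I wrote a detailed blog post about FTS in Neo4j: https://graphaware.com/neo4j/2019/01/11/neo4j-full-text-search-deep-dive.html
You can query your documents using the Lucene Classic Query Parser syntax.
Create the index: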
CALL db.index.fulltext.createNodeIndex('documents', ['Document'], ['title','text'])
Import some documents:
LOAD CSV WITH HEADERS FROM "file:///docs.csv" AS row
CREATE (n:Document) SET n = row
Query for documents whose title contains "heavy toll":
CALL db.index.fulltext.queryNodes('documents', 'title: "heavy toll"')
YIELD node, score
RETURN node.title, score
╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node.title"                                                          │"score"           │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│"Among Deaths in 2016, a Heavy Toll in Pop Music - The New York Times"│3.7325966358184814│
└──────────────────────────────────────────────────────────────────────┴──────────────────┘
Query for the same title with typos:
CALL db.index.fulltext.queryNodes('documents', 'title: \\"heavy~ tall~\\"')
YIELD node, score
RETURN node.title, score
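Note the escaped quotes => \": the string passed to the underlying parser should contain the quotes, so that a phrase query is executed rather than a boolean query.
Also, the tilde next to a term indicates a fuzzy search using the Damerau-Levenshtein algorithm.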
╒══════════════════════════════════════════════════════════════════════╤═══════════════════╕
│"node.title"                                                          │"score"            │
╞══════════════════════════════════════════════════════════════════════╪═══════════════════╡
│"Among Deaths in 2016, a Heavy Toll in Pop Music - The New York Times"│0.868073046207428  │
├──────────────────────────────────────────────────────────────────────┼───────────────────┤
│"Prisons Run by C.E.O.s? Privatization Under Trump Could Carry a Heavy│0.4014900326728821 │
│ Price - The New York Times"                                          │                   │
├──────────────────────────────────────────────────────────────────────┼───────────────────┤
│"‘All Talk,’ ‘No Action,’ Says Trump in Twitter Attack on Civil Rights│0.28181418776512146│
│ Icon - The New York Times"                                           │                   │
├──────────────────────────────────────────────────────────────────────┼───────────────────┤
│"Immigrants Head to Washington to Rally While Obama Is Still There - T│0.24634429812431335│
│he New York Times"                                                    │                   │
└──────────────────────────────────────────────────────────────────────┴───────────────────┘
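For titles that arrive from user input, the fuzzy query string can be built on the client. The following Python sketch is one possible way to do it. The helper name fuzzy_title_query is hypothetical, and it emits one fuzzy clause per word rather than the quoted form used above, on the assumption that a per-word match is acceptable.

```python
import re


def fuzzy_title_query(title: str, field: str = "title") -> str:
    """Build a Lucene query string with one fuzzy clause per word (illustrative helper)."""
    # \w+ keeps only letters, digits and underscores, so characters that the
    # Lucene classic parser treats as operators ('-', '"', ':' ...) never reach
    # the query. Lowercasing also keeps AND / OR / NOT from acting as operators.
    words = re.findall(r"\w+", title.lower())
    return " ".join(f"{field}:{word}~" for word in words)


if __name__ == "__main__":
    print(fuzzy_title_query("Heavy tall"))
    print(fuzzy_title_query("Key-Words: A title"))
```

The resulting string would be passed to db.index.fulltext.queryNodes as a query parameter rather than spliced into the Cypher text, which avoids the extra layer of quote escaping. Clauses with no explicit operator are OR-ed by the classic parser, so results are ranked by score rather than filtered strictly.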