How to return the top n largest clusters in Neo4j?

Question

Tri*_*oan 5 graph neo4j graph-databases cypher

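In my database, the graph looks like this:

I want to find the top 3 largest clusters in my data. A cluster is a set of nodes that are connected to each other; the direction of the connections does not matter. As the picture shows, the expected result is 3 clusters with sizes 3, 2, and 2.

This is what I have come up with so far: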

MATCH (n)
RETURN n, size((n)-[*]-()) AS cluster_size
ORDER BY cluster_size  DESC
LIMIT 100

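However, it has two problems:

I think the query is wrong, because size() does not return the number of nodes in the cluster, which is what I want, but the number of subgraphs that match the pattern.
The LIMIT clause limits the number of nodes returned rather than selecting the top results. That is why I put 100 there.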
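
To illustrate the first problem, here is a minimal Python sketch (not Cypher, and not part of the original question). It assumes that a variable-length pattern such as (n)-[*]-() matches every path that does not reuse a relationship, and it counts those paths from one node of a hypothetical three-node triangle. The helper name count_trails and the sample graph are made up for illustration.

```python
def count_trails(edges, start):
    """Count non-empty undirected paths from `start` that never reuse an edge.

    Assumption: this mirrors how a variable-length pattern is matched,
    i.e. a relationship may appear at most once in a single path.
    """
    def walk(node, used):
        total = 0
        for i, (u, v) in enumerate(edges):
            if i in used:
                continue  # this edge is already part of the current path
            if u == node:
                nxt = v
            elif v == node:
                nxt = u
            else:
                continue  # edge does not touch the current node
            # The path ending at `nxt` counts once, plus all its extensions.
            total += 1 + walk(nxt, used | {i})
        return total

    return walk(start, frozenset())


# Hypothetical triangle: a-b, b-c, c-a (direction ignored).
triangle = [("a", "b"), ("b", "c"), ("c", "a")]
print(count_trails(triangle, "a"))  # prints 6, although the cluster has 3 nodes
```

Under that assumption the count is a number of paths, not a number of nodes, and it grows much faster than the cluster itself on denser graphs.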

What should I do now? I am stuck :( Thank you for your help.

Update

Thanks to Bruno Peres' answer, I was able to try the algo.unionFind query from Neo4j Graph Algorithms. I can find the sizes of the connected components with the following query:

CALL algo.unionFind.stream()
YIELD nodeId,setId
RETURN setId,count(*) as size_of_component
ORDER BY size_of_component DESC LIMIT 20;

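The result looks like this:

But that is as far as I can get. I cannot retrieve any information about the nodes in each component in order to visualize them. Using collect(nodeId) takes forever because the top 2 components are too large. I know it makes no sense to visualize components that large, but what about the third one? 235 nodes can be rendered just fine.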

Answer 1

Bru*_*res 5

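I think you are looking for Connected Components. The section on connected components in the Neo4j Graph Algorithms User Guide says:

Connected Components or UnionFind basically finds sets of connected nodes where each node is reachable from any other node in the same set. In graph theory, a connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the graph.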
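
The idea in the quoted passage can be sketched in a few lines of Python. This is a generic union-find illustration, not the implementation used by Neo4j Graph Algorithms; the node names, the edge list, and the function names find and largest_components are hypothetical.

```python
from collections import Counter


def find(parent, x):
    """Return the representative (root) of x, shortening the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def largest_components(nodes, edges, n):
    """Return the n largest connected components as (root, size) pairs."""
    parent = {node: node for node in nodes}
    for u, v in edges:
        root_u, root_v = find(parent, u), find(parent, v)
        if root_u != root_v:
            parent[root_v] = root_u  # merge the two sets; direction is ignored
    sizes = Counter(find(parent, node) for node in nodes)
    return sizes.most_common(n)


# Hypothetical graph: a triangle, two pairs, and two isolated nodes.
nodes = ["x", "y", "a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("f", "g")]

print([size for _, size in largest_components(nodes, edges, 3)])  # prints [3, 2, 2]
```

Each root plays the role of a set id: nodes that share a root belong to the same component, and counting nodes per root gives the component sizes.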

In that case, you can install Neo4j Graph Algorithms and use algo.unionFind. I reproduced your scenario with this sample data set:

create (x), (y),
(a), (b), (c),
(d), (e),
(f), (g),
(a)-[:type]->(b), (b)-[:type]->(c), (c)-[:type]->(a),
(d)-[:type]->(e),
(f)-[:type]->(g)

Then run algo.unionFind:

// call unionFind procedure
CALL algo.unionFind.stream('', ':type', {})
YIELD nodeId,setId
// groupBy setId, storing all node ids of the same set id into a list
WITH setId, collect(nodeId) as nodes
// order by the size of nodes list descending
ORDER BY size(nodes) DESC
LIMIT 3 // limiting to 3
RETURN setId, nodes

The result will be:

╒═══════╤══════════╕
│"setId"│"nodes"   │
╞═══════╪══════════╡
│2      │[11,12,13]│
├───────┼──────────┤
│5      │[14,15]   │
├───────┼──────────┤
│7      │[16,17]   │
└───────┴──────────┘

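Edit

From the comments:

How can I get all nodeIds of a specific setId? For example, from my screenshot above, how can I get all nodeIds of setId 17506? That setId has 235 nodes and I want to visualize them.

First, run CALL algo.unionFind('', ':type', {write:true, partitionProperty:"partition"}) YIELD nodes RETURN *. This statement creates a property called partition on every node, containing the ID of the partition the node belongs to.
Then run this statement to get the top 3 partitions: match (node) with node.partition as partition, count(node) as ct order by ct desc limit 3 return partition, ct.
Now you can fetch all nodes of each of the top 3 partitions individually with match (node {partition : 17506}) return node, using the partition IDs returned by the second query.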
