DBSCAN算法可以创建少于minPts的集群吗?

tes*_*est 2 cluster-analysis machine-learning data-mining dbscan

我刚刚编写了DBSCAN算法,但我想知道DBSCAN算法是否可以允许集群中的点数少于所使用的minPts参数。

我一直在使用http://people.cs.nctu.edu.tw/~rsliang/dbscan/testdatagen.html来验证我的实现,但似乎运行得很好,只是遇到了这个问题。

我正在对样本数据集进行一些模拟,并且我一直使用的minPts为3。DBSCAN算法通常会从数据集中创建2点的簇(虽然从不1)。这是设计使然还是我搞砸了实现?

一些样本数据,eps = 0.1,minPts = 3。

 0.307951851891331 0.831249445598223
 0.0223402371734102 0.352948855307395
 0.780763753587736 0.691021379870838
 0.950537940464233 0.849805725668467
 0.66559538881555 0.603627873865714
 0.983049284658883 0.320016804300256
 0.710854941844407 0.646746252033276
 0.404260418566065 0.610378857986247
 0.740377815785062 0.899680181825385
 0.430522446721104 0.597713506593236
 0.0365937198682659 0.109160974206944
 0.378702778545536 0.115744969861463
 0.765229786171219 0.568206346858389
 0.760991609078362 0.59582572271853
 0.970256112036414 0.480310371834929
 0.110018607280226 0.541528500403058
 0.679553015939683 0.951676915377228
 0.730563320094051 0.806108465793593
 0.30542559935964 0.500680956757013
 0.740971321585109 0.670210885196091
 0.877572476806851 0.221948942738561
 0.882196086404005 0.674841667374057
 0.808923079077584 0.740714808339586
 0.935197343553974 0.438659039064617
 0.283511740287539 0.271373094185895
 0.0740317893559261 0.602333299630477
 0.30702819223843 0.0683579570932118
 0.31839294653311 0.198790877684388
 0.452546667052687 0.906595267311947
 0.587719069136176 0.212557406729347
 0.930029770792476 0.354712217745703
 0.879549613632052 0.185285016980621
 0.493609266585488 0.441520784255825
 0.640463788360573 0.759178026467179
 0.916182931939225 0.598151952772472
Run Code Online (Sandbox Code Playgroud)

输出集群:

(Cluster: 1 { 0.780763753587736,0.691021379870838 }, { 0.66559538881555,0.603627873865714 }, { 0.710854941844407,0.646746252033276 }, { 0.765229786171219,0.568206346858389 }, { 0.760991609078362,0.59582572271853 }, { 0.740971321585109,0.670210885196091 }, { 0.882196086404005,0.674841667374057 }, { 0.808923079077584,0.740714808339586 }, { 0.916182931939225,0.598151952772472 })
(Cluster: 2 { 0.983049284658883,0.320016804300256 }, { 0.970256112036414,0.480310371834929 }, { 0.935197343553974,0.438659039064617 }, { 0.930029770792476,0.354712217745703 })
(Cluster: 3 { 0.404260418566065,0.610378857986247 }, { 0.430522446721104,0.597713506593236 })
(Cluster: 4 { 0.740377815785062,0.899680181825385 }, { 0.679553015939683,0.951676915377228 }, { 0.730563320094051,0.806108465793593 })
(Cluster: 5 { 0.378702778545536,0.115744969861463 }, { 0.30702819223843,0.0683579570932118 })
(Cluster: 6 { 0.110018607280226,0.541528500403058 }, { 0.0740317893559261,0.602333299630477 })
(Cluster: 7 { 0.877572476806851,0.221948942738561 }, { 0.879549613632052,0.185285016980621 })
(Cluster: 8 { 0.283511740287539,0.271373094185895 }, { 0.31839294653311,0.198790877684388 })
Run Code Online (Sandbox Code Playgroud)

Ano*_*sse 5

是。

DBSCAN中的集群只能保证至少包含1个核心点

由于属于多个群集的边界点将被“随机”分配(通常是“先到先得”),因此一个核心点可能无法保留/获取其所有邻居。因此,它可能小于minPts。

一维示例:minPts = 4,epsilon = 1:

0.0 0.5 1.0     2.0     3.0 3.5 4.0
 |-------X-------|                   Core point at 1.0, Cluster 1.
                 |-------X-------|   Core point at 3.0, Cluster 2.
Run Code Online (Sandbox Code Playgroud)

所有其他点都不是核心点。由于点2.0 不是核心点,因此它不会连接两个群集。其中一个集群将有4个成员,其他集群只有3个。

参考实现最接近的事情可能是ELKI中的DBSCAN实现。也许您应该检查结果是否与ELKI的结果匹配。因为在您的数据集中,我相信您也确实有一些编程错误。