HDBSCAN Python: choosing the number of clusters

Question

use*_*823 6 python hierarchical-clustering

Is it possible to choose the number of clusters in the HDBSCAN algorithm in Python? Or is the only way to tune input parameters such as alpha and min_cluster_size?

Thanks

Update: here is the code using fcluster and hdbscan

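import hdbscan
from scipy.cluster.hierarchy import fcluster

# X: array-like of shape (n_samples, n_features), defined elsewhere
clusterer = hdbscan.HDBSCAN()
clusterer.fit(X)
# Export the single-linkage tree as a scipy-style linkage matrix
Z = clusterer.single_linkage_tree_.to_numpy()
# Cut the tree into at most 2 flat clusters
labels = fcluster(Z, 2, criterion='maxclust')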

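To try the tree-cutting step on its own, here is a minimal sketch that runs without hdbscan. It assumes synthetic data (two well-separated blobs) and swaps in scipy's plain single-linkage tree on Euclidean distances for clusterer.single_linkage_tree_.to_numpy(). It therefore shows how fcluster with criterion='maxclust' behaves, not what HDBSCAN itself would return.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic data (assumption): two well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(50, 2)),
])

# Stand-in for clusterer.single_linkage_tree_.to_numpy(): a plain
# single-linkage tree in scipy's linkage-matrix format.
Z = linkage(X, method="single")

# Cut the tree so that at most 2 flat clusters remain.
labels = fcluster(Z, 2, criterion="maxclust")

print("clusters found:", len(np.unique(labels)))
```

Note that fcluster assigns every point to a cluster, so there is no noise label as in HDBSCAN's own labels_. On real data, a single-linkage cut may also split off a few outlying points as their own cluster instead of separating the main groups.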

Answer 1

Mar*_*Mar 5

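Thankfully, in June 2020 a contributor on GitHub provided a commit (the flat clustering module) that added code to hdbscan allowing us to choose the number of resulting clusters.

To do so: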

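from hdbscan import flat

# n_clusters: the desired number of clusters; prediction_data=True keeps the data used for prediction
clusterer = flat.HDBSCAN_flat(train_df, n_clusters, prediction_data=True)
# Assign new points to the flat clustering with the same number of clusters
flat.approximate_predict_flat(clusterer, points_to_predict, n_clusters)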


You can find the code in flat.py. You should be able to choose the number of clusters for test points using approximate_predict_flat.

A Jupyter notebook explaining how to use it has also been written.

Archived:	8 years, 4 months ago
Views:	1922
Last activity:	4 years, 9 months ago