我正在使用这个GSDMM python 实现来聚类文本消息的数据集。根据初始论文, GSDMM 收敛速度快(大约 5 次迭代)。我也有收敛到一定数量的集群,但是每次迭代仍然有很多消息传递,所以很多消息仍然在改变它们的集群。
我的输出看起来像:
In stage 0: transferred 9511 clusters with 150 clusters populated
In stage 1: transferred 4974 clusters with 138 clusters populated
In stage 2: transferred 2533 clusters with 90 clusters populated
….
In stage 34: transferred 1403 clusters with 47 clusters populated
In stage 35: transferred 1410 clusters with 47 clusters populated
In stage 36: transferred 1430 clusters with 48 clusters populated
In stage 37: transferred 1463 clusters with 48 …Run Code Online (Sandbox Code Playgroud)