Shi*_*rma 3 python machine-learning k-means unsupervised-learning scikit-learn
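I have done clustering with KMeans from sklearn. It has a way to print the centroids, but I find it rather odd that scikit-learn has no method for finding the cluster length (or at least I have not come across one so far). Is there a neat way to get the length of each cluster, that is, the number of points associated with it? At the moment I have the rather clumsy code below: I look for clusters of length one, need to add other points to such a cluster by measuring the Euclidean distance between points, and then have to update the labels.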
import numpy as np
from numpy import genfromtxt
from clustering.clusternew import Kmeans_clu
from evolution.generate import reproduction
from mapping.somnew import mapping, no_of_neurons, neuron_weights_init
from population_creation.population import pop_create
from New_SOL import newsol

# Read the four feature columns and the label column of the iris data
data = genfromtxt('iris.csv', delimiter=',', skip_header=0, usecols=range(0, 4))
actual_label = genfromtxt('iris.csv', delimiter=',', dtype=str, skip_header=0, usecols=(4))
chromosome = int(input("Enter the number of chromosomes: "))  # population size
max_gen = int(input("Enter the maximum number of generation: "))  # maximum number of generations

# Containers filled once per chromosome (not initialised in the posted snippet)
K, lab, center, population = [], [], [], []
# Note: max_cluster and features are used below but are not defined in the posted snippet
for i in range(0, chromosome):
    cluster = 3  # random.randint(2, max_cluster): random cluster count from 2 to root(population)
    K.insert(i, cluster)  # store the number of clusters
    print('value of K is ', K)
    u, label, z1, A1 = Kmeans_clu(cluster, data)
    # print("centers and labels : ", u, label)
    lab.insert(i, label)  # store the labels
    center.insert(i, u)  # store the centroids
    new_center = pop_create(max_cluster, features, cluster, u)
    population.insert(i, new_center)
print("Value of population in main\n", population)
newsol(max_gen, population, data)
The newsol method receives the new population generated by the code above and then runs K-means on that population again:
from collections import Counter
import numpy as np
from clustering.clusternew import Kmeans_clu

def ClusterIndicesComp(clustNum, labels_array):
    # Indices of the points whose label equals clustNum
    return np.array([i for i, x in enumerate(labels_array) if x == clustNum])

def newsol(max_gen, population, data):
    # print('Value of NewSol Population is', population)
    A1, plab, pcenter = [], [], []  # not initialised in the posted snippet
    for i in range(max_gen):
        cluster1 = 5
        u, label, t, l = Kmeans_clu(cluster1, population)
        A1.insert(i, t)
        plab.insert(i, label)
        pcenter.insert(i, u)
        k2 = Counter(l.labels_)  # number of elements in each cluster
        k1 = [t for (t, v) in k2.items() if v == 1]  # labels of clusters holding exactly one point
        t1 = np.array(k1)
        for b in range(len(t1)):  # iterate over the clusters that hold a single point
            print("Value in NEW_SOL is of 1 length cluster\n", t1[b])
            plot1 = data[ClusterIndicesComp(t1[b], l.labels_)]
            print("Values are in sol of plot1", plot1)
            for q in range(cluster1):
                plot2 = data[ClusterIndicesComp(q, l.labels_)]
                print("Value of plot2 is for \n", q, plot2)
                for j in range(len(plot2)):  # one element of plot2 at a time (j avoids shadowing the outer i)
                    plotk = plot2[j]
                    if k2[q] > 2:  # compute the distance only if cluster q has more than 2 points
                        S = np.linalg.norm(np.array(plot1) - np.array(plotk))  # Euclidean distance
                        print("Distance between plot1 and plotk is", plot1, plotk, S)
                    else:
                        print("NO distance between them\n")
The K-means routine I use is:
from sklearn.cluster import KMeans
import numpy as np

def Kmeans_clu(K, data):
    # Apply k-means clustering (one random initialisation, one iteration)
    kmeans = KMeans(n_clusters=K, init='random', max_iter=1, n_init=1).fit(data)
    labels = kmeans.labels_
    clu_centres = kmeans.cluster_centers_
    # Map each cluster label to the indices of the points assigned to it
    z = {i: np.where(kmeans.labels_ == i)[0] for i in range(kmeans.n_clusters)}
    return clu_centres, labels, z, kmeans
To get the number of instances in each cluster, you can use Counter on the labels of the fitted estimator:
from collections import Counter, defaultdict
print(Counter(estimator.labels_))
Result:
Counter({0: 62, 1: 50, 2: 38})
Here cluster 0 has 62 instances, cluster 1 has 50 instances, and cluster 2 has 38 instances.
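When the labels are already a NumPy array, the same counts can be obtained without leaving NumPy. The following is a minimal sketch, not part of the original answer: it assumes a small synthetic dataset in place of the iris file, and the names `X`, `counts`, `unique_labels` and `unique_counts` are made up for illustration. `np.bincount` applies here because KMeans labels are non-negative integers from 0 to `n_clusters - 1`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the real data: three well-separated 2-D blobs
# (assumption: any (n_samples, n_features) array is handled the same way).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.1, size=(20, 2)),
    rng.normal(loc=(5, 5), scale=0.1, size=(30, 2)),
    rng.normal(loc=(10, 0), scale=0.1, size=(10, 2)),
])

estimator = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# counts[k] is the number of points assigned to cluster k.
counts = np.bincount(estimator.labels_, minlength=estimator.n_clusters)
print(counts)

# Equivalent form that returns the labels alongside their counts.
unique_labels, unique_counts = np.unique(estimator.labels_, return_counts=True)
print(dict(zip(unique_labels.tolist(), unique_counts.tolist())))
```

Compared with Counter, the `minlength` argument makes `np.bincount` report a zero for a cluster that received no points, whereas Counter and `np.unique` simply omit it.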
To also store the indices of the instances in each cluster, you can use a defaultdict:
clusters_indices = defaultdict(list)
for index, c in enumerate(estimator.labels_):
    clusters_indices[c].append(index)
Now, to find the indices of the instances in cluster 0, call:
print(clusters_indices[0])
Result:
[50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92,
93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 114, 119, 121, 123, 126, 127, 133, 138, 142, 146, 149]
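Coming back to the situation in the question, per-cluster counts and indices are enough to find clusters that hold a single point and to add the nearest other point to each of them by Euclidean distance. The following is only a sketch under stated assumptions, not the asker's or scikit-learn's method: the helper name `grow_singletons` is hypothetical, donor points are taken only from clusters that currently hold more than two points (mirroring the `v > 2` check in the question), and the centroids are not recomputed after relabeling.

```python
import numpy as np

def grow_singletons(data, labels):
    """Return a copy of labels in which each one-point cluster gains its nearest donor point.

    Hypothetical helper: donors come only from clusters with more than two points.
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels).copy()
    for cluster_id in np.unique(labels):
        counts = np.bincount(labels)  # recomputed because labels change inside the loop
        if counts[cluster_id] != 1:
            continue
        lone_index = np.flatnonzero(labels == cluster_id)[0]
        # counts[labels] is the size of the cluster each point belongs to.
        donors = np.flatnonzero(counts[labels] > 2)
        if donors.size == 0:
            continue  # no cluster is large enough to give up a point
        distances = np.linalg.norm(data[donors] - data[lone_index], axis=1)
        nearest = donors[np.argmin(distances)]
        labels[nearest] = cluster_id  # move the nearest donor into the singleton cluster
    return labels

# Toy data (assumption): four points labeled 0 and a lone point labeled 1.
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [4.0, 4.0], [5.0, 5.0]])
labels = np.array([0, 0, 0, 0, 1])

new_labels = grow_singletons(data, labels)
print(new_labels)  # -> [0 0 0 1 1]
```

The point at (4, 4) is the closest donor to the lone point at (5, 5), so it is relabeled into cluster 1. Working on a copy leaves the original `labels_` array of the fitted estimator untouched.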