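I have a dataset of 2M rows × 7 columns containing various household power consumption measurements, with a date for each measurement.
I load the dataset into a pandas DataFrame, select all columns except the date column, and then perform a cross-validation split.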
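import pandas as pd
from sklearn.model_selection import train_test_split
# Load the semicolon-delimited file into a DataFrame
data = pd.read_csv('household_power_consumption.txt', delimiter=';')
# Keep the measurement columns only and drop rows with missing values
power_consumption = data.iloc[0:, 2:9].dropna()
pc_toarray = power_consumption.values
# Keep 1% of the rows for fitting; the remaining 99% go to hpc_fit1
hpc_fit, hpc_fit1 = train_test_split(pc_toarray, train_size=.01)
power_consumption.head()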
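
As a rough illustration of what `train_size=.01` does in the split above, here is a minimal sketch on a synthetic array standing in for the real data. The array size, the random seed, and the names `fake_measurements`, `sample`, and `remainder` are assumptions made for the example and are not from the original post.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 7 measurement columns (assumption: 10,000 rows).
rng = np.random.default_rng(0)
fake_measurements = rng.normal(size=(10_000, 7))

# train_size=0.01 puts about 1% of the rows in the first array;
# the rest end up in the second array.
sample, remainder = train_test_split(
    fake_measurements, train_size=0.01, random_state=0
)

print(sample.shape, remainder.shape)
```

With the real 2M-row dataset, the same call would leave roughly 20,000 rows in the first array, which is what gets passed on to PCA and K-means.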

I then apply K-means clustering and display the result using PCA dimensionality reduction.
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
# Project the sampled rows onto the first two principal components
hpc = PCA(n_components=2).fit_transform(hpc_fit)
# Cluster the 2-D projection (KMeans defaults to 8 clusters)
k_means = KMeans()
k_means.fit(hpc)
# Axis limits for the plot
x_min, x_max = hpc[:, 0].min() - 5, hpc[:, 0].max() - 1
y_min, y_max = hpc[:, 1].min(), hpc[:, 1].max() + 5
…
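
The snippet above is cut off, so the following is not a reconstruction of the missing part. It is a minimal, self-contained sketch of one common way to go from a fitted K-means model on a 2-D PCA projection to a grid of cluster labels that can be plotted as decision regions. The synthetic data from `make_blobs`, the choice of 3 clusters, the margin of 1, and the 200×200 grid are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in data (assumption): 500 rows, 7 features, 3 blobs.
X, _ = make_blobs(n_samples=500, n_features=7, centers=3, random_state=0)

# Reduce to two principal components, then cluster the projection.
reduced = PCA(n_components=2).fit_transform(X)
k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(reduced)

# Build a grid covering the projected points, with a margin of 1 on each side.
x_min, x_max = reduced[:, 0].min() - 1, reduced[:, 0].max() + 1
y_min, y_max = reduced[:, 1].min() - 1, reduced[:, 1].max() + 1
xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 200),
    np.linspace(y_min, y_max, 200),
)

# Predict a cluster label for every grid point, then reshape to the grid.
grid_labels = k_means.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
```

The resulting `grid_labels` array can be drawn with `plt.contourf(xx, yy, grid_labels)` and overlaid with `plt.scatter(reduced[:, 0], reduced[:, 1])` to show the points against their cluster regions.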