O.r*_*rka 3 machine-learning linear-algebra dimensionality-reduction scikits skbio
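I am looking at the attributes of skbio's PCoA method (listed below). I am new to this API, and I want to get the eigenvectors and the original points projected onto the new axes, similar to .fit_transform in sklearn.decomposition.PCA, so that I can create some PC_1 vs PC_2 style plots. I figured out how to get eigvals and proportion_explained, but features comes back as None.
Is this because it is in beta?
If there are any tutorials that use it, that would be greatly appreciated. I am a huge fan of scikit-learn and would like to start using more of the scikit's products.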
| Attributes
| ----------
| short_method_name : str
|     Abbreviated ordination method name.
| long_method_name : str
|     Ordination method name.
| eigvals : pd.Series
|     The resulting eigenvalues. The index corresponds to the ordination
|     axis labels
| samples : pd.DataFrame
|     The position of the samples in the ordination space, row-indexed by the
|     sample id.
| features : pd.DataFrame
|     The position of the features in the ordination space, row-indexed by
|     the feature id.
| biplot_scores : pd.DataFrame
|     Correlation coefficients of the samples with respect to the features.
| sample_constraints : pd.DataFrame
|     Site constraints (linear combinations of constraining variables):
|     coordinates of the sites in the space of the explanatory variables X.
|     These are the fitted site scores
| proportion_explained : pd.Series
|     Proportion explained by each of the dimensions in the ordination space.
|     The index corresponds to the ordination axis labels
Here is my code to generate the principal component analysis object.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition
import seaborn as sns; sns.set_style("whitegrid", {'axes.grid' : False})
import skbio
from scipy.spatial import distance
%matplotlib inline
np.random.seed(0)
# Iris dataset
DF_data = pd.DataFrame(load_iris().data,
                       index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
                       columns = load_iris().feature_names)
n,m = DF_data.shape
# print(n,m)
# 150 4
Se_targets = pd.Series(load_iris().target,
                       index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
                       name = "Species")
# Scaling mean = 0, var = 1
DF_standard = pd.DataFrame(StandardScaler().fit_transform(DF_data),
                           index = DF_data.index,
                           columns = DF_data.columns)
# Distance Matrix
Ar_dist = distance.squareform(distance.pdist(DF_standard.T, metric="braycurtis")) # (m x m) distance measure
DM_dist = skbio.stats.distance.DistanceMatrix(Ar_dist, ids=DF_standard.columns)
PCoA = skbio.stats.ordination.pcoa(DM_dist)
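You can access the transformed sample coordinates using OrdinationResults.samples. This returns a pandas.DataFrame row-indexed by sample ID (i.e., the IDs in your distance matrix). Since principal coordinate analysis operates on a distance matrix of samples, transformed feature coordinates (OrdinationResults.features) are not available. Other ordination methods in scikit-bio that accept a sample x feature table as input will have transformed feature coordinates available (e.g., CA, CCA, RDA).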
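To illustrate why only sample coordinates come out of this kind of ordination, here is a minimal NumPy-only sketch of classical PCoA (Gower double-centering followed by an eigendecomposition). It is not scikit-bio's implementation; the function name `classical_pcoa`, the random data `X`, and the way the proportion is normalized are assumptions made for illustration. The only input is an (n x n) distance matrix between samples, so there is nothing from which feature coordinates could be computed.

```python
import numpy as np

def classical_pcoa(dist, n_axes=2):
    """Illustrative classical PCoA on a square (n x n) distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Gower double-centering of -0.5 * D**2 gives a Gram-like matrix
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Eigendecomposition, sorted from largest to smallest eigenvalue
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Sample coordinates: eigenvectors scaled by sqrt(eigenvalue);
    # negative eigenvalues (non-Euclidean distances) are clipped to zero
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0, None))
    # Assumed normalization: share of the positive eigenvalue sum
    proportion = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, eigvals[:n_axes], proportion

# Hypothetical data: 20 samples x 4 features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

# (n x n) Euclidean distances between samples (rows), not between features
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

coords, eigvals, proportion = classical_pcoa(D, n_axes=4)
# coords[:, 0] and coords[:, 1] are what a "PC_1 vs PC_2" scatter plot would use
```

With Euclidean distances, these coordinates match PCA scores of the centered data up to the sign of each axis, which is the closest analogue to `fit_transform` in `sklearn.decomposition.PCA`. In scikit-bio, the equivalent table is the `samples` attribute described above. Note that the question's code passes `DF_standard.T` to `pdist`, so the "samples" in that ordination are the four iris features rather than the 150 flowers.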
Side note: the distance.squareform call is unnecessary because skbio.DistanceMatrix supports arrays in either square or vector form.
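As a small sketch of what "square or vector form" means here, the following uses only SciPy to show the two layouts of the same distances. The data `X` is hypothetical (random non-negative values, since Bray-Curtis is intended for non-negative data).

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical non-negative data: 5 samples x 3 features
rng = np.random.default_rng(0)
X = rng.random((5, 3))

# Condensed ("vector") form: one entry per unordered pair, n*(n-1)/2 values
condensed = distance.pdist(X, metric="braycurtis")

# Square form: symmetric (n x n) matrix with a zero diagonal
square = distance.squareform(condensed)

# squareform is its own inverse: a square matrix maps back to the vector
round_trip = distance.squareform(square)
```

Following the note above, `condensed` could be passed to `skbio.DistanceMatrix` together with the `ids`, without converting it to `square` first.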