How to get the results of a `skbio` PCoA (Principal Coordinate Analysis)?

O.r*_*rka 3 machine-learning linear-algebra dimensionality-reduction scikits skbio

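I'm looking at the attributes of skbio's PCoA method (see below). I'm new to this API. I'd like to get the original points projected onto the new axes (the eigenvectors), similar to `.fit_transform` in `sklearn.decomposition.PCA`, so that I can make PC_1 vs PC_2 style plots. I figured out how to get `eigvals` and `proportion_explained`, but `features` comes back as `None`.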

Is this because it's still in beta?

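If there are any tutorials that use it, that would be greatly appreciated. I'm a big fan of `scikit-learn` and would like to start using more of the scikit family of packages.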

```
 |  Attributes
 |  ----------
 |  short_method_name : str
 |      Abbreviated ordination method name.
 |  long_method_name : str
 |      Ordination method name.
 |  eigvals : pd.Series
 |      The resulting eigenvalues.  The index corresponds to the ordination
 |      axis labels
 |  samples : pd.DataFrame
 |      The position of the samples in the ordination space, row-indexed by the
 |      sample id.
 |  features : pd.DataFrame
 |      The position of the features in the ordination space, row-indexed by
 |      the feature id.
 |  biplot_scores : pd.DataFrame
 |      Correlation coefficients of the samples with respect to the features.
 |  sample_constraints : pd.DataFrame
 |      Site constraints (linear combinations of constraining variables):
 |      coordinates of the sites in the space of the explanatory variables X.
 |      These are the fitted site scores
 |  proportion_explained : pd.Series
 |      Proportion explained by each of the dimensions in the ordination space.
 |      The index corresponds to the ordination axis labels
```

Here is the code I used to generate the PCoA (principal coordinate analysis) object:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition
import seaborn as sns; sns.set_style("whitegrid", {'axes.grid': False})
import skbio
from scipy.spatial import distance

# Jupyter/IPython-only magic; delete this line when running as a plain script
%matplotlib inline
np.random.seed(0)

# Iris dataset (load it once instead of calling load_iris() repeatedly)
iris = load_iris()
index = ["iris_%d" % i for i in range(iris.data.shape[0])]
DF_data = pd.DataFrame(iris.data, index=index, columns=iris.feature_names)
n, m = DF_data.shape
# print(n, m)
# 150 4

Se_targets = pd.Series(iris.target, index=index, name="Species")

# Scaling: mean = 0, var = 1
DF_standard = pd.DataFrame(StandardScaler().fit_transform(DF_data),
                           index=DF_data.index,
                           columns=DF_data.columns)

# Distance matrix between the m features (columns, hence the .T): (m x m)
Ar_dist = distance.squareform(distance.pdist(DF_standard.T, metric="braycurtis"))
DM_dist = skbio.stats.distance.DistanceMatrix(Ar_dist, ids=DF_standard.columns)
PCoA = skbio.stats.ordination.pcoa(DM_dist)
```


jai*_*out 5

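You can access the transformed sample coordinates with `OrdinationResults.samples`. This returns a `pandas.DataFrame` row-indexed by sample ID (i.e., the IDs in your distance matrix). Because principal coordinate analysis operates on a distance matrix of samples, transformed feature coordinates (`OrdinationResults.features`) are not available. Other scikit-bio ordination methods that accept a samples x features table as input do provide transformed feature coordinates (e.g., CA, CCA, RDA).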

Side note: the `distance.squareform` call is unnecessary, because `skbio.DistanceMatrix` accepts arrays in either square or vector (condensed) form.

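  • Yes. You don't need to construct an `skbio.OrdinationResults` object directly; it just holds the results of an ordination method. Every ordination method in scikit-bio creates this results object for you, and you access the results from it. Use the [`skbio.stats.ordination.pcoa`](http://scikit-bio.org/docs/latest/generated/generated/skbio.stats.ordination.pcoa.html) function to run PCoA on an `skbio.DistanceMatrix` object. You will get back an `skbio.OrdinationResults` object, whose `.samples` attribute holds the transformed sample coordinates. (2 upvotes)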
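
To show where the coordinates in `.samples` come from (the "projected points" the question asks about), here is a NumPy-only sketch of textbook classical PCoA: double-centre the squared distances, eigendecompose, and scale each eigenvector by the square root of its eigenvalue. `classical_pcoa` is a hypothetical helper name, and this is the standard Gower algorithm, not necessarily scikit-bio's exact implementation.

```python
import numpy as np


def classical_pcoa(D):
    """Textbook classical PCoA: returns (coordinates, eigenvalues), largest first."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Gower double-centring of -1/2 * D**2
    A = -0.5 * D ** 2
    J = np.eye(n) - np.ones((n, n)) / n
    B = J @ A @ J
    # Eigendecomposition of the symmetric matrix B, sorted by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Clip tiny negative eigenvalues (round-off) before taking the square root
    coords = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    return coords, eigvals


# Usage: with Euclidean distances the leading axes match PCA scores up to sign
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
coords, eigvals = classical_pcoa(D)

Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(np.abs(coords[:, :3]), np.abs(U * S)))
```

The sign of each axis is arbitrary in both PCA and PCoA, which is why the comparison uses absolute values.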