Understanding the details of this Python code

idp*_*d15 -2 python scikit-learn

The task is to load the iris dataset from sklearn and then make some plots. I would like to understand what each command is doing.

from sklearn.datasets import load_iris

Q1: Is load_iris a function in sklearn?

data = load_iris()

Q2: I believe this load_iris function returns some output that we store as data. What exactly is the output of load_iris() (its type, etc.)?

df = pd.DataFrame(data.data,columns = data.feature_names)

Q3: Now we store it as a DataFrame. But what are data.data and data.feature_names?

df['target_names'] = [data.target_names[i] for i in data.target]

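Q4: I don't understand the right-hand side of the code above.

I need help with questions 1, 2, 3 and 4. I tried looking at the scikit-learn documentation but could not understand it. This code comes from an edX online course, but they did not explain it.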

Max*_*axU 6

Discover the introspection capabilities of Jupyter/IPython.

I am using IPython in this example.

Q1: Is load_iris a function in sklearn?

In [33]: type(load_iris)
Out[33]: function
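Outside IPython the same check can be done in a plain script. This is only a minimal sketch, assuming scikit-learn is installed; it uses the standard-library `inspect` module, which is not part of the original session.

```python
import inspect

from sklearn.datasets import load_iris

# type() reports what kind of object the name refers to
print(type(load_iris).__name__)

# callable() and inspect.isfunction() answer the question more directly
is_callable = callable(load_iris)
is_function = inspect.isfunction(load_iris)
print(is_callable, is_function)

# The signature lists the accepted parameters; the exact output
# depends on the installed scikit-learn version.
print(inspect.signature(load_iris))
```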

Q2: I believe this load_iris function returns some output that we store as data. What exactly is the output of load_iris() (its type, etc.)?

The docstring is very helpful:

In [34]: load_iris?
Signature: load_iris(return_X_y=False)
Docstring:
Load and return the iris dataset (classification).

The iris dataset is a classic and very easy multi-class classification
dataset.

=================   ==============
Classes                          3
Samples per class               50
Samples total                  150
Dimensionality                   4
Features            real, positive
=================   ==============

Read more in the :ref:`User Guide <datasets>`.

Parameters
----------
return_X_y : boolean, default=False.
    If True, returns ``(data, target)`` instead of a Bunch object. See
    below for more information about the `data` and `target` object.

    .. versionadded:: 0.18

Returns
-------
data : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification labels,
    'target_names', the meaning of the labels, 'feature_names', the
    meaning of the features, and 'DESCR', the
    full description of the dataset.

(data, target) : tuple if ``return_X_y`` is True
...
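To see what the docstring means by a "dictionary-like" Bunch, you can inspect the returned object directly. The following is a small sketch, assuming a scikit-learn version that supports `return_X_y` (0.18 or later according to the docstring above); the variable names `X` and `y` are my own choice.

```python
from sklearn.datasets import load_iris

data = load_iris()

# The returned object is a Bunch: a dict whose keys are also attributes
print(type(data).__name__)
print(sorted(data.keys()))

# Key access and attribute access return the very same array
same_object = data["data"] is data.data
print(same_object)

# With return_X_y=True you get a plain (data, target) tuple instead
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)
```

So `data.data` is simply shorthand for `data["data"]`, and the same holds for the other keys named in the docstring.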

Print the description:

In [51]: print(data.DESCR)
Iris Plants Database
====================

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
...

Q3: Now we store it as a DataFrame. But what are data.data and data.feature_names?

In [37]: type(data.data)
Out[37]: numpy.ndarray

In [88]: data.data.shape
Out[88]: (150, 4)

In [38]: df = pd.DataFrame(data.data, columns=data.feature_names)

In [39]: pd.set_option('display.max_rows', 10)

In [40]: df
Out[40]:
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                  5.1               3.5                1.4               0.2
1                  4.9               3.0                1.4               0.2
2                  4.7               3.2                1.3               0.2
3                  4.6               3.1                1.5               0.2
4                  5.0               3.6                1.4               0.2
..                 ...               ...                ...               ...
145                6.7               3.0                5.2               2.3
146                6.3               2.5                5.0               1.9
147                6.5               3.0                5.2               2.0
148                6.2               3.4                5.4               2.3
149                5.9               3.0                5.1               1.8

[150 rows x 4 columns]

In [41]: df.columns
Out[41]: Index(['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'], dtype='object')

In [42]: data.feature_names
Out[42]:
['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']
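Put differently, `data.data` holds the measurements (one row per flower, one column per feature) and `data.feature_names` holds one label per column, which is why it can be passed as `columns=`. A short sketch that checks this relationship and also peeks at `data.target` and `data.target_names`, which Q4 relies on; it assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()

# One feature name per column of the 2-D measurement array
n_samples, n_features = data.data.shape
print(n_samples, n_features, len(data.feature_names))

df = pd.DataFrame(data.data, columns=data.feature_names)
print(df.shape)

# target holds one integer class code per row;
# target_names maps each code to a species name
print(data.target[:5])
print(list(data.target_names))
```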

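Q4: I don't understand the right-hand side of the code above. I need help with questions 1, 2, 3 and 4. I tried looking at the scikit-learn documentation but could not understand it. This code comes from an edX online course, but they did not explain it.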

Execute the code and inspect the result; it is usually easy to see what is going on. By the way, I would use NumPy for this step:

In [49]: df['target_names'] = np.take(data.target_names, data.target)

In [50]: df
Out[50]:
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm) target_names
0                  5.1               3.5                1.4               0.2       setosa
1                  4.9               3.0                1.4               0.2       setosa
2                  4.7               3.2                1.3               0.2       setosa
3                  4.6               3.1                1.5               0.2       setosa
4                  5.0               3.6                1.4               0.2       setosa
..                 ...               ...                ...               ...          ...
145                6.7               3.0                5.2               2.3    virginica
146                6.3               2.5                5.0               1.9    virginica
147                6.5               3.0                5.2               2.0    virginica
148                6.2               3.4                5.4               2.3    virginica
149                5.9               3.0                5.1               1.8    virginica

[150 rows x 5 columns]
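As for the right-hand side in the question: it is a list comprehension that walks over the integer class codes in `data.target` and, for each code `i`, looks up the matching name `data.target_names[i]`. The sketch below shows that this is equivalent to `np.take` and to plain NumPy integer indexing. The short `target` array is a made-up stand-in for the real 150 labels, used only to keep the example small.

```python
import numpy as np

# The three class names of the iris dataset
target_names = np.array(["setosa", "versicolor", "virginica"])

# Illustrative stand-in for data.target (the real array has 150 codes)
target = np.array([0, 0, 1, 2, 2])

# 1) List comprehension: look up one name per integer code
via_loop = [target_names[i] for i in target]

# 2) np.take: the same lookup, done in one vectorized call
via_take = np.take(target_names, target)

# 3) Integer-array indexing: equivalent to np.take here
via_index = target_names[target]

names = [str(n) for n in via_loop]
print(names)
print(list(via_take) == via_loop, list(via_index) == via_loop)
```

All three produce the same sequence of names; the NumPy versions just avoid the Python-level loop.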