Sklearn preprocessing - PolynomialFeatures - How to keep the column names/headers of the output array/DataFrame

Aff*_*tus 12 python validation python-2.7 scikit-learn cross-validation

TL;DR: How do I get the column headers for the numpy array returned by sklearn.preprocessing.PolynomialFeatures()?


Say I have the following code...

import pandas as pd
import numpy as np
from sklearn import preprocessing as pp

a = np.ones(3)
b = np.ones(3) * 2
c = np.ones(3) * 3

input_df = pd.DataFrame([a,b,c])
input_df = input_df.T
input_df.columns=['a', 'b', 'c']

input_df

    a   b   c
0   1   2   3
1   1   2   3
2   1   2   3

poly = pp.PolynomialFeatures(2)
output_nparray = poly.fit_transform(input_df)
print(output_nparray)

[[ 1.  1.  2.  3.  1.  2.  3.  4.  6.  9.]
 [ 1.  1.  2.  3.  1.  2.  3.  4.  6.  9.]
 [ 1.  1.  2.  3.  1.  2.  3.  4.  6.  9.]]

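How can I get the 3x10 matrix / output_nparray to carry over the a, b, c labels, so that each output column shows how it relates to the data above?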

Gui*_*sch 16

A working example, all in one line (I assume "readability" is not the goal here):

target_feature_names = ['x'.join(['{}^{}'.format(pair[0],pair[1]) for pair in tuple if pair[1]!=0]) for tuple in [zip(input_df.columns,p) for p in poly.powers_]]
output_df = pd.DataFrame(output_nparray, columns = target_feature_names)
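For anyone who finds the one-liner hard to follow, here is the same idea unrolled into a small function. It is only a sketch: the names polynomial_feature_names and example_powers are made up for illustration, and the exponent table is written out by hand on the assumption that it has the same layout as poly.powers_ (one row per output column, one exponent per input column).

```python
def polynomial_feature_names(columns, powers):
    """Build one label per output column from an exponent table.

    `powers` is assumed to be laid out like `poly.powers_`:
    one row per output feature, one exponent per input column.
    """
    names = []
    for exponents in powers:
        # Keep only the inputs that actually take part in this term.
        factors = [
            "{}^{}".format(col, exp)
            for col, exp in zip(columns, exponents)
            if exp != 0
        ]
        names.append("x".join(factors))
    return names


# Hand-written exponent table for degree 2 over three inputs, in the
# same column order as the output shown in the question (assumption:
# this mirrors what poly.powers_ would contain for that example).
example_powers = [
    [0, 0, 0],
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [2, 0, 0], [1, 1, 0], [1, 0, 1],
    [0, 2, 0], [0, 1, 1], [0, 0, 2],
]

feature_names = polynomial_feature_names(["a", "b", "c"], example_powers)
print(feature_names)
```

With the real objects you would pass input_df.columns and poly.powers_ instead. Note that the all-zero row (the constant column) ends up with an empty name, exactly as it does in the one-liner, so you may want to rename it.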

Update: as @OmerB pointed out, you can now use the get_feature_names method:

>>> poly.get_feature_names(input_df.columns)
['1', 'a', 'b', 'c', 'a^2', 'a b', 'a c', 'b^2', 'b c', 'c^2']


Ome*_*erB 11

scikit-learn 0.18 added a nifty get_feature_names() method!

>>> input_df.columns
Index(['a', 'b', 'c'], dtype='object')

>>> poly.fit_transform(input_df)
array([[ 1.,  1.,  2.,  3.,  1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  1.,  2.,  3.,  1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  1.,  2.,  3.,  1.,  2.,  3.,  4.,  6.,  9.]])

>>> poly.get_feature_names(input_df.columns)
['1', 'a', 'b', 'c', 'a^2', 'a b', 'a c', 'b^2', 'b c', 'c^2']

Note that you have to pass it the column names yourself, since sklearn does not read them from the DataFrame on its own.
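A note for readers on newer releases: as far as I know, get_feature_names was deprecated in scikit-learn 1.0 and removed in 1.2 in favour of get_feature_names_out, and from 1.0 onward an estimator fitted on a DataFrame also records the column names itself. The sketch below assumes pandas and scikit-learn are installed and simply uses whichever method the installed version provides. Passing the column names explicitly keeps it working either way.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Same toy data as in the question.
input_df = pd.DataFrame({
    "a": np.ones(3),
    "b": np.ones(3) * 2,
    "c": np.ones(3) * 3,
})

poly = PolynomialFeatures(2)
output_nparray = poly.fit_transform(input_df)

# Newer releases expose get_feature_names_out; older ones (0.18 and
# later) expose get_feature_names. Use whichever one exists.
if hasattr(poly, "get_feature_names_out"):
    names = poly.get_feature_names_out(input_df.columns)
else:
    names = poly.get_feature_names(input_df.columns)

# Wrap the array in a DataFrame so the labels travel with the data.
output_df = pd.DataFrame(
    output_nparray, columns=list(names), index=input_df.index
)
print(output_df.columns.tolist())
```

The resulting output_df has the same values as output_nparray, with the labels a, b, c, a^2, a b and so on attached as column headers.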