Jas*_*son 2 pandas scikit-learn sklearn-pandas
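I am following the sklearn_pandas walkthrough found in the sklearn_pandas README on GitHub, and I am trying to modify the DateEncoder() custom transformer example to do two additional things:
Here is my attempt (with a fairly rudimentary understanding of sklearn pipelines):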
import pandas as pd
import numpy as np
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn_pandas import DataFrameMapper

class DateEncoder(TransformerMixin):
    '''
    Specify date format using python strftime formats
    '''
    def __init__(self, date_format='%Y-%m-%d'):
        self.date_format = date_format

    def fit(self, X, y=None):
        self.dt = pd.to_datetime(X, format=self.date_format)
        return self

    def transform(self, X):
        dt = X.dt
        return pd.concat([dt.year, dt.month, dt.day], axis=1)

data = pd.DataFrame({'dates1': ['2001-12-20','2002-10-21','2003-08-22','2004-08-23',
                                '2004-07-20','2007-12-21','2006-12-22','2003-04-23'],
                     'dates2' : ['2012-12-20','2009-10-21','2016-08-22','2017-08-23',
                                 '2014-07-20','2011-12-21','2014-12-22','2015-04-23']})
DATE_COLS = ['dates1', 'dates2']
Mapper = DataFrameMapper([(i, DateEncoder(date_format='%Y-%m-%d')) for i in DATE_COLS], input_df=True, df_out=True)
test = Mapper.fit_transform(data)
However, when I run it, I get the following error:
AttributeError: Can only use .dt accessor with datetimelike values
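Why am I getting this error, and how can I fix it? Also, any help with renaming the resulting columns after the original column (Date1_year, Date1_month, Date1_day) would be greatly appreciated!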
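A minimal sketch of the likely cause, assuming the date columns hold plain strings as in the sample data above: the `.dt` accessor exists only on datetime-typed Series, and the `fit` above stores the parsed dates on `self.dt` while `transform` reads `X.dt` from the unparsed input. The variable names here are illustrative only.

```python
import pandas as pd

raw = pd.Series(['2001-12-20', '2002-10-21'], name='dates1')

# A Series of strings has no .dt accessor; this reproduces the AttributeError
try:
    raw.dt
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Parsing first yields a datetime64 Series, on which .dt works
parsed = pd.to_datetime(raw, format='%Y-%m-%d')
years = parsed.dt.year.tolist()
```

Under that assumption, doing the `pd.to_datetime` conversion inside `transform` (or in an earlier pipeline step) rather than in `fit` should avoid the error.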
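I know this is late, but in case you are still interested in a way to do this while renaming the columns with a custom transformer...
I added a get_feature_names method to the custom transformer, which sits in a pipeline inside a ColumnTransformer (overview). You can then use the .named_steps attribute to reach the pipeline step, call get_feature_names, and obtain column_names, which holds the custom column names to use. This retrieves the column names in a way similar to the approach in this SO post.
I had to use a pipeline to run this, because it errored out when I tried it as a standalone custom transformer (so I will not post that incomplete attempt here), though you may have better luck.
Here is the original code showing the pipeline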
import pandas as pd
from sklearn.base import TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

data2 = pd.DataFrame(
    {"dates1": ["2001-12-20", "2002-10-21", "2003-08-22", "2004-08-23",
                "2004-07-20", "2007-12-21", "2006-12-22", "2003-04-23"
                ], "dates2": ["2012-12-20", "2009-10-21", "2016-08-22", "2017-08-23",
                              "2014-07-20", "2011-12-21", "2014-12-22", "2015-04-23"]})
DATE_COLS = ['dates1', 'dates2']

# DateFormatter and DateEncoder are custom transformers that must be
# defined before this point
pipeline = Pipeline([
    ('transform', ColumnTransformer([
        ('datetimes', Pipeline([
            ('formatter', DateFormatter()), ('encoder', DateEncoder()),
        ]), DATE_COLS),
    ])),
])

data3 = pd.DataFrame(pipeline.fit_transform(data2))
# Walk down to the fitted encoder step to fetch the custom column names
data3_names = (
    pipeline.named_steps['transform']
    .named_transformers_['datetimes']
    .named_steps['encoder']
    .get_feature_names()
)
data3.columns = data3_names
print(data2)
print(data3)
The output is
       dates1      dates2
0  2001-12-20  2012-12-20
1  2002-10-21  2009-10-21
2  2003-08-22  2016-08-22
3  2004-08-23  2017-08-23
4  2004-07-20  2014-07-20
5  2007-12-21  2011-12-21
6  2006-12-22  2014-12-22
7  2003-04-23  2015-04-23
   dates1_year  dates1_month  dates1_day  dates2_year  dates2_month  dates2_day
0         2001            12          20         2012            12          20
1         2002            10          21         2009            10          21
2         2003             8          22         2016             8          22
3         2004             8          23         2017             8          23
4         2004             7          20         2014             7          20
5         2007            12          21         2011            12          21
6         2006            12          22         2014            12          22
7         2003             4          23         2015             4          23
The custom transformer is here (DateFormatter is skipped, since it is the same as yours)
class DateEncoder(TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        dfs = []
        self.column_names = []
        for column in X:
            dt = X[column].dt
            # Build custom column names for this date column
            newcolumnnames = [column+'_'+col for col in ['year', 'month', 'day']]
            df_dt = pd.concat([dt.year, dt.month, dt.day], axis=1)
            # Append DF to list to assemble list of DFs
            dfs.append(df_dt)
            # Append single DF's column names to blank list
            self.column_names.append(newcolumnnames)
        # Horizontally concatenate list of DFs
        dfs_dt = pd.concat(dfs, axis=1)
        return dfs_dt

    def get_feature_names(self):
        # Flatten list of column names without overwriting self.column_names,
        # so that repeated calls return the same result
        return [c for sublist in self.column_names for c in sublist]
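Since DateFormatter is not shown, here is a minimal sketch of what it might look like, assuming it does nothing more than convert every incoming column to datetime with `pd.to_datetime`. The `date_format` parameter is an assumption carried over from the DateEncoder in the question, not something the answer confirms.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DateFormatter(BaseEstimator, TransformerMixin):
    """Assumed implementation: parse string columns into datetime columns."""

    def __init__(self, date_format='%Y-%m-%d'):
        self.date_format = date_format

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data
        return self

    def transform(self, X):
        # Convert each column so that .dt is available to later steps
        return X.apply(lambda col: pd.to_datetime(col, format=self.date_format))


formatted = DateFormatter().fit_transform(
    pd.DataFrame({'dates1': ['2001-12-20', '2002-10-21']})
)
```

With a formatter along these lines placed ahead of the encoder, each column reaching DateEncoder would already be datetime-typed.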
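Rationale for DateEncoder
Looping over the pandas columns allows the datetime attributes to be extracted from each datetime column. The custom column names are built in the same loop. They are then appended to the blank list self.column_names, which is returned by the get_feature_names method (although it has to be flattened before being assigned to the DataFrame).
For this particular case, you could probably skip sklearn_pandas.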
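As a rough illustration of doing this without sklearn_pandas or a pipeline at all, the same expansion can be sketched in plain pandas. The helper name `expand_dates` and its parameters are hypothetical, and the sample frame is a shortened version of the data above.

```python
import pandas as pd


def expand_dates(df, date_cols, date_format='%Y-%m-%d'):
    """Hypothetical helper: one year/month/day column per date column."""
    parts = {}
    for col in date_cols:
        # Parse the strings, then pull out each datetime attribute
        dt = pd.to_datetime(df[col], format=date_format).dt
        parts[col + '_year'] = dt.year
        parts[col + '_month'] = dt.month
        parts[col + '_day'] = dt.day
    return pd.DataFrame(parts, index=df.index)


sample = pd.DataFrame({'dates1': ['2001-12-20', '2002-10-21'],
                       'dates2': ['2012-12-20', '2009-10-21']})
expanded = expand_dates(sample, ['dates1', 'dates2'])
```

This gives up the fit/transform interface, so it only suits cases where the encoding does not need to sit inside a scikit-learn pipeline.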
Details
sklearn = 0.20.0
pandas = 0.23.4
numpy = 1.15.2
python = 2.7.15rc1