Pandas groupby in combination with sklearn preprocessing (continued)

Question

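Continuing from this post: Pandas groupby in combination with sklearn preprocessing

I need to preprocess data by scaling it within groups defined by two columns. The first approach below works, but the second one raises an error.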

import pandas as pd
import numpy as np
from sklearn.preprocessing import robust_scale,minmax_scale

df = pd.DataFrame( dict( id=list('AAAAABBBBB'),
                loc = (10,20,10,20,10,20,10,20,10,20),
                value=(0,10,10,20,100,100,200,30,40,100)))

df['new'] = df.groupby(['id','loc']).value.transform(lambda x:minmax_scale(x.astype(float) ))

df['new'] = df.groupby(['id','loc']).value.transform(lambda x:robust_scale(x ))

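The second one gives me this error:

ValueError: Expected 2D array, got 1D array instead: array=[ 0. 10. 100.]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

If I use reshape, I get this error instead:

Exception: Data must be 1-dimensional

If I print the grouped data, g['value'] is a pandas Series.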

for n, g in df.groupby(['id','loc']):
    print(type(g['value']))

Do you know what causes this?

Thanks.

Answer 1

WeN*_*Ben 3

As the error message suggests, reshape the values to 2D before scaling, then flatten the result back to 1D with concatenate:

df.groupby(['id','loc']).value.transform(lambda x:np.concatenate(robust_scale(x.values.reshape(-1,1))))
Out[606]: 
0   -0.2
1   -1.0
2    0.0
3    1.0
4    1.8
5    0.0
6    1.0
7   -2.0
8   -1.0
9    0.0
Name: value, dtype: float64
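
The same idea can be written as a small named helper, which may be easier to read than the one-line lambda. This is only a sketch under the assumptions of the question: `scale_group` is a hypothetical name, not part of pandas or scikit-learn, and `ravel()` is used here in place of `np.concatenate` to flatten the `(n, 1)` result back to the 1D shape that `transform` expects. Behaviour with 1D input may differ across scikit-learn versions, so the explicit reshape is kept.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import robust_scale


def scale_group(values: pd.Series) -> np.ndarray:
    """Robust-scale one group's values (hypothetical helper for illustration)."""
    # The scaler is given a 2D array of shape (n_samples, n_features).
    column = values.to_numpy(dtype=float).reshape(-1, 1)
    # The scaled result has shape (n, 1); transform needs 1D, so flatten it.
    return robust_scale(column).ravel()


df = pd.DataFrame(dict(
    id=list('AAAAABBBBB'),
    loc=(10, 20, 10, 20, 10, 20, 10, 20, 10, 20),
    value=(0, 10, 10, 20, 100, 100, 200, 30, 40, 100),
))

# Scale 'value' separately within each (id, loc) group.
df['new'] = df.groupby(['id', 'loc'])['value'].transform(scale_group)
print(df)
```

The intermediate failure in the question ("Data must be 1-dimensional") is consistent with this: after reshaping, the scaler returns a 2D array, and it has to be flattened before pandas can align it with the group's index.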
