Include group name in apply function pandas python

Question

Is there a way to tell a groupby call to make the group name available inside the apply lambda function?

For example, if I iterate through the groups, I can get the group key via tuple unpacking:

for group_name, subdf in temp_dataframe.groupby(level=0):
    print(group_name)
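
A minimal runnable sketch of that loop, assuming a small hypothetical temp_dataframe whose index values serve as the group labels for level=0:

```python
import pandas as pd

# Hypothetical toy frame: the index labels ('a', 'b', 'c') define the level-0 groups.
temp_dataframe = pd.DataFrame({'b': range(6)}, index=list('aabccc'))

group_names = []
for group_name, subdf in temp_dataframe.groupby(level=0):
    # Each iteration unpacks a (group key, sub-DataFrame) tuple.
    print(group_name, len(subdf))
    group_names.append(group_name)
```

This prints one line per group (a 2, b 1, c 3); the keys come out sorted because groupby sorts them by default.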

Is there a way to get the group name inside the apply function, something like:

temp_dataframe.groupby(level=0).apply(lambda group_name, subdf: foo(group_name, subdf))

How can I get the group name as an argument to the apply lambda function?

Thanks!

Answer 1

EdC*_*ica 24

I think you should be able to use the name attribute:

temp_dataframe.groupby(level=0).apply(lambda x: foo(x.name, x))

This should work, for example:

In [132]:
df = pd.DataFrame({'a':list('aabccc'), 'b':np.arange(6)})
df

Out[132]:
   a  b
0  a  0
1  a  1
2  b  2
3  c  3
4  c  4
5  c  5

In [134]:
df.groupby('a').apply(lambda x: print('name:', x.name, '\nsubdf:',x))

name: a 
subdf:    a  b
0  a  0
1  a  1
name: b 
subdf:    a  b
2  b  2
name: c 
subdf:    a  b
3  c  3
4  c  4
5  c  5
Out[134]:
Empty DataFrame
Columns: []
Index: []
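
The same pattern applied to the groupby(level=0) call from the question, as a self-contained sketch; foo and temp_dataframe here are hypothetical stand-ins, not the asker's actual function or data:

```python
import pandas as pd

def foo(group_name, subdf):
    # Hypothetical placeholder for the asker's foo: summarizes one group.
    return f"{group_name}: {len(subdf)} rows, b sum = {subdf['b'].sum()}"

# Hypothetical toy frame indexed by group label, so level=0 defines the groups.
temp_dataframe = pd.DataFrame({'b': range(6)}, index=list('aabccc'))

# apply passes only the sub-DataFrame; its .name attribute holds the group key.
result = temp_dataframe.groupby(level=0).apply(lambda x: foo(x.name, x))
print(result)
```

Because foo returns one scalar per group, result is a Series indexed by the group keys ('a', 'b', 'c').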

Nice one, but what about transform? (2 upvotes)

Answer 2

rap*_*ure 5

For those who came here looking for an answer to "Include group name in transform function pandas python" and ended up in this thread, read on.

Given the following input:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

Data:

    col1    col2    col3
0   a       0       0
1   a       1       1
2   b       2       2
3   c       3       3
4   c       4       4
5   c       5       5

We can access the group name (it is visible within the scope of the function passed to apply) like this:

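# group_keys=False keeps the flat original index (pandas >= 2.0 would otherwise prepend the group key)
df.groupby('col1', group_keys=False).apply(
    lambda frame: frame.transform(
        lambda col: col + 3 if frame.name == 'a' and col.name == 'col2' else col))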

Output:

    col1    col2    col3
0   a       3       0
1   a       4       1
2   b       2       2
3   c       3       3
4   c       4       4
5   c       5       5

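Note that the apply call is needed to get a reference to the sub-DataFrame (a pandas.core.frame.DataFrame, i.e. frame), which carries the name attribute of the corresponding group. The name attribute of transform's argument (i.e. col) refers to the column/Series name.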
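
For the single-column case there may be a simpler route that skips the outer apply. This variant is not part of the answer above, and it relies on behavior I believe holds in pandas 1.x and 2.x (worth verifying on your version): SeriesGroupBy.transform hands each group to the function as a Series whose name attribute is set to the group key. A minimal sketch on the same input:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

# Select col2 first; inside the lambda, s is one group's slice of col2 and
# s.name is the group key ('a', 'b' or 'c'), not the column name.
df['col2'] = df.groupby('col1')['col2'].transform(
    lambda s: s + 3 if s.name == 'a' else s
)
print(df)
```

The printed frame matches the output table above. When the rule depends on both the group and several columns, the apply plus transform construct from the answer is still needed.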

Alternatively, you can iterate over the groups and then, within each group, iterate over the columns:

for grp_name, sub_df in df.groupby('col1'):
    for col in sub_df:
        if grp_name == 'a' and col == 'col2':
            df.loc[df.col1 == grp_name, col] = sub_df[col] + 3

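My use case was quite unusual, and this was the only way I found to achieve my goal (as of pandas v0.24.2). However, I recommend exploring the pandas documentation thoroughly, since there is very likely a simpler vectorized solution for whatever you need this construct for.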
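
For the toy rule used in this answer (add 3 to col2 only in group 'a'), the group-and-column condition reduces to a row mask, so a vectorized sketch that never needs the group name could look like this; a real use case may not simplify so neatly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

# The boolean mask picks the rows of group 'a'; .loc updates only col2 there.
df.loc[df['col1'] == 'a', 'col2'] += 3
print(df)
```

This yields the same table as the apply/transform version, without any per-group Python calls.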

Very disappointing that the same attribute is not available in transform. (2 upvotes)

Archived:	10 years, 1 month ago
Views:	4131
Last activity:	6 years, 6 months ago