use*_*500 4 mean pandas pandas-groupby
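I have two DataFrames. They are identical except for one column. I want to change that column in the second DataFrame based on mean values computed from the first DataFrame. To compute those means I have to use groupby, but then I don't know how to reverse the grouping. Below is a minimal example; in this particular example df_two should end up identical to df_one. My question is how to get from tmp to df2_new (see the code below).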
import pandas as pd


def foo(df1, df2):
    # Group by A
    groupsA_one = dict(list(df1.groupby('A', as_index=False)))
    groupsA_two = dict(list(df2.groupby('A', as_index=False)))

    for key_A in groupsA_one:
        # Group by B
        groupsB_one = dict(list(groupsA_one[key_A].groupby('B', as_index=False)))
        groupsB_two = dict(list(groupsA_two[key_A].groupby('B', as_index=False)))

        for key_B in groupsB_one:
            # Group by C
            tmp = groupsB_two[key_B].groupby('C', as_index=False)['D'].mean()  # Returns DataFrame with NaN
            tmp['D'] = groupsB_one[key_B].groupby('C', as_index=False)['D'].mean()['D']
            print(tmp)

    df2_new = []  # ???
    return df2_new


if __name__ == '__main__':
    A1 = {'A': [1, 1, 1, 1, 2, 2, 2, 2], 'B': [1, 1, 2, 2, 1, 1, 2, 2],
          'C': [1, 2, 1, 2, 1, 2, 1, 2], 'D': [5, 5, 5, 5, 5, 5, 5, 5]}
    A2 = {'A': [1, 1, 1, 1, 2, 2, 2, 2], 'B': [1, 1, 2, 2, 1, 1, 2, 2],
          'C': [1, 2, 1, 2, 1, 2, 1, 2], 'D': [0, 0, 0, 0, 0, 0, 0, 0]}
    df_one = pd.DataFrame(A1)
    df_two = pd.DataFrame(A2)
    foo(df_one, df_two)
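One possible way to get from per-group means back to a full-length frame is to skip the nested loops and merge the means onto df2. This is only a sketch, assuming the goal is to replace D in df2 with the mean of D from df1 for each (A, B, C) combination. The helper name replace_with_group_means and its parameters are made up for illustration and do not come from the original post.

```python
import pandas as pd


def replace_with_group_means(df1, df2, keys=('A', 'B', 'C'), col='D'):
    """Hypothetical helper: overwrite `col` in df2 with per-group means from df1."""
    keys = list(keys)
    # One row per key combination, holding the mean of `col` taken from df1.
    means = df1.groupby(keys, as_index=False)[col].mean()
    # A left merge keeps every row of df2, in its original order.
    merged = df2.drop(columns=col).merge(means, on=keys, how='left')
    # Restore the original column order of df2.
    return merged[df2.columns]


A1 = {'A': [1, 1, 1, 1, 2, 2, 2, 2], 'B': [1, 1, 2, 2, 1, 1, 2, 2],
      'C': [1, 2, 1, 2, 1, 2, 1, 2], 'D': [5, 5, 5, 5, 5, 5, 5, 5]}
A2 = {'A': [1, 1, 1, 1, 2, 2, 2, 2], 'B': [1, 1, 2, 2, 1, 1, 2, 2],
      'C': [1, 2, 1, 2, 1, 2, 1, 2], 'D': [0, 0, 0, 0, 0, 0, 0, 0]}
df_one = pd.DataFrame(A1)
df_two = pd.DataFrame(A2)

df2_new = replace_with_group_means(df_one, df_two)
print(df2_new)
```

Two caveats with this sketch: D comes back as float because means are floats, and any key combination present in df2 but missing from df1 ends up as NaN.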
I think this might be simpler for some cases:
groupby = dfm.groupby('variable')
for ix, row in reversed(tuple(groupby)):
    ...
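To make the snippet above concrete, here is a small runnable version. The sample data is invented for illustration; only the names dfm and 'variable' are taken from the snippet. Note that this reverses the order in which groups are visited. It does not undo the grouping itself.

```python
import pandas as pd

# Made-up sample data; `dfm` and the 'variable' column mirror the names above.
dfm = pd.DataFrame({'variable': ['a', 'b', 'a', 'c', 'b'],
                    'value': [1, 2, 3, 4, 5]})

grouped = dfm.groupby('variable')

# tuple(grouped) materialises the (key, sub-DataFrame) pairs in sorted key
# order, so reversed(...) walks the groups from the last key to the first.
keys_in_reverse = []
for key, group in reversed(tuple(grouped)):
    keys_in_reverse.append(key)
    print(key, group['value'].tolist())
```

With the default sort=True, the groups are visited as c, b, a. Materialising the tuple holds all sub-frames in memory at once, which is probably fine for small frames but may matter for large ones.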