Igg*_*ass 5 python group-by mean dataframe pandas
I want to get the mean of each of the first two columns of the following dataframe, people_preferences, grouped by its Segment column.
Fun|Not-Fun Pro-garden|Pro-home Segment
0 NaN NaN cats
1 NaN NaN cats
2 -1.0 NaN cats
... ... ... ...
4570 -1.0 -1.0 dogs
4571 -1.0 1.0 dogs
4572 -1.0 1.0 dogs
So I tried people_preferences.groupby('Segment', as_index=False).mean(skipna=True), but it raised: UnsupportedFunctionCall: numpy operations are not valid with groupby. Use .groupby(...).mean() instead
Here is the full error message:
---------------------------------------------------------------------------
UnsupportedFunctionCall Traceback (most recent call last)
<ipython-input-489-f8da6e73c33c> in <module>
48 pairs = list(itertools.combinations(df_features.columns, 2))
49
---> 50 [plot_mean(pair[0],pair[1]) for pair in pairs]
51
52 fig = px.scatter(df_features, x=columns_x, y=columns_y)
<ipython-input-489-f8da6e73c33c> in <listcomp>(.0)
48 pairs = list(itertools.combinations(df_features.columns, 2))
49
---> 50 [plot_mean(pair[0],pair[1]) for pair in pairs]
51
52 fig = px.scatter(df_features, x=columns_x, y=columns_y)
<ipython-input-489-f8da6e73c33c> in plot_mean(column_x, column_y)
23 people_preferences = df_features[[column_x,column_y,'Segment']]
24 print(people_preferences)
---> 25 print(people_preferences.groupby('Segment', as_index=False).mean( skipna = True))
26 # parties.append('PEOPLE')
27 dataframe = pd.DataFrame(dict(x=parties_x, y=parties_y, parties = parties))
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in mean(self, *args, **kwargs)
1200 Name: B, dtype: float64
1201 """
-> 1202 nv.validate_groupby_func("mean", args, kwargs, ["numeric_only"])
1203 try:
1204 return self._cython_agg_general(
C:\ProgramData\Anaconda3\lib\site-packages\pandas\compat\numpy\function.py in validate_groupby_func(name, args, kwargs, allowed)
375 "numpy operations are not valid "
376 "with groupby. Use .groupby(...)."
--> 377 "{func}() instead".format(func=name)
378 )
379 )
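This is a bug in pandas. See the pandas issue reporting that passing skipna=True or skipna=False to mean() on a groupby gives an error. As a workaround, pass the mean through agg():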
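# define a helper function that passes skipna explicitly
def custom_mean(series):
    return series.mean(skipna=True)

grouped = people_preferences.groupby('Segment', as_index=False)
# instead of
# grouped.mean(skipna=True)  # raises UnsupportedFunctionCall in the traceback above
# use
grouped.agg(custom_mean)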
(Note: skipna=True is probably the default for mean() in pandas, although for some reason the documentation shows the default as skipna=None.)
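To illustrate the note above, here is a minimal sketch on made-up toy data (the column name score and its values are hypothetical, not the asker's dataframe). It assumes a reasonably recent pandas, and shows that a plain groupby mean already ignores NaN, while routing the call through agg() is one way to control skipna explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one NaN in the "cats" group
df = pd.DataFrame({
    "Segment": ["cats", "cats", "dogs", "dogs"],
    "score": [np.nan, 2.0, 1.0, 3.0],
})

grouped = df.groupby("Segment")["score"]

# Plain groupby mean: NaN values are skipped within each group
default_mean = grouped.mean()

# Going through agg() lets Series.mean receive skipna directly;
# with skipna=False, any group containing NaN yields NaN
strict_mean = grouped.agg(lambda s: s.mean(skipna=False))

print(default_mean)
print(strict_mean)
```

So the agg() workaround is mainly needed when NaN should propagate (skipna=False); for the usual case of ignoring NaN, calling mean() with no arguments should be enough.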
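I'm not sure I understood the question correctly, but to solve the problem (rather than the error specifically), this should work without trouble: df.groupby(['Segment'])[['Fun|Not-Fun', 'Pro-garden|Pro-home']].mean(), because mean() skips NaN values by default (skipna=True). Note the double brackets: selecting several columns from a groupby takes a list. Here is an example: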
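import numpy as np
import pandas as pd

# each column has one NaN, in a different group
a = {'a': [1, 1, 1, 2, 2, 2], 'data': [np.nan, 10, 20, 20, 30, 10], 'data_2': [10, 20, 30, np.nan, 10, 20]}
df = pd.DataFrame(a)
# select multiple columns with a list (double brackets)
print(df.groupby('a', as_index=False)[['data', 'data_2']].mean())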
Output:
a data data_2
0 1 15.0 20.0
1 2 20.0 15.0