fly*_*y36 7 python numpy pandas
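My DataFrame df looks like the one below. I want to compute the mean of the last 3 non-NaN columns in each row. If a row has fewer than three non-missing columns, the mean should be missing.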
name day1 day2 day3 day4 day5 day6 day7
A 1 1 nan 2 3 0 3
B nan nan nan nan nan nan 3
C 1 1 0 1 1 1 1
D 1 1 0 1 nan 1 4
The expected output should look like this:
name day1 day2 day3 day4 day5 day6 day7 expected
A 1 1 nan 2 3 0 3 2 <- 1/3*(day5 + day6 + day7)
B nan nan nan nan nan nan 3 nan <- less than 3 non-missing
C 1 1 0 1 1 1 1 1 <- 1/3*(day5 + day6 + day7)
D 1 1 0 1 nan 1 4 2 <- 1/3 *(day4 + day6 + day7)
I know how to compute the mean of the last three columns and how to count the non-missing observations in them.
df.iloc[:, -3:].mean(axis=1)   # mean of the last three columns
df.iloc[:, -3:].count(axis=1)  # number of non-NaN values in the last three columns
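I also know how to set the mean to missing when a row has fewer than 3 non-missing observations, using df.iloc[:, 1:].count(axis=1) < 3.
What I am struggling with is how to compute the mean of the last three non-missing columns. Could anyone show me how to solve this?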
A vectorized approach using justify:
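import numpy as np

N = 3  # number of trailing non-NaN entries to average
# Right-justify each row so NaNs move left; a row with fewer than N valid values keeps a NaN in its last N slots, so its mean is NaN
avg = np.mean(justify(df.values, invalid_val=np.nan, axis=1, side='right')[:, -N:], axis=1)
df['expected'] = avg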
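The answer calls justify without defining it. Below is a minimal sketch of such a helper, assuming it pushes the valid entries of a 2D array to one side and fills the remaining slots with invalid_val. The function body, the sample frame, and the choice to keep name as the index (so that df.values is purely numeric) are assumptions made for illustration, not necessarily the answerer's exact code.

```python
import numpy as np
import pandas as pd


def justify(a, invalid_val=np.nan, axis=1, side='right'):
    """Hypothetical helper: push valid entries of a 2D array to one side."""
    a = np.asarray(a, dtype=float)
    # True where a usable value is present
    if np.isnan(invalid_val):
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # Sorting booleans puts False first, so valid slots end up right/bottom
    justified_mask = np.sort(mask, axis=axis)
    if side in ('left', 'up'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val, dtype=float)
    if axis == 1:
        # Row-major boolean indexing keeps each row's values in order
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out


# Sample data from the question; `name` is assumed to be the index
nan = np.nan
df = pd.DataFrame(
    {
        'day1': [1, nan, 1, 1],
        'day2': [1, nan, 1, 1],
        'day3': [nan, nan, 0, 0],
        'day4': [2, nan, 1, 1],
        'day5': [3, nan, 1, nan],
        'day6': [0, nan, 1, 1],
        'day7': [3, 3, 1, 4],
    },
    index=pd.Index(['A', 'B', 'C', 'D'], name='name'),
)

N = 3  # number of trailing non-NaN entries to average
right_aligned = justify(df.values, invalid_val=np.nan, axis=1, side='right')
# Rows with fewer than N valid values still hold a NaN in the last N slots
avg = np.mean(right_aligned[:, -N:], axis=1)
df['expected'] = avg
# expected: A -> 2.0, B -> NaN, C -> 1.0, D -> 2.0
```

If name is a regular column rather than the index, the day columns would have to be selected first (for example df.iloc[:, 1:].values) so that the array passed to justify contains only numbers.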