Amr*_*hna 4 python sum multiple-columns dataframe pandas
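I have a DataFrame as shown below. The last column, `sum`, holds the sum of the values in columns `A`, `B`, `D`, `K` and `T`. Note that some columns also contain NaN.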
word1,A,B,D,K,T,sum
na,,63.0,,,870.0,933.0
sva,,1.0,,3.0,695.0,699.0
a,,102.0,,1.0,493.0,596.0
sa,2.0,487.0,,2.0,15.0,506.0
su,1.0,44.0,,136.0,214.0,395.0
waw,1.0,9.0,,34.0,296.0,340.0
How can I calculate the entropy of each row? That is, I want to compute something like:
df['A']/df['sum']*log(df['A']/df['sum']) + df['B']/df['sum']*log(df['B']/df['sum']) + ...... + df['T']/df['sum']*log(df['T']/df['sum'])
The condition is that whenever the value inside `log` is zero or NaN, the whole term should be treated as zero (otherwise the logarithm returns an error, since log 0 is undefined).
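The convention above can be sketched in NumPy, assuming the proportions sit in an array. The names `p` and `plogp` and the sample values are illustrative and not from the post. `np.log(0)` evaluates to `-inf` and `0 * -inf` to NaN, so both problem cases end up as NaN and can be replaced by zero in one step:

```python
import numpy as np

# Hypothetical proportions: one ordinary value, one zero, one missing.
p = np.array([0.5, 0.0, np.nan])

# Silence the warnings NumPy emits for log(0) and for 0 * -inf.
with np.errstate(divide="ignore", invalid="ignore"):
    plogp = p * np.log(p)

# The zero and the NaN input are both NaN at this point; treat them as zero.
plogp = np.nan_to_num(plogp, nan=0.0)
print(plogp)
```

The `nan=` keyword of `np.nan_to_num` needs NumPy 1.17 or later.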
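I know how to use a lambda to apply an operation to individual columns. What I can't come up with is a pure pandas solution in which the fixed column `sum` is applied across the different columns `A`, `B`, `D` and so on. I can think of a simple loop that goes through all the CSV files with hard-coded column names, though.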
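I think you can use `ix` to select the columns from `A` to `T`, divide them by the `sum` column with `div`, and multiply the result by its `numpy.log`. Finally, add up each row with `sum`. (`ix` has since been removed from pandas; `loc` accepts the same label slice.)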
print (df['A']/df['sum']*np.log(df['A']/df['sum']))
0 NaN
1 NaN
2 NaN
3 -0.021871
4 -0.015136
5 -0.017144
dtype: float64
print (df.ix[:,'A':'T'].div(df['sum'],axis=0)*np.log(df.ix[:,'A':'T'].div(df['sum'],axis=0)))
          A         B   D         K         T
0       NaN -0.181996 NaN       NaN -0.065191
1       NaN -0.009370 NaN -0.023395 -0.005706
2       NaN -0.302110 NaN -0.010722 -0.156942
3 -0.021871 -0.036835 NaN -0.021871 -0.104303
4 -0.015136 -0.244472 NaN -0.367107 -0.332057
5 -0.017144 -0.096134 NaN -0.230259 -0.120651
print((df.ix[:,'A':'T'].div(df['sum'],axis=0)*np.log(df.ix[:,'A':'T'].div(df['sum'],axis=0)))
.sum(axis=1))
0 -0.247187
1 -0.038471
2 -0.469774
3 -0.184881
4 -0.958774
5 -0.464188
dtype: float64
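For readers on a current pandas release, the same steps can be sketched end to end with `loc` in place of `ix`. This is a minimal sketch, not the answerer's original code: the variable names `csv_text`, `p`, `terms` and the column name `plogp_sum` are assumptions made here, and the data is the sample from the question. It relies on `DataFrame.sum` skipping NaN by default, which is what makes the NaN terms (and any zero terms, since `0 * log(0)` is NaN) count as zero.

```python
import io

import numpy as np
import pandas as pd

# Sample data copied from the question.
csv_text = """word1,A,B,D,K,T,sum
na,,63.0,,,870.0,933.0
sva,,1.0,,3.0,695.0,699.0
a,,102.0,,1.0,493.0,596.0
sa,2.0,487.0,,2.0,15.0,506.0
su,1.0,44.0,,136.0,214.0,395.0
waw,1.0,9.0,,34.0,296.0,340.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Proportion of each column within its row: x / sum.
p = df.loc[:, "A":"T"].div(df["sum"], axis=0)

# p * log(p); zeros and missing values both come out as NaN here.
with np.errstate(divide="ignore", invalid="ignore"):
    terms = p * np.log(p)

# sum(axis=1) skips NaN by default, so those terms contribute zero.
df["plogp_sum"] = terms.sum(axis=1)
print(df[["word1", "plogp_sum"]])
```

Because the columns are picked with a label slice, nothing is hard-coded beyond the first and last column names. The result is the sum exactly as written in the question; Shannon entropy is conventionally the negative of that sum, so negate `plogp_sum` if that is the quantity you need.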