Car*_*arl 2 python dataframe pandas
import numpy as np
import pandas as pd
columns = ['id', 'A', 'B', 'C']
index = np.arange(3)
df = pd.DataFrame(np.random.randn(3,4), columns=columns, index=index)
weights = {'A': 0.10, 'B': 1.00, 'C': 1.50}
I need to multiply the value in each "cell" by the corresponding weight (excluding the first column). For example:
df.at[0,'A'] * weights['A']
df.at[0,'B'] * weights['B']
What is the most efficient way to do this and get the result in a new DataFrame?
Setup
df
Out[1013]:
id A B C
0 -0.641314 -0.526509 0.225116 -1.131141
1 0.018321 -0.944734 -0.123334 -0.853356
2 0.703119 0.468857 1.038572 -1.529723
weights
Out[1026]: {'A': 0.1, 'B': 1.0, 'C': 1.5}
W = np.asarray([weights[e] for e in sorted(weights.keys())])
Solution
# broadcast the weights across the rows: each column is multiplied element-wise by its weight
df.loc[:,['A','B','C']] *= W
df
Out[1016]:
id A B C
0 -0.641314 -0.052651 0.225116 -1.696712
1 0.018321 -0.094473 -0.123334 -1.280034
2 0.703119 0.046886 1.038572 -2.294584
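The `*=` above overwrites `df` in place, while the question asks for the result in a new DataFrame. A minimal sketch of a non-mutating variant follows, assuming the same `df` and `weights` as in the setup; the names `w` and `weighted` are only illustrative. It wraps the weights in a `pd.Series` so that `DataFrame.mul` aligns them by column label rather than by position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 4), columns=['id', 'A', 'B', 'C'])
weights = {'A': 0.10, 'B': 1.00, 'C': 1.50}

# Series indexed by column name, so alignment is by label, not position
w = pd.Series(weights)

# work on a copy so the original frame is left untouched
weighted = df.copy()
weighted[w.index] = df[w.index].mul(w, axis=1)
```

Because the alignment is by label, no sorting of the dictionary keys is needed in this variant, and the `id` column passes through unchanged.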
Update
If you need to keep the column names flexible, I think a better approach is to keep the column names and the weights in two lists:
columns = sorted(weights.keys())
Out[1072]: ['A', 'B', 'C']
weights = [weights[e] for e in columns]
Out[1074]: [0.1, 1.0, 1.5]
Then you can do:
df.loc[:, columns] *= weights
Out[1067]:
id A B C
0 -0.641314 -0.052651 0.225116 -1.696712
1 0.018321 -0.094473 -0.123334 -1.280034
2 0.703119 0.046886 1.038572 -2.294584
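When the weights are passed as a plain list they are applied by position, so the list of column names and the list of weights have to be in the same order. The sketch below shows one way to keep them in step, assuming the same `df` and `weights` dictionary as in the setup: both lists are derived from one iteration over the dictionary, and the dictionary itself is not rebound. The names `cols`, `factors`, and `before` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 4), columns=['id', 'A', 'B', 'C'])
weights = {'C': 1.50, 'A': 0.10, 'B': 1.00}   # deliberately not in column order

cols = list(weights)                   # column names, in dict order
factors = [weights[c] for c in cols]   # weights, in exactly the same order

before = df.copy()                     # kept only to compare afterwards
df.loc[:, cols] *= factors             # i-th factor multiplies i-th selected column
```

Keeping `weights` as the dictionary also means the lookup can be repeated later, which is not possible once the name has been reassigned to a list.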
A one-liner solution:
df.loc[:, sorted(weights.keys())] *= [weights[e] for e in sorted(weights.keys())]
df
Out[1089]:
id A B C
0 -0.641314 -0.052651 0.225116 -1.696712
1 0.018321 -0.094473 -0.123334 -1.280034
2 0.703119 0.046886 1.038572 -2.294584