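Currently my table has more than 10,000,000 records and a column named ID. If the ID is in a given list, I want to update the column named '3rd_col' with a new value.
I am using .loc; here is my code: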
for _id in given_ids:
    df.loc[df.ID == _id, '3rd_col'] = new_value
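But the code above is slow. How can I improve the performance of updating the values?
Sorry, let me be more specific about my problem: different IDs are assigned different values by a function, and there are about 4 columns to assign: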
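For the single-value loop above, a minimal sketch (with hypothetical sample data, and assuming `new_value` is one scalar) replaces the per-ID loop with a single boolean mask from `Series.isin`, so the `ID` column is scanned once rather than once per ID:

```python
import pandas as pd

# Hypothetical sample data; the real frame has more than 10,000,000 rows.
df = pd.DataFrame({'ID': list('abdefc'), '3rd_col': [0, 0, 0, 0, 0, 0]})
given_ids = ['a', 'b', 'c']
new_value = 99  # assumed to be a single scalar

# One vectorized membership test instead of one full-column comparison per ID.
mask = df['ID'].isin(given_ids)
df.loc[mask, '3rd_col'] = new_value
print(df)
```

Rows whose ID is not in `given_ids` keep their existing value, which matches the behavior of the original loop.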
for _id in given_ids:
    df.loc[df.ID == _id, '3rd_col'] = return_new_val_1(_id)
    df.loc[df.ID == _id, '4rd_col'] = return_new_val_2(_id)
    df.loc[df.ID == _id, '5rd_col'] = return_new_val_3(_id)
    df.loc[df.ID == _id, '6rd_col'] = return_new_val_4(_id)
You can create a dictionary first and then use replace:
import pandas as pd

# sample function
def return_new_val(x):
    return x * 3

given_ids = list('abc')
d = {_id: return_new_val(_id) for _id in given_ids}
print (d)
{'a': 'aaa', 'b': 'bbb', 'c': 'ccc'}
df = pd.DataFrame({'ID':list('abdefc'),
                   'M':[4,5,4,5,5,4]})
df['3rd_col'] = df['ID'].replace(d)
print (df)
  ID  M 3rd_col
0  a  4     aaa
1  b  5     bbb
2  d  4       d
3  e  5       e
4  f  5       f
5  c  4     ccc
Or use map, which returns NaN for IDs that have no match:
df['3rd_col'] = df['ID'].map(d)
print (df)
  ID  M 3rd_col
0  a  4     aaa
1  b  5     bbb
2  d  4     NaN
3  e  5     NaN
4  f  5     NaN
5  c  4     ccc
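If you want the replace result but prefer map, a minimal sketch combines the two: map looks each ID up in the dictionary, and fillna puts the original ID back wherever the lookup returned NaN. This is often suggested as faster than replace on large frames, but that is worth timing on your own data; note that it would also overwrite a mapped value that is itself NaN.

```python
import pandas as pd

def return_new_val(x):
    return x * 3

given_ids = list('abc')
d = {_id: return_new_val(_id) for _id in given_ids}

df = pd.DataFrame({'ID': list('abdefc'), 'M': [4, 5, 4, 5, 5, 4]})
# map() gives NaN for IDs missing from d; fillna() aligns on the index and
# restores the original ID in those rows, matching the replace(d) output.
df['3rd_col'] = df['ID'].map(d).fillna(df['ID'])
print(df)
```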
Edit:
If you need to add data from several functions, first create a new DataFrame and then join it to the original:
def return_new_val1(x):
    return x * 2

def return_new_val2(x):
    return x * 3

given_ids = list('abc')
df2 = pd.DataFrame({'ID':given_ids})
df2['3rd_col'] = df2['ID'].map(return_new_val1)
df2['4rd_col'] = df2['ID'].map(return_new_val2)
df2 = df2.set_index('ID')
print (df2)
   3rd_col 4rd_col
ID
a       aa     aaa
b       bb     bbb
c       cc     ccc
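With about four target columns, the helper frame `df2` above can also be built in one step from a mapping of column name to function. In this sketch `funcs` is a hypothetical name; each function is called once per ID in `given_ids`, never once per row of the large frame:

```python
import pandas as pd

def return_new_val1(x):
    return x * 2

def return_new_val2(x):
    return x * 3

given_ids = list('abc')
# Hypothetical mapping of target column -> function; extend it for all 4 columns.
funcs = {'3rd_col': return_new_val1, '4rd_col': return_new_val2}

# Build the lookup table indexed by ID, ready for df.join(df2, on='ID').
df2 = pd.DataFrame(
    {col: [f(_id) for _id in given_ids] for col, f in funcs.items()},
    index=pd.Index(given_ids, name='ID'),
)
print(df2)
```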
df = pd.DataFrame({'ID':list('abdefc'),
                   'M':[4,5,4,5,5,4]})
df = df.join(df2, on='ID')
print (df)
  ID  M 3rd_col 4rd_col
0  a  4      aa     aaa
1  b  5      bb     bbb
2  d  4     NaN     NaN
3  e  5     NaN     NaN
4  f  5     NaN     NaN
5  c  4      cc     ccc
# or, to replace the NaNs with the values in `ID`:
cols = ['3rd_col','4rd_col']
df[cols] = df[cols].mask(df[cols].isnull(), df['ID'], axis=0)
print (df)
  ID  M 3rd_col 4rd_col
0  a  4      aa     aaa
1  b  5      bb     bbb
2  d  4       d       d
3  e  5       e       e
4  f  5       f       f
5  c  4      cc     ccc
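The join approach overwrites the target columns in every row. If those columns already exist and the rows whose ID is not in the list must keep their current values (as in the original loop), a minimal sketch under that assumption builds one `isin` mask and assigns mapped values only to the matching rows. The `funcs` mapping and the 'old' placeholder values are hypothetical:

```python
import pandas as pd

def return_new_val1(x):
    return x * 2

def return_new_val2(x):
    return x * 3

given_ids = list('abc')
funcs = {'3rd_col': return_new_val1, '4rd_col': return_new_val2}

# Hypothetical frame whose target columns already hold values to keep.
df = pd.DataFrame({'ID': list('abdefc'),
                   '3rd_col': ['old'] * 6,
                   '4rd_col': ['old'] * 6})

mask = df['ID'].isin(given_ids)
for col, f in funcs.items():
    # One function call per distinct ID, then a vectorized lookup on matching rows.
    lookup = {_id: f(_id) for _id in given_ids}
    df.loc[mask, col] = df.loc[mask, 'ID'].map(lookup)
print(df)
```

The Python loop here runs once per column (about 4 iterations), not once per ID, so its cost no longer grows with the length of `given_ids` times the frame size.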