las*_*lex 3 python numpy dataframe pandas
I have a fairly simple question based on this sample code:
x1 = 10*np.random.randn(10,3)
df1 = pd.DataFrame(x1)
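I am looking for a DataFrame derived from df1 in which positive values are replaced with "up", negative values with "down", and 0 values (if any) with "zero". I have tried the .where() and .mask() methods but could not get the desired result.
I have seen other posts that filter on several conditions at once, but they do not show how to replace values according to different conditions.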
Har*_*ain 12
For multiple conditions, e.g. (df['employrate'] <=55) & (df['employrate'] > 50)
use this:
df['employrate'] = np.where(
(df['employrate'] <=55) & (df['employrate'] > 50) , 11, df['employrate']
)
Or you can do it this way:
df.loc[(df['employrate'] <= 55) & (df['employrate'] > 50), 'employrate'] = 11
The informal syntax here is:
<dataset>.loc[<filter1> & (<filter2>),'<variable>']='<value>'
Out[108]:
       country  employrate alcconsumption
0  Afghanistan   55.700001            .03
1      Albania   11.000000           7.29
2      Algeria   11.000000            .69
3      Andorra         nan          10.17
4       Angola   75.699997           5.57
So the syntax we used here is:
df['<column_name>'] = np.where((<filter 1>) & (<filter 2>), <new value>, df['<column_name>'])
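As a minimal sketch of the two forms above, the snippet below applies the same two-part filter once with np.where and once with .loc. The frame is a hypothetical stand-in: the employrate values are rounded versions of the ones shown in the output, and the names in_range, via_where and via_loc are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data (rounded values, assumed for illustration).
df = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "employrate": [55.7, 51.4, 50.5, np.nan, 75.7],
})

# Both filters combined with &; each comparison needs its own parentheses.
in_range = (df["employrate"] <= 55) & (df["employrate"] > 50)

# Form 1: np.where keeps the old value wherever the condition is False.
via_where = df.copy()
via_where["employrate"] = np.where(in_range, 11, via_where["employrate"])

# Form 2: .loc assigns only to the rows selected by the boolean mask.
via_loc = df.copy()
via_loc.loc[in_range, "employrate"] = 11

print(via_where)
print(via_where.equals(via_loc))
```

Comparisons against NaN evaluate to False, so the missing Andorra value is left alone by both forms.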
For a single condition, e.g. (df['employrate'] > 70), given this data:
       country        employrate alcconsumption
0  Afghanistan  55.7000007629394            .03
1      Albania  51.4000015258789           7.29
2      Algeria              50.5            .69
3      Andorra                            10.17
4       Angola  75.6999969482422           5.57
use this:
df.loc[df['employrate'] > 70, 'employrate'] = 7
       country  employrate alcconsumption
0  Afghanistan   55.700001            .03
1      Albania   51.400002           7.29
2      Algeria   50.500000            .69
3      Andorra         nan          10.17
4       Angola    7.000000           5.57
So the syntax here is:
df.loc[<mask>, <optional column(s)>]   # the mask generates the labels to index
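To make the role of the mask and the optional column part concrete, here is a small sketch. The data is again a hypothetical stand-in with rounded values, and mask, rows and cells are illustrative names, not anything from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data (rounded values, assumed for illustration).
df = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "employrate": [55.7, 51.4, 50.5, np.nan, 75.7],
    "alcconsumption": [0.03, 7.29, 0.69, 10.17, 5.57],
})

# The mask is a boolean Series aligned with the row labels.
mask = df["employrate"] > 70

# Without the column part, .loc returns every column of the matching rows.
rows = df.loc[mask]

# With the column part, only that column of the matching rows is selected.
cells = df.loc[mask, "employrate"]

# Assigning through the same selection changes just those cells.
df.loc[mask, "employrate"] = 7

print(rows)
print(cells)
print(df)
```

Naming the column matters when assigning: the assignment above only touches employrate in the matching rows.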
df1.apply(np.sign).replace({-1: 'down', 1: 'up', 0: 'zero'})
Output:
      0     1     2
0  down    up    up
1    up  down  down
2    up  down  down
3  down  down    up
4  down  down    up
5  down    up    up
6  down    up  down
7    up  down  down
8    up    up  down
9  down    up    up
P.S. With randn, getting a value of exactly zero is of course very unlikely.
In general, you can use np.select on the values and rebuild the DataFrame:
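Since random data will almost never contain an exact zero, a small hand-written frame makes the np.sign approach easier to check. This is only a sketch: the numbers are made up, and signs and labels are illustrative names.

```python
import numpy as np
import pandas as pd

# Made-up values that include exact zeros, so all three labels appear.
df1 = pd.DataFrame([[-2.5, 0.0, 3.1],
                    [4.0, -1.2, 0.0]])

# np.sign maps each value to -1.0, 0.0 or 1.0.
signs = df1.apply(np.sign)

# replace() with a flat dict swaps each sign for its label.
labels = signs.replace({-1: "down", 1: "up", 0: "zero"})

print(signs)
print(labels)
```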
import pandas as pd
import numpy as np
df1 = pd.DataFrame(10*np.random.randn(10, 3))
df1.iloc[0, 0] = 0 # So we can check the == 0 condition
conds = [df1.values < 0 , df1.values > 0]
choices = ['down', 'up']
pd.DataFrame(np.select(conds, choices, default='zero'),
             index=df1.index,
             columns=df1.columns)
      0     1     2
0  zero  down    up
1    up  down    up
2    up    up    up
3  down  down  down
4    up    up    up
5    up    up    up
6    up    up  down
7    up    up  down
8  down    up  down
9    up    up  down
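One thing to watch with default='zero' is that anything matching neither condition gets that label, which includes NaN. The sketch below is a variation on the code above, not part of the original answer: it adds an explicit == 0 condition and uses a separate default for missing values. The helper name label_signs, its missing parameter and the sample data are all hypothetical.

```python
import numpy as np
import pandas as pd

def label_signs(df, missing="missing"):
    """Label each cell 'down', 'up' or 'zero'; other cells (NaN) get `missing`."""
    values = df.to_numpy()
    # np.select takes the first condition that is True for each cell.
    conds = [values < 0, values > 0, values == 0]
    choices = ["down", "up", "zero"]
    labels = np.select(conds, choices, default=missing)
    # Rebuild the frame so the original index and columns are kept.
    return pd.DataFrame(labels, index=df.index, columns=df.columns)

# Made-up data with an exact zero and a NaN.
df1 = pd.DataFrame([[0.0, -3.2, np.nan],
                    [7.5, 1.1, -0.4]],
                   index=["a", "b"], columns=["x", "y", "z"])

labelled = label_signs(df1)
print(labelled)
```

All three comparisons are False for NaN, so those cells fall through to the default instead of being reported as "zero".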