Mer*_*lin 66 python if-statement dataframe pandas
Starting from this:
import pandas as pd
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
Out[8]:
  one  two three
0  10  1.2   4.2
1  15   70  0.03
2   8    5     0
I would like to use something like an if statement in pandas.
if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']
Basically, check each row with the if statement and create a new column.
The docs say to use .all, but there is no example...
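For context, here is a minimal sketch of why the plain if fails and what .all() (and its sibling .any()) actually do. The numeric data is hypothetical, chosen only for illustration. Note that .all() answers a whole-column question ("is this true for every row?"), so it is not a row-by-row if.

```python
import pandas as pd

# Hypothetical numeric data, for illustration only.
df = pd.DataFrame({'one': [10.0, 15.0, 8.0],
                   'two': [1.2, 70.0, 5.0]})

# The comparison is element-wise: one boolean per row.
mask = df['one'] >= df['two']

# A plain `if` needs a single True/False, so a multi-row Series is rejected.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# .all() and .any() collapse the Series into one boolean for the whole column.
all_rows = bool(mask.all())   # True only if every row satisfies the condition
any_row = bool(mask.any())    # True if at least one row does

print(raised, all_rows, any_row)
```

So .all() is useful for checking a condition across the whole column, but building a new column row by row needs an element-wise tool instead.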
unu*_*tbu 88
You could use np.where. If cond is a boolean array, and A and B are arrays, then
C = np.where(cond, A, B)
defines C to be equal to A where cond is True, and B where cond is False.
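As a toy illustration of that definition (the array values are made up for the example):

```python
import numpy as np

cond = np.array([True, False, True])
A = np.array([1, 2, 3])
B = np.array([10, 20, 30])

# Element by element: take A where cond is True, B where cond is False.
C = np.where(cond, A, B)
print(C.tolist())  # [1, 20, 3]
```

Applied to the question's DataFrame: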
import numpy as np
import pandas as pd
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three']),
                     df['one'], np.nan)
yields
  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN
If you have more than one condition, then you could use np.select instead. For example, if you wish df['que'] to equal df['two'] when df['one'] < df['two'], then
conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']),
    df['one'] < df['two']]
choices = [df['one'], df['two']]
df['que'] = np.select(conditions, choices, default=np.nan)
yields
  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN
If we can assume that df['one'] >= df['two'] whenever df['one'] < df['two'] is False, then the conditions and choices could be simplified to
conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]
choices = [df['two'], df['one']]
(The assumption may not be true if df['one'] or df['two'] contain NaNs.)
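A small sketch of both points, using hypothetical numeric data and two helper functions (full_rule and simple_rule are names invented here). np.select uses the first condition that matches each row, which is why the simplified list works on clean data; a NaN in df['two'] makes both comparisons against it False, and the two versions then disagree.

```python
import numpy as np
import pandas as pd

def full_rule(d):
    # Both conditions spelled out in full.
    return np.select(
        [(d['one'] >= d['two']) & (d['one'] <= d['three']),
         d['one'] < d['two']],
        [d['one'], d['two']],
        default=np.nan)

def simple_rule(d):
    # Simplified version: relies on the first matching condition winning.
    return np.select(
        [d['one'] < d['two'],
         d['one'] <= d['three']],
        [d['two'], d['one']],
        default=np.nan)

# Hypothetical numeric data without NaN: both rules agree.
clean = pd.DataFrame({'one': [10.0, 15.0, 8.0],
                      'two': [1.2, 70.0, 5.0],
                      'three': [14.2, 0.03, 0.0]})
print(full_rule(clean), simple_rule(clean))

# With NaN in 'two', 3 >= NaN and 3 < NaN are both False, so the rules differ.
with_nan = pd.DataFrame({'one': [3.0], 'two': [np.nan], 'three': [5.0]})
print(full_rule(with_nan), simple_rule(with_nan))
```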
Note that
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:
df2 = df.astype(float)
This changes the results, however, since strings are compared character by character, while floats are compared numerically.
In [61]: '10' <= '4.2'
Out[61]: True
In [62]: 10 <= 4.2
Out[62]: False
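To make the difference concrete, here is a sketch that evaluates the question's condition on the string data and again after astype(float). The variable names mask_str and mask_num are just for this example.

```python
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

# String comparison: '10' >= '1.2' and '10' <= '4.2' are both True,
# because the strings are compared character by character.
mask_str = (df['one'] >= df['two']) & (df['one'] <= df['three'])

# Numeric comparison after conversion: 10 <= 4.2 is False.
df2 = df.astype(float)
mask_num = (df2['one'] >= df2['two']) & (df2['one'] <= df2['three'])

print(mask_str.tolist())  # [True, False, False]
print(mask_num.tolist())  # [False, False, False]
```

With numeric values no row satisfies the condition, so the 'que' column from the np.where example would be NaN throughout.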
小智 52
You can use .equals on columns or on entire DataFrames.
df['col1'].equals(df['col2'])
If they are equal, that statement returns True, otherwise False.
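Note that .equals returns a single boolean for the whole column rather than one value per row, and it treats NaNs in the same position as equal. A short sketch with made-up columns:

```python
import numpy as np
import pandas as pd

# Hypothetical columns, for illustration only.
df = pd.DataFrame({'col1': [1.0, np.nan, 3.0],
                   'col2': [1.0, np.nan, 3.0],
                   'col3': [1.0, 2.0, 4.0]})

same = df['col1'].equals(df['col2'])       # one bool for the whole column
different = df['col1'].equals(df['col3'])
row_by_row = df['col1'] == df['col3']      # element-wise, one bool per row

print(same, different, row_by_row.tolist())
```

For a per-row condition like the one in the question, the element-wise == form is the relevant one.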
Bob*_*ner 24
You can use apply() and do something like this
df['que'] = df.apply(lambda x : x['one'] if x['one'] >= x['two'] and x['one'] <= x['three'] else "", axis=1)
or, if you prefer not to use a lambda
def que(x):
    # x is a single row, so plain `and` works on its scalar values
    if x['one'] >= x['two'] and x['one'] <= x['three']:
        return x['one']
    else:
        return ''
df['que'] = df.apply(que, axis=1)
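A quick sketch of the row-wise approach on the question's string data. Because apply(..., axis=1) hands the function one row at a time, each value is a scalar and plain and works. Every branch needs an explicit return, otherwise that row ends up as None. The vectorized mask is included only to check that both approaches pick the same rows.

```python
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

def que(row):
    # Each row value is a scalar here, so `and` is fine.
    if row['one'] >= row['two'] and row['one'] <= row['three']:
        return row['one']
    return ''

by_apply = df.apply(que, axis=1)

# Same condition, evaluated for all rows at once.
cond = (df['one'] >= df['two']) & (df['one'] <= df['three'])

print(by_apply.tolist())  # ['10', '', '']
print(cond.tolist())      # [True, False, False]
```

apply calls the Python function once per row, so on large frames the vectorized forms are usually faster.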
Wrap each individual condition in parentheses, then use the & operator to combine the conditions:
df.loc[(df['one'] >= df['two']) & (df['one'] <= df['three']), 'que'] = df['one']
You can fill the non-matching rows by using ~ (the "not" operator) to invert the match:
df.loc[~ ((df['one'] >= df['two']) & (df['one'] <= df['three'])), 'que'] = ''
You need to use & and ~ rather than and and not, because the & and ~ operators work element by element.
The final result:
df
Out[8]:
  one  two three que
0  10  1.2   4.2  10
1  15   70  0.03
2   8    5     0
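The same pattern can be written with the condition stored once in a variable, which avoids repeating it for the inverted case. This is a sketch on hypothetical numeric data with 0.0 as an assumed fill value. The parentheses matter because & binds more tightly than >= and <= in Python.

```python
import pandas as pd

# Hypothetical numeric data, for illustration only.
df = pd.DataFrame({'one': [10.0, 15.0, 8.0],
                   'two': [1.2, 70.0, 5.0],
                   'three': [14.2, 0.03, 0.0]})

# Build the element-wise condition once; each comparison is parenthesized.
mask = (df['one'] >= df['two']) & (df['one'] <= df['three'])

df.loc[mask, 'que'] = df['one']   # matching rows copy the value from 'one'
df.loc[~mask, 'que'] = 0.0        # all remaining rows get the fill value

print(df['que'].tolist())  # [10.0, 0.0, 0.0]
```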
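One way is to use a Boolean series to index the column df['one']. This gives you a new column where the True entries have the same value as the corresponding row of df['one'] and the False entries are NaN.
The Boolean series is just given by your if statement (although it is necessary to use & instead of and):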
>>> df['que'] = df['one'][(df['one'] >= df['two']) & (df['one'] <= df['three'])]
>>> df
  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN
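If you want the NaN values to be replaced by other values, you can use the fillna method on the new column que. I've used 0 instead of the empty string here: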
>>> df['que'] = df['que'].fillna(0)
>>> df
  one  two three que
0  10  1.2   4.2  10
1  15   70  0.03   0
2   8    5     0   0
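A sketch of why this works, on hypothetical numeric data: indexing with the Boolean series keeps only the matching rows, and assigning the shorter result back aligns on the index, so the missing rows become NaN. Series.where is shown as a possible one-step alternative; it is not part of the approach above.

```python
import pandas as pd

# Hypothetical numeric data, for illustration only.
df = pd.DataFrame({'one': [10.0, 15.0, 8.0],
                   'two': [1.2, 70.0, 5.0],
                   'three': [14.2, 0.03, 0.0]})

mask = (df['one'] >= df['two']) & (df['one'] <= df['three'])

selected = df['one'][mask]        # only the matching rows survive (index 0 here)
df['que'] = selected              # index alignment leaves NaN in the other rows
filled = df['que'].fillna(0)      # replace the NaN values afterwards

# Alternative: keep values where mask is True, use 0 elsewhere, in one call.
one_step = df['one'].where(mask, 0)

print(selected.index.tolist(), filled.tolist(), one_step.tolist())
```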
小智 7
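I wanted to add this answer for those who are comparing two columns for equality, have NaN values in them, and get False when both values are NaN. By definition, NaN != NaN (see: numpy.isnan(value) not the same as value == numpy.nan?).
If you want a comparison of two NaN values to return True, you can use: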
df['compare'] = (df["col_1"] == df["col_2"]) | (df["col_1"].isna() & df["col_2"].isna())
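A short sketch on made-up data showing the difference between the plain comparison and the NaN-aware one:

```python
import numpy as np
import pandas as pd

# Hypothetical columns covering the cases: equal, both NaN, different, one NaN.
df = pd.DataFrame({'col_1': [1.0, np.nan, 3.0, np.nan],
                   'col_2': [1.0, np.nan, 4.0, 2.0]})

# Plain equality: the row where both values are NaN compares as False.
plain = df['col_1'] == df['col_2']

# NaN-aware equality: also True where both sides are NaN.
df['compare'] = plain | (df['col_1'].isna() & df['col_2'].isna())

print(plain.tolist())          # [True, False, False, False]
print(df['compare'].tolist())  # [True, True, False, False]
```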