I'm just curious about the behavior of 'where' and why you would use it over 'loc'.
If I create a DataFrame:
import pandas as pd

df = pd.DataFrame({'ID':[1,2,3,4,5,6,7,8,9,10],
                   'Run Distance':[234,35,77,787,243,5435,775,123,355,123],
                   'Goals':[12,23,56,7,8,0,4,2,1,34],
                   'Gender':['m','m','m','f','f','m','f','m','f','m']})
And then apply the 'where' function:
df2 = df.where(df['Goals']>10)
I get the following, which keeps the rows where Goals > 10 but leaves everything else as NaN:
  Gender  Goals    ID  Run Distance
0      m   12.0   1.0         234.0
1      m   23.0   2.0          35.0
2      m   56.0   3.0          77.0
3    NaN    NaN   NaN           NaN
4    NaN    NaN   NaN           NaN
5    NaN    NaN   NaN           NaN
6    NaN    NaN   NaN           NaN
7    NaN    NaN   NaN           NaN
8    NaN    NaN   NaN           NaN
9      m   34.0  10.0         123.0
If, however, I use the 'loc' function:
df2 = df.loc[df['Goals']>10]
It returns a DataFrame containing only the matching subset, without the NaN values:
  Gender  Goals  ID  Run Distance
0      m     12   1           234
1      m     23   2            35
2      m     56   3            77
9      m     34  10           123
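So essentially I'm curious why you would use 'where' over 'loc/iloc', and why it returns NaN values.
Think of loc as a filter: it gives you only the parts of the df that satisfy the condition.
where originally comes from numpy. It runs over an array and checks whether each element satisfies a condition, so it gives you back the entire array, holding either the original value or NaN. A nice feature of where is that you can also get back something different, e.g. df2 = df.where(df['Goals']>10, other='0'), to replace the values that don't meet the condition with 0. (Note that other='0' inserts the string '0'; pass other=0 if you want the integer.)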
   ID  Run Distance  Goals Gender
0   1           234     12      m
1   2            35     23      m
2   3            77     56      m
3   0             0      0      0
4   0             0      0      0
5   0             0      0      0
6   0             0      0      0
7   0             0      0      0
8   0             0      0      0
9  10           123     34      m
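To make the shape difference concrete, here is a minimal sketch using the DataFrame from the question. The variable names (cond, masked, subset, np_result) are mine, chosen for illustration, and the numpy call is included only to show where the idea comes from.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})

cond = df['Goals'] > 10      # boolean Series, one entry per row

masked = df.where(cond)      # same shape as df; failing rows become NaN
subset = df.loc[cond]        # only the rows where cond is True

print(masked.shape)          # (10, 4)
print(subset.shape)          # (4, 4)

# numpy's three-argument where: take Goals where cond holds, else 0
np_result = np.where(cond, df['Goals'], 0)
print(np_result.tolist())    # [12, 23, 56, 0, 0, 0, 0, 0, 0, 34]
```

Keeping the original shape is what makes where useful when the result still has to line up row by row with other data.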
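Also, while where is only for conditional filtering, loc is the standard way of selecting in Pandas, along with iloc. loc uses row and column names, while iloc uses their index numbers. So with loc you could choose to return, say, df.loc[0:1, ['Gender', 'Goals']]: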
  Gender  Goals
0      m     12
1      m     23
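As a rough sketch of the loc/iloc difference, the same selection can be written both ways. Note that a loc slice includes its end label while an iloc slice excludes its end position. The names by_label, by_position and cols are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})

# Label-based: rows labelled 0 through 1 (end label included), columns by name
by_label = df.loc[0:1, ['Gender', 'Goals']]

# Position-based: look up the integer positions of the same columns first
cols = df.columns.get_indexer(['Gender', 'Goals'])
by_position = df.iloc[0:2, cols]   # rows at positions 0 and 1 (end excluded)

print(by_label.equals(by_position))  # True
```

The two only coincide here because the default row labels happen to equal the row positions.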
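If you check the docs for DataFrame.where, it replaces the rows where the condition is False, by default with NaN, but you can specify another value: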
df2 = df.where(df['Goals']>10)
print(df2)
     ID  Run Distance  Goals Gender
0   1.0         234.0   12.0      m
1   2.0          35.0   23.0      m
2   3.0          77.0   56.0      m
3   NaN           NaN    NaN    NaN
4   NaN           NaN    NaN    NaN
5   NaN           NaN    NaN    NaN
6   NaN           NaN    NaN    NaN
7   NaN           NaN    NaN    NaN
8   NaN           NaN    NaN    NaN
9  10.0         123.0   34.0      m

df2 = df.where(df['Goals']>10, 100)
print(df2)
    ID  Run Distance  Goals Gender
0    1           234     12      m
1    2            35     23      m
2    3            77     56      m
3  100           100    100    100
4  100           100    100    100
5  100           100    100    100
6  100           100    100    100
7  100           100    100    100
8  100           100    100    100
9   10           123     34      m
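The outputs above also hint at why the numbers print as 12.0 in the NaN case: NaN is a float, so integer columns are upcast. A small sketch of that, and of how dropping the NaN rows gets you back to the rows that loc would return (the names masked and recovered are mine):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})

cond = df['Goals'] > 10

masked = df.where(cond)
print(df['Goals'].dtype)       # int64
print(masked['Goals'].dtype)   # float64, since NaN cannot be stored in an int64 column

# Dropping the all-NaN rows leaves the same row labels that df.loc[cond] selects
recovered = masked.dropna()
print(list(recovered.index))   # [0, 1, 2, 9]
```

So where followed by dropna is a roundabout way to filter; if filtering is all you need, loc keeps the original integer dtypes.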
The other syntax is called boolean indexing and is for filtering rows: it keeps only the rows that match the condition and removes the rest.
df2 = df.loc[df['Goals']>10]
# alternative
df2 = df[df['Goals']>10]
print(df2)
   ID  Run Distance  Goals Gender
0   1           234     12      m
1   2            35     23      m
2   3            77     56      m
9  10           123     34      m
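Boolean masks can also be combined or inverted before indexing. A minimal sketch, assuming the same DataFrame; the second condition (Run Distance > 100) is made up for illustration and is not part of the question:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})

# Combine conditions with & (and) or | (or); each one needs its own parentheses
both = df[(df['Goals'] > 10) & (df['Run Distance'] > 100)]
print(both['ID'].tolist())      # [1, 10]

# Invert a condition with ~ to keep the rows that do NOT match
not_high = df[~(df['Goals'] > 10)]
print(not_high['ID'].tolist())  # [4, 5, 6, 7, 8, 9]
```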
With loc it is also possible to filter rows by condition and columns by name(s):
s = df.loc[df['Goals']>10, 'ID']
print(s)
0     1
1     2
2     3
9    10
Name: ID, dtype: int64

df2 = df.loc[df['Goals']>10, ['ID','Gender']]
print(df2)
   ID Gender
0   1      m
1   2      m
2   3      m
9  10      m
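The same row-and-column form of loc can be used for assignment, which is one place where loc and where overlap. The sketch below is my own illustration rather than something from the answers: it zeroes out Goals for the rows that fail the condition, once with loc and once with Series.where, and checks that both give the same frame.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})

cond = df['Goals'] > 10

# loc: pick the failing rows and the target column, then assign in place
via_loc = df.copy()
via_loc.loc[~cond, 'Goals'] = 0

# where: keep values where cond holds, use 0 elsewhere, and get a new Series back
via_where = df.copy()
via_where['Goals'] = df['Goals'].where(cond, 0)

print(via_loc['Goals'].tolist())   # [12, 23, 56, 0, 0, 0, 0, 0, 0, 34]
print(via_loc.equals(via_where))   # True
```

Which one reads better is mostly a matter of taste: loc modifies an existing object, while where returns a new one and chains more easily.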