I am trying to figure out how to remove nan values from my array. It looks like this:
x = [1400, 1500, 1600, nan, nan, nan ,1700] #Not in this exact configuration
I am relatively new to Python, so I am still learning. Any tips?
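One common way to approach this is a boolean mask. The sketch below assumes the data is (or can be converted to) a NumPy float array and that `nan` means NumPy's `np.nan`; the variable name `cleaned` is only illustrative.

```python
import numpy as np

# The values from the question; nan is assumed to be NumPy's float NaN.
x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan, 1700])

# np.isnan returns a boolean mask that is True where a value is NaN.
# The ~ operator inverts the mask, so indexing keeps only the non-NaN entries.
cleaned = x[~np.isnan(x)]

print(cleaned)
```

The same mask idea applies when `x` starts as a plain Python list, as long as it is converted with `np.array(x)` first.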
I want to cleanly filter a dataframe using a regex on one of the columns.
For a contrived example:
In [210]: foo = pd.DataFrame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})
In [211]: foo
Out[211]:
a b
0 1 hi
1 2 foo
2 3 fat
3 4 cat
I want to filter the rows down to those that start with f, using a regex. First attempt:
In [213]: foo.b.str.match('f.*')
Out[213]:
0 []
1 ()
2 ()
3 []
That is not very useful. However, this gets me my boolean index:
In [226]: foo.b.str.match('(f.*)').str.len() > 0
Out[226]:
0 False
1 True
2 True
3 False
Name: b
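For comparison, here is a minimal sketch of the same boolean index built without the `.str.len() > 0` step. It assumes a recent pandas version, where `str.match` returns booleans rather than match groups; the names `mask`, `mask_alt` and `filtered` are illustrative.

```python
import pandas as pd

foo = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['hi', 'foo', 'fat', 'cat']})

# str.match tests the pattern against the start of each string
# and (in recent pandas) yields a boolean Series directly.
mask = foo['b'].str.match(r'f.*')

# An equivalent mask using str.contains with an explicit ^ anchor.
mask_alt = foo['b'].str.contains(r'^f', regex=True)

# Either mask can be used to select the matching rows.
filtered = foo[mask]
print(filtered)
```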
So I could then do my restriction with:
In [229]: foo[foo.b.str.match('(f.*)').str.len() > 0]
Out[229]:
a b
1 2 foo
2 3 fat …
I am using boolean indexing in pandas. The question is why the statement:
a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]
works fine, whereas
a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]
exits with an error?
Example:
a=pd.DataFrame({'x':[1,1],'y':[10,20]})
In: a[(a['x']==1)&(a['y']==10)]
Out: x y
0 1 10
In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
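The difference can be reproduced in a short sketch. `&` combines the two comparisons element by element, while `and` first asks Python for a single truth value of the whole Series, which pandas refuses to give. The names `both` and `message` are illustrative.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# & is the element-wise operator: it produces one boolean per row.
# The parentheses matter because & binds more tightly than ==.
both = a[(a['x'] == 1) & (a['y'] == 10)]
print(both)

# `and` calls bool() on the left-hand Series, which has more than one
# element, so pandas raises a ValueError instead of guessing.
try:
    a[(a['x'] == 1) and (a['y'] == 10)]
except ValueError as err:
    message = str(err)
    print(message)
```

The same rule applies to `or` and `not`, whose element-wise counterparts are `|` and `~`.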
I am having trouble evaluating a value in a dictionary with an if statement.
Given the following dictionary, which I imported from a dataframe (in case it matters):
>>> pnl[company]
29: Active Credit Date Debit Strike Type
0 1 0 2013-01-08 2.3265 21.15 Put
1 0 0 2012-11-26 40 80 Put
2 0 0 2012-11-26 400 80 Put
I try to evaluate the following statement to determine the last value of Active:
if pnl[company].tail(1)['Active']==1:
    print 'yay'
However, I run into the following error message:
Traceback (most recent call last):
File "<pyshell#69>", line 1, in <module>
if pnl[company].tail(1)['Active']==1:
File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 676, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
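The traceback points at the comparison result being a Series rather than a single value. A minimal sketch of one way around it follows, written for Python 3. The frame below is a hypothetical stand-in for `pnl[company]` with only a few of its columns, and the names `frame`, `last_as_series`, `last_active` and `result` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for pnl[company]; values loosely follow the question.
frame = pd.DataFrame({
    'Active': [1, 0, 0],
    'Strike': [21.15, 80, 80],
    'Type': ['Put', 'Put', 'Put'],
})

# tail(1)['Active'] is still a Series (of length one), so comparing it
# with == yields another Series, which `if` cannot reduce to True/False.
last_as_series = frame.tail(1)['Active']

# .iloc[-1] extracts the last entry as a plain scalar instead.
last_active = frame['Active'].iloc[-1]

if last_active == 1:
    result = 'yay'
else:
    result = 'not active'
print(result)
```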
This surprises me, because I can display the value I want using the command above without the if statement:
>>> pnl[company].tail(1)['Active']
30: …
I want to drop all rows in a pandas DataFrame that are not in a list.
For example, consider this dataframe:
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
'year': [2012, 2012, 2013, 2014, 2014],
'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
df
Dropping a row by name is easy:
df = df[df.name != 'Tina'] # to drop the row which include Tina in the name column
But if I want to keep only the rows for Jason and Molly:
List=['Jason', 'Molly']
df = df[df.name not in List]
it does not work!
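Python's `in` / `not in` test a single value against a container, so they cannot be applied to a whole column at once. A sketch using `Series.isin`, which performs the membership test row by row, is shown here; the names `keep`, `kept` and `dropped` are illustrative.

```python
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])

keep = ['Jason', 'Molly']

# isin returns one boolean per row: True where the name is in the list.
kept = df[df['name'].isin(keep)]

# Prefixing the mask with ~ inverts it, giving the "not in list" rows.
dropped = df[~df['name'].isin(keep)]

print(kept)
print(dropped)
```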
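This seems simple, but I cannot figure it out. I know how to filter a pandas data frame down to all rows that meet a condition, but when I want the opposite I keep getting strange errors.
Here is an example. (Context: a simple board game where pieces sit on a grid. Given a coordinate, we are trying to return all adjacent pieces, but not the piece on that coordinate itself.)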
import pandas as pd
import numpy as np
df = pd.DataFrame([[5,7, 'wolf'],
[5,6,'cow'],
[8, 2, 'rabbit'],
[5, 3, 'rabbit'],
[3, 2, 'cow'],
[7, 5, 'rabbit']],
columns = ['lat', 'long', 'type'])
coords = [5,7] #the coordinate I'm testing, a wolf
view = df[((coords[0] - 1) <= df['lat']) & (df['lat'] <= (coords[0] + 1)) \
& ((coords[1] - 1) <= df['long']) & (df['long'] <= (coords[1] + 1))]
view = view[not ((coords[0] == view['lat']) & (coords[1] == view['long'])) ]
print(view)
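The `not` on the last filtering line asks for a single truth value of a whole Series, which is what triggers the error. A sketch of the same filter using the element-wise `~` operator follows; `Series.between` (inclusive on both ends by default) is swapped in for the chained `<=` comparisons, and the names `near` and `on_square` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[5, 7, 'wolf'],
                   [5, 6, 'cow'],
                   [8, 2, 'rabbit'],
                   [5, 3, 'rabbit'],
                   [3, 2, 'cow'],
                   [7, 5, 'rabbit']],
                  columns=['lat', 'long', 'type'])

coords = [5, 7]  # the coordinate being tested, a wolf

# Rows within one step of the coordinate in both directions.
near = (df['lat'].between(coords[0] - 1, coords[0] + 1)
        & df['long'].between(coords[1] - 1, coords[1] + 1))

# Rows sitting exactly on the coordinate.
on_square = (df['lat'] == coords[0]) & (df['long'] == coords[1])

# ~ negates the mask element by element, unlike `not`.
view = df[near & ~on_square]
print(view)
```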
I think …