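Posts by Ben*_*ice

Remove rows with empty lists from a pandas DataFrame

I have a DataFrame in which some columns contain empty lists and others contain lists of strings: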

       donation_orgs                              donation_context
0            []                                           []
1   [the research of Dr. ...]   [In lieu of flowers , memorial donations ...]

I am trying to get the dataset back without any rows that contain an empty list.

I tried checking for null values:

dfnotnull = df[df.donation_orgs != []]
dfnotnull

and

dfnotnull = df[df.notnull().any(axis=1)]
pd.options.display.max_rows=500
dfnotnull

I also tried looping over the rows and checking whether a value exists, but I suspect an empty list does not evaluate to Null or None the way I expected:

dfnotnull = pd.DataFrame(columns=('donation_orgs', 'donation_context'))
for i in range(0,len(df)):
    if df['donation_orgs'].iloc(i):
        dfnotnull.loc[i] = df.iloc[i]

All three approaches above simply return every row of the original DataFrame.
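One possible approach, sketched here rather than taken from the thread: an empty list is a real object, so `notnull()` reports it as present. Testing the length of each cell avoids that. The sample rows and the `list_cols` name below are made up for illustration, and the sketch assumes the cells hold actual Python lists (not strings such as `"[]"` read back from a file).

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question
df = pd.DataFrame({
    "donation_orgs": [[], ["some organisation"], ["another organisation"]],
    "donation_context": [[], ["some context"], []],
})

# Columns expected to hold lists (assumed for this sketch)
list_cols = ["donation_orgs", "donation_context"]

# Start with every row kept, then drop rows where any list column is empty
keep = pd.Series(True, index=df.index)
for col in list_cols:
    keep &= df[col].map(len) > 0

dfnotnull = df[keep]
print(dfnotnull)
```

To filter on a single column only, `df[df["donation_orgs"].map(len) > 0]` applies the same length test.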

python list isnull pandas

Ben*_*ice

2015 12-09

14
Votes

4
Answers

10k
Views

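pandas DataFrame header is shifted when reading a CSV

I am trying to read data from a CSV file into a pandas DataFrame, but once it is read in, the header is shifted over by two columns.

I think it has something to do with the two blank lines after the header, but I am not sure. It looks like the first two columns are being read in as row labels (the index).

CSV format: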

VendorID,lpep_pickup_datetime,Lpep_dropoff_datetime,Store_and_fwd_flag,RateCodeID,Pickup_longitude,Pickup_latitude,Dropoff_longitude,Dropoff_latitude,Passenger_count,Trip_distance,Fare_amount,Extra,MTA_tax,Tip_amount,Tolls_amount,Ehail_fee,Total_amount,Payment_type,Trip_type 


2,2014-04-01 00:00:00,2014-04-01 14:24:20,N,1,0,0,0,0,1,7.45,23,0,0.5,0,0,,23.5,2,1,,
2,2014-04-01 00:00:00,2014-04-01 17:21:33,N,1,0,0,-73.987663269042969,40.780872344970703,1,8.95,31,1,0.5,0,0,,32.5,2,1,,

DataFrame output:

                                   VendorID lpep_pickup_datetime  \
2 2014-04-01 00:00:00  2014-04-01 14:24:20                    N   
  2014-04-01 00:00:00  2014-04-01 17:21:33                    N   
  2014-04-01 00:00:00  2014-04-01 15:06:18                    N   
  2014-04-01 00:00:00  2014-04-01 08:09:27                    N   
  2014-04-01 00:00:00  2014-04-01 16:15:13                    N   

                       Lpep_dropoff_datetime  Store_and_fwd_flag  RateCodeID  \
2 2014-04-01 00:00:00                      1                   0           0   
  2014-04-01 00:00:00                      1                   0           0   
  2014-04-01 00:00:00                      1                   0           0   
  2014-04-01 00:00:00                      1                   0           0   
  2014-04-01 00:00:00                      1                   0           0

The code is as follows:

file ='green_tripdata_2014-04.csv'
df4 = pd.read_csv(file)
print(df4.head(5))

I just need it read into a DataFrame with the header in the right place.
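A possible explanation, offered as a sketch rather than a confirmed diagnosis: each data row in the sample ends with two extra commas, so it has 22 fields against 20 header names. When a file has more data fields than names, `read_csv` treats the leading columns as the index, and blank lines are skipped by default, so they are unlikely to be the cause. Passing `index_col=False` turns off that inference. The miniature CSV below is a made-up stand-in with shortened column names, not the real file.

```python
import io
import pandas as pd

# Hypothetical miniature of the file: a header, two blank lines,
# and data rows that end with two extra delimiters
sample_csv = (
    "VendorID,pickup,dropoff,flag\n"
    "\n"
    "\n"
    "2,2014-04-01 00:00:00,2014-04-01 14:24:20,N,,\n"
    "2,2014-04-01 00:00:00,2014-04-01 17:21:33,N,,\n"
)

# index_col=False stops pandas from using the leading columns as the index,
# so each header name lines up with its own field
aligned = pd.read_csv(io.StringIO(sample_csv), index_col=False)
print(aligned)
```

With the real file this would be `pd.read_csv(file, index_col=False)`. The trailing empty fields are discarded.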

python csv pandas

Ben*_*ice

2015 11-18

6
Votes

1
Answers

3812
Views

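KeyError when using a boolean filter on a pandas DataFrame

I am trying to merge two DataFrames when a datetime object in one DataFrame falls within a range of datetime objects in the other.

I keep getting KeyError: 'cannot use a single bool to index into setitem' in the second block of code I posted, at this line: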

gametaxidf.loc[arrivemask, 'relevant'] = 1

I assume it will also happen with the similar commands on the lines that follow.

This is the part that is giving me trouble:

with open('/Users/benjaminprice/Desktop/TaxiCombined/Data/combinedtaxifiltered.csv', 'w') as csvfile: 
    fieldnames1 = ['index','pickup_datetime', 'dropoff_datetime', 'pickup_long', 'pickup_lat','dropoff_long','dropoff_lat','passenger_count','trip_distance','fare_amount','tip_amount','total_amount','stadium_code'] 
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames1) 
    writer.writeheader()

for index, row in baseballdf.iterrows(): 
    gametimestart = row['Start.Time'] 
    gametimeend = row['End.Time'] 
    arrivemin = gametimestart - datetime.timedelta(minutes=120) 
    arrivemax = gametimeend - datetime.timedelta(minutes = 30) 
    departmin = gametimeend - datetime.timedelta(minutes = 60) 
    departmax = gametimeend + datetime.timedelta(minutes = 90)

    gametaxidf = combineddf[combineddf.DATE==row.DATE]
    gametaxidf['relevant']=0

    for index, row in gametaxidf.iterrows(): …
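The code that builds `arrivemask` is cut off above, so the following is only a guess at the cause. The error message suggests `arrivemask` was a single `True`/`False`, which is what comparing one row's scalar values inside `iterrows()` produces. A sketch of an alternative is to compare the whole column at once, which gives a boolean Series that `.loc` accepts. The sample data is hypothetical, and the choice of `dropoff_datetime` as the column to test is an assumption.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for gametaxidf
gametaxidf = pd.DataFrame({
    "dropoff_datetime": pd.to_datetime([
        "2014-04-01 16:30:00",
        "2014-04-01 18:45:00",
        "2014-04-01 21:00:00",
    ]),
})

# Hypothetical game times; the window arithmetic follows the question
gametimestart = pd.Timestamp("2014-04-01 19:00:00")
gametimeend = pd.Timestamp("2014-04-01 22:00:00")
arrivemin = gametimestart - datetime.timedelta(minutes=120)  # 17:00
arrivemax = gametimeend - datetime.timedelta(minutes=30)     # 21:30

gametaxidf["relevant"] = 0

# Comparing the column (not a single row value) yields one bool per row
arrivemask = (
    (gametaxidf["dropoff_datetime"] >= arrivemin)
    & (gametaxidf["dropoff_datetime"] <= arrivemax)
)
gametaxidf.loc[arrivemask, "relevant"] = 1
print(gametaxidf)
```

In the question's code, `gametaxidf` is a filtered slice of `combineddf`. Adding `.copy()` to that slice before assigning to it is a common way to avoid pandas' chained-assignment warning.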


python boolean dataframe pandas keyerror

Ben*_*ice

2015 11-20

5
Votes

1
Answers

10k
Views