I have a DataFrame in which some rows hold empty lists and others hold lists of strings:
donation_orgs donation_context
0 [] []
1 [the research of Dr. ...] [In lieu of flowers , memorial donations ...]
I am trying to return a dataset that excludes any row containing an empty list.
I have tried checking for null values:
dfnotnull = df[df.donation_orgs != []]
dfnotnull
and
dfnotnull = df[df.notnull().any(axis=1)]
pd.options.display.max_rows=500
dfnotnull
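I have also tried looping over the rows and checking for values that exist, but I suspect the lists do not return Null or None the way I assumed they would: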
dfnotnull = pd.DataFrame(columns=('donation_orgs', 'donation_context'))
for i in range(0,len(df)):
    if df['donation_orgs'].iloc(i):
        dfnotnull.loc[i] = df.iloc[i]
All three of the approaches above simply return every row of the original DataFrame.
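One way to approach this, as a minimal sketch: an empty list is not a missing value, so notnull() reports True for it, and the loop calls .iloc(i) with parentheses instead of indexing with .iloc[i], so it never tests the list itself. Checking the length of each list avoids both problems. The sample strings below are hypothetical stand-ins for the real data, and requiring both columns to be non-empty is an assumption about what "any empty list" means here.

```python
import pandas as pd

# Toy data shaped like the question's frame; the strings are placeholders.
df = pd.DataFrame({
    "donation_orgs": [[], ["org A"], ["org B"]],
    "donation_context": [[], ["context A"], []],
})

# map(len) gives the length of each list; a length of 0 marks an empty list.
has_orgs = df["donation_orgs"].map(len) > 0
has_context = df["donation_context"].map(len) > 0

# Keep only the rows where neither column holds an empty list.
dfnotnull = df[has_orgs & has_context]
print(dfnotnull)
```

To filter on a single column only, use just that column's mask, for example df[has_orgs].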
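I am trying to read data from a CSV file into a pandas DataFrame, but when it is read in, the header is shifted over by two columns.
I think it has something to do with the two blank lines after the header, but I am not sure. It appears to read the first two columns in as row labels (the index).
CSV format: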
VendorID,lpep_pickup_datetime,Lpep_dropoff_datetime,Store_and_fwd_flag,RateCodeID,Pickup_longitude,Pickup_latitude,Dropoff_longitude,Dropoff_latitude,Passenger_count,Trip_distance,Fare_amount,Extra,MTA_tax,Tip_amount,Tolls_amount,Ehail_fee,Total_amount,Payment_type,Trip_type
2,2014-04-01 00:00:00,2014-04-01 14:24:20,N,1,0,0,0,0,1,7.45,23,0,0.5,0,0,,23.5,2,1,,
2,2014-04-01 00:00:00,2014-04-01 17:21:33,N,1,0,0,-73.987663269042969,40.780872344970703,1,8.95,31,1,0.5,0,0,,32.5,2,1,,
DataFrame output:
VendorID lpep_pickup_datetime \
2 2014-04-01 00:00:00 2014-04-01 14:24:20 N
2014-04-01 00:00:00 2014-04-01 17:21:33 N
2014-04-01 00:00:00 2014-04-01 15:06:18 N
2014-04-01 00:00:00 2014-04-01 08:09:27 N
2014-04-01 00:00:00 2014-04-01 16:15:13 N
Lpep_dropoff_datetime Store_and_fwd_flag RateCodeID \
2 2014-04-01 00:00:00 1 0 0
2014-04-01 00:00:00 1 0 0
2014-04-01 00:00:00 1 0 0
2014-04-01 00:00:00 1 0 0
2014-04-01 00:00:00 1 0 0
The code is as follows:
file ='green_tripdata_2014-04.csv'
df4 = pd.read_csv(file)
print(df4.head(5))
I just need it to read into a DataFrame with the header in the correct position.
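A possible explanation, judging from the sample rows: each data row ends with two extra commas, so it has two more fields than the header, and read_csv then treats the leading columns as the index. A minimal sketch of one fix is to pass index_col=False, which tells pandas not to use the first columns as the index. The three-column CSV below is a hypothetical, trimmed stand-in for the taxi file, and recent pandas versions may emit a ParserWarning about the dropped trailing fields.

```python
import io
import pandas as pd

# Toy stand-in for the taxi file: 3 header names, but every data row
# ends with two extra commas and therefore has 5 fields.
raw = (
    "VendorID,Passenger_count,Trip_distance\n"
    "2,1,7.45,,\n"
    "2,1,8.95,,\n"
)

# Default behaviour: the two surplus leading columns become the index,
# so the header appears shifted over by two columns.
shifted = pd.read_csv(io.StringIO(raw))

# index_col=False keeps every header name on its own column and
# discards the empty trailing fields.
fixed = pd.read_csv(io.StringIO(raw), index_col=False)
print(fixed)
```

If the trailing fields ever carry real data, naming them explicitly through the names parameter would be a safer choice than letting them be dropped.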
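I am trying to merge two DataFrames where the datetime objects in one DataFrame fall within a range of datetime objects in the other.
I keep getting KeyError: 'cannot use a single bool to index into setitem' on the following line of the second block of code I posted: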
gametaxidf.loc[arrivemask, 'relevant'] = 1
I assume it will also happen on the lines below that use a similar command.
This is the part that is giving me trouble:
with open('/Users/benjaminprice/Desktop/TaxiCombined/Data/combinedtaxifiltered.csv', 'w') as csvfile:
    fieldnames1 = ['index','pickup_datetime', 'dropoff_datetime', 'pickup_long', 'pickup_lat','dropoff_long','dropoff_lat','passenger_count','trip_distance','fare_amount','tip_amount','total_amount','stadium_code']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames1)
    writer.writeheader()
    for index, row in baseballdf.iterrows():
        gametimestart = row['Start.Time']
        gametimeend = row['End.Time']
        arrivemin = gametimestart - datetime.timedelta(minutes=120)
        arrivemax = gametimeend - datetime.timedelta(minutes = 30)
        departmin = gametimeend - datetime.timedelta(minutes = 60)
        departmax = gametimeend + datetime.timedelta(minutes = 90)
        gametaxidf = combineddf[combineddf.DATE==row.DATE]
        gametaxidf['relevant']=0
        for index, row in gametaxidf.iterrows(): …
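The error message suggests that arrivemask is a single True/False value rather than a boolean Series, which can happen when the mask is computed from one row inside an iterrows() loop. The code that builds arrivemask is not shown, so the following is only a sketch under assumptions: it builds the mask for the whole column at once and needs no inner loop. The sample timestamps are hypothetical, and using dropoff_datetime for the arrival window is a guess.

```python
import datetime
import pandas as pd

# Hypothetical taxi trips for one game day.
gametaxidf = pd.DataFrame({
    "dropoff_datetime": pd.to_datetime([
        "2014-04-01 17:30",
        "2014-04-01 18:45",
        "2014-04-01 22:10",
    ]),
})

# Hypothetical game times; the window arithmetic mirrors the question.
gametimestart = pd.Timestamp("2014-04-01 19:05")
gametimeend = pd.Timestamp("2014-04-01 22:00")
arrivemin = gametimestart - datetime.timedelta(minutes=120)
arrivemax = gametimeend - datetime.timedelta(minutes=30)

# Work on an explicit copy so the assignment does not target a slice.
gametaxidf = gametaxidf.copy()
gametaxidf["relevant"] = 0

# between() returns one boolean per row, which .loc accepts as a mask.
arrivemask = gametaxidf["dropoff_datetime"].between(arrivemin, arrivemax)
gametaxidf.loc[arrivemask, "relevant"] = 1
print(gametaxidf)
```

The same pattern would apply to a departure mask built from departmin and departmax; the two masks can be combined with | before the assignment.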