I am trying to read this dataset from Kaggle:
https://www.kaggle.com/gmadevs/atp-matches-dataset#atp_matches_2016.csv
I am using pandas' read_csv function, but the columns are not being split correctly. This is the code I tried:
df_2016 = pd.read_csv("Path/to/file/atp_matches_2016.csv")
However, the printed DataFrame looks like this:
tourney_id ... l_bpFaced
2016-M020 Brisbane Hard 32.0 A 20160104.0 300.0 105683.0 4.0 NaN Milos Raonic R 196.0 CAN 25.021218 14.0 2170.0 103819.0 1.0 NaN Roger Federer ... NaN
299.0 103819.0 1.0 NaN Roger Federer R 185.0 SUI 34.406571 3.0 8265.0 106233.0 8.0 NaN Dominic Thiem ... NaN
298.0 105683.0 4.0 NaN Milos Raonic R 196.0 CAN 25.021218 14.0 2170.0 106071.0 7.0 NaN Bernard Tomic ... NaN
297.0 103819.0 1.0 NaN Roger Federer R 185.0 SUI 34.406571 3.0 8265.0 105777.0 NaN NaN Grigor Dimitrov ... NaN
296.0 106233.0 8.0 NaN Dominic Thiem R NaN AUT 22.335387 20.0 1600.0 105227.0 3.0 NaN Marin Cilic ... NaN
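Why are the columns not being split correctly?
I expected output like the following, which for some reason is what I get for every year except 2016 and 2017.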
tourney_id tourney_name surface ... l_SvGms l_bpSaved l_bpFaced
0 2015-329 Tokyo Hard ... 10.0 2.0 5.0
1 2015-329 Tokyo Hard ... 13.0 12.0 19.0
2 2015-329 Tokyo Hard ... 18.0 9.0 11.0
3 2015-329 Tokyo Hard ... 13.0 4.0 8.0
4 2015-329 Tokyo Hard ... 10.0 1.0 5.0
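The actual CSV file looks fine and appears to have the same format as the other years. I also tried specifying the columns explicitly through a read_csv argument, but that gave me the same output.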
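One way to check whether the 2016 file really has the same layout as the other years is to compare the number of fields in the header with the number of fields in the data rows. The sketch below uses a small in-memory sample, not the real file: the three column names and the trailing comma on each data row are assumptions made for illustration.

```python
import csv
import io

# Hypothetical stand-in for the real file: the header has 3 fields,
# but every data row ends with an extra trailing comma (4 fields).
sample = io.StringIO(
    "tourney_id,tourney_name,surface\n"
    "2016-M020,Brisbane,Hard,\n"
    "2016-M020,Brisbane,Hard,\n"
)

reader = csv.reader(sample)
header = next(reader)

# Collect the distinct field counts seen across the data rows
row_widths = {len(row) for row in reader}

print("header fields:", len(header))
print("data row fields:", sorted(row_widths))
```

To run the same check on the actual CSV, replace the StringIO object with open("Path/to/file/atp_matches_2016.csv", newline=""). If the data rows turn out to be wider than the header, pandas will by default treat the leading extra columns as the index, which would produce the shifted output shown above.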
The safest approach I can think of is to read the CSV twice:
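import pandas as pd

# Read the data rows only: skip the header line and assign no column names yet
rows = pd.read_csv('path/to/atp_matches_2016.csv', skiprows=[0], header=None)
# Drop columns that contain only NaNs
rows = rows.dropna(axis=1, how='all')
# Read just the header line (nrows=0) and reuse its column names
rows.columns = pd.read_csv('path/to/atp_matches_2016.csv', nrows=0).columns
print(rows.head(5))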
Output:
tourney_id tourney_name surface draw_size tourney_level tourney_date \
0 2016-M020 Brisbane Hard 32.0 A 20160104.0
1 2016-M020 Brisbane Hard 32.0 A 20160104.0
2 2016-M020 Brisbane Hard 32.0 A 20160104.0
3 2016-M020 Brisbane Hard 32.0 A 20160104.0
4 2016-M020 Brisbane Hard 32.0 A 20160104.0
match_num winner_id winner_seed winner_entry ... w_bpFaced l_ace l_df \
0 300.0 105683.0 4.0 NaN ... 1.0 7.0 3.0
1 299.0 103819.0 1.0 NaN ... 1.0 2.0 4.0
2 298.0 105683.0 4.0 NaN ... 4.0 10.0 3.0
3 297.0 103819.0 1.0 NaN ... 1.0 8.0 2.0
4 296.0 106233.0 8.0 NaN ... 2.0 11.0 2.0
l_svpt l_1stIn l_1stWon l_2ndWon l_SvGms l_bpSaved l_bpFaced
0 61.0 34.0 25.0 14.0 10.0 3.0 5.0
1 55.0 31.0 18.0 9.0 8.0 2.0 6.0
2 84.0 54.0 41.0 16.0 12.0 2.0 2.0
3 104.0 62.0 46.0 21.0 16.0 8.0 11.0
4 98.0 52.0 41.0 27.0 15.0 7.0 8.0
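If the cause is that each data row has more fields than the header (for example, trailing commas), a single read with index_col=False may also work. The pandas documentation describes this option for malformed files with delimiters at the end of each line. The sketch below is a minimal illustration on made-up in-memory data with one trailing comma per row. I have not verified it against the actual Kaggle file, which may have several empty trailing fields per row.

```python
import io

import pandas as pd

# Hypothetical miniature of the problem: each data row has one trailing comma
raw = (
    "tourney_id,tourney_name,surface\n"
    "2016-M020,Brisbane,Hard,\n"
    "2015-329,Tokyo,Hard,\n"
)

# Default behaviour: the extra field makes pandas use the first column
# as the index, so every value shifts one column to the left
shifted = pd.read_csv(io.StringIO(raw))

# index_col=False tells pandas not to use the first column as the index
fixed = pd.read_csv(io.StringIO(raw), index_col=False)

print(shifted)
print(fixed)
```

In the shifted frame the tournament names end up under tourney_id and the last column is all NaN, which matches the symptom in the question. With index_col=False the values line up with the header again.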