big*_*nty 9 python python-3.x python-datetime pandas
I have a sample CSV data file.
Date
22-01-1943
15-10-1932
23-11-1910
04-05-2000
02-02-1943
01-01-1943
28-08-1943
31-12-1943
22-01-1943
15-10-1932
23-11-1910
04-05-2000
02-02-1943
01-01-1943
28-08-1943
31-12-1943
22-01-1943
15-10-1932
23-11-1910
04-05-2000
02-02-1943
01-01-1943
28-08-1943
31-12-1943
22-01-1943
15-10-1932
23-11-1910
04-05-2000
02-02-1943
01-01-1943
28-08-1943
31-12-1943
Next, I tried:
In [174]: %timeit df = pd.read_csv("a.csv", parse_dates=["Date"])
1.5 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [175]: %timeit df = pd.read_csv("a.csv", parse_dates=["Date"], infer_datetime_format=True)
1.73 ms ± 45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
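According to the documentation, the run with infer_datetime_format=True should take less time, yet here it is slower. Is my understanding correct? If so, for what kind of data does that statement hold?
Update: pandas version is '1.0.5'.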
What you actually want to do is add dayfirst=True:
%timeit df = pd.read_csv("C:/Users/k_sego/Dates.csv", parse_dates=["Date"], dayfirst=True, infer_datetime_format=True)
1.96 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
compared with
%timeit df = pd.read_csv("C:/Users/k_sego/Dates.csv", parse_dates=["Date"])
2.38 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
and
%timeit df = pd.read_csv("C:/Users/k_sego/Dates.csv", parse_dates=["Date"], infer_datetime_format=True)
3.02 ms ± 670 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The solution is to reduce the number of choices read_csv has to make.
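As a rough sketch of what "fewer choices" could look like in practice: the snippet below assumes the column is always day-month-year, as in the sample, and passes an explicit `format` to `pd.to_datetime` instead of letting pandas guess. The inline CSV text and the variable names are made up for illustration, and none of the timings above were re-measured with this variant. Since `infer_datetime_format` was deprecated in pandas 2.0, the sketch does not use it.

```python
import io

import pandas as pd

# A few rows from the sample file in the question (day-month-year order).
# The inline text stands in for a.csv; it is an illustrative assumption.
csv_text = "Date\n22-01-1943\n15-10-1932\n23-11-1910\n04-05-2000\n"

# Option 1: read the column as plain strings, then parse it with an explicit
# format so pandas does not have to guess the layout.
df = pd.read_csv(io.StringIO(csv_text))
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Option 2: let read_csv parse the column, resolving only the day/month
# ambiguity with dayfirst=True, as suggested in the answer.
df_dayfirst = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["Date"], dayfirst=True
)

# dayfirst also matters for correctness, not only speed: an ambiguous value
# such as 04-05-2000 is read as 4 May with dayfirst=True.
ambiguous = pd.to_datetime("04-05-2000", dayfirst=True)

print(df["Date"].tolist())
print(ambiguous)
```

When the format is known up front, stating it removes the guessing altogether. Whether it is actually faster on a given file is something to check with %timeit on that file.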