infer_datetime_format with parse_dates takes more time

big*_*nty 9 python python-3.x python-datetime pandas

I was going through the pandas documentation. It states: [image]

I have a sample CSV data file.

Date
22-01-1943
15-10-1932
23-11-1910
04-05-2000
02-02-1943
01-01-1943
28-08-1943
31-12-1943
22-01-1943
15-10-1932
23-11-1910
04-05-2000
02-02-1943
01-01-1943
28-08-1943
31-12-1943
22-01-1943
15-10-1932
23-11-1910
04-05-2000
02-02-1943
01-01-1943
28-08-1943
31-12-1943
22-01-1943
15-10-1932
23-11-1910
04-05-2000
02-02-1943
01-01-1943
28-08-1943
31-12-1943

Next, I tried:

In [174]: %timeit df = pd.read_csv("a.csv", parse_dates=["Date"])
1.5 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [175]: %timeit df = pd.read_csv("a.csv", parse_dates=["Date"], infer_datetime_format=True)
1.73 ms ± 45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

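According to the documentation, the time taken with infer_datetime_format=True should be shorter, but here it is longer. Is my understanding correct? Or for what kind of data does that statement hold?

Update: pandas version is '1.0.5'.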

Ser*_*nes 1

What you really want to do is add dayfirst=True:

%timeit df = pd.read_csv("C:/Users/k_sego/Dates.csv", parse_dates=["Date"], dayfirst=True, infer_datetime_format=True)
1.96 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

compared with

%timeit df = pd.read_csv("C:/Users/k_sego/Dates.csv", parse_dates=["Date"])
2.38 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df = pd.read_csv("C:/Users/k_sego/Dates.csv", parse_dates=["Date"], infer_datetime_format=True)
3.02 ms ± 670 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The solution is to reduce the number of choices read_csv has to consider when parsing the dates.
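
One way to narrow the choices even further, offered only as a sketch and not something benchmarked in this answer, is to read the column as strings and hand pd.to_datetime a single explicit format. The format string "%d-%m-%Y" is an assumption read off the sample data in the question, and the in-memory CSV below is a stand-in for the real file. The last two lines also show why the day/month order matters for a value like 04-05-2000.

```python
import io

import pandas as pd

# A few rows copied from the sample file in the question (day-month-year).
# The in-memory text stands in for the real CSV file.
csv_text = "Date\n22-01-1943\n15-10-1932\n04-05-2000\n31-12-1943\n"

# Read the column as plain strings, then parse it with one explicit format
# so the parser has no alternative layouts to try.
df = pd.read_csv(io.StringIO(csv_text))
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# An ambiguous value is read month-first by default ...
ambiguous = pd.to_datetime("04-05-2000")
# ... and day-first when the format (or dayfirst=True) says so.
explicit = pd.to_datetime("04-05-2000", format="%d-%m-%Y")

print(df["Date"].dt.strftime("%Y-%m-%d").tolist())
print(ambiguous.month, explicit.month)
```

Whether this is faster than parse_dates with dayfirst=True will depend on the pandas version and the size of the file, so it is worth timing on your own data with %timeit.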
