Suppose there is a CSV file named ta_sample.csv:
"BILL_DT","AMOUNT"
"2015-07-27T18:30:00Z",16000
"2015-07-07T18:30:00Z",6110
"2015-07-26T18:30:00Z",250
"2015-07-22T18:30:00Z",1000
"2015-07-06T18:30:00Z",2640000
Reading the above with read_csv_arrow, with the column types customized as the actual production data always requires:
library(arrow)
read_csv_arrow(
  "ta_sample.csv",
  col_names = c("BILL_DT", "AMOUNT"),
  col_types = "td",
  skip = 1,
  timestamp_parsers = c("%Y-%m-%dT%H:%M:%SZ"))
The result is as follows:
# A tibble: 5 x 2
  BILL_DT              AMOUNT
  <dttm>                <dbl>
1 2015-07-28 00:00:00   16000
2 2015-07-08 00:00:00    6110
3 2015-07-27 00:00:00     250
4 2015-07-23 00:00:00    1000
5 2015-07-07 00:00:00 2640000
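The problem here is that each date is shifted forward by one day and the time of day is lost. It is worth mentioning that data.table::fread() and readr::read_csv() read it correctly, e.g.,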
library(readr)
read_csv("ta_sample.csv")
# A tibble: 5 x 2
BILL_DT AMOUNT
<dttm> <dbl>
1 2015-07-27 18:30:00 …
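One possible reading of the shifted dates, offered as an assumption rather than a confirmed diagnosis: 18:30 UTC is exactly midnight of the next day in a UTC+05:30 zone, so the two outputs might show the same instants printed in different time zones. The sketch below uses base R only and makes no arrow or readr calls. The zone "Asia/Kolkata" is a hypothetical session time zone chosen for illustration and does not come from the question.

```r
# Base R only; no arrow or readr calls are used here.
# Parse one sample value as an instant in UTC (the trailing Z denotes UTC).
x <- as.POSIXct("2015-07-27T18:30:00Z",
                format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

fmt <- "%Y-%m-%d %H:%M:%S"

# Render the same instant in two zones.
# "Asia/Kolkata" (UTC+05:30) is an ASSUMED session time zone, not taken from the question.
utc_view   <- format(x, fmt, tz = "UTC")
local_view <- format(x, fmt, tz = "Asia/Kolkata")

print(utc_view)    # "2015-07-27 18:30:00"
print(local_view)  # "2015-07-28 00:00:00"

# Changing the display zone of a POSIXct value does not change the instant it stores.
y <- x
attr(y, "tzone") <- "Asia/Kolkata"
print(as.numeric(x) == as.numeric(y))  # TRUE
```

If this assumption holds for the session in question, comparing the underlying numeric values of BILL_DT from both readers (rather than their printed form) would show whether the data itself differs or only its display.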