R arrow: set the column type/schema to char for all columns

Question

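When opening large csv files, {arrow}'s automatic column type detection is causing me some trouble. In particular, it drops the leading zeros of some identifiers and does a few other unfortunate things. Since the dataset is quite wide (a few hundred columns) and I don't want to set every schema value by hand, I'd like to set them programmatically somehow.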

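A good start would be to convert all columns to character when opening the dataset with arrow::open_dataset. Alternatively, I could correct the existing dataset_connection$schema object for specific columns.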

However, I can't figure out how to do either.

Answer 1

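When using arrow::open_dataset() you can manually define a schema that determines the column names and types. I've pasted an example below that first shows the default behaviour, where column names and types are auto-detected, and then uses a schema to override that and specify your own column names and types. The example does this programmatically, as requested, but you can also define the schema by hand.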

library(arrow)

write_dataset(mtcars, "mtcars")

# opens the dataset with column detection
dataset <- open_dataset("mtcars")
dataset
#> FileSystemDataset with 1 Parquet file
#> mpg: double
#> cyl: double
#> disp: double
#> hp: double
#> drat: double
#> wt: double
#> qsec: double
#> vs: double
#> am: double
#> gear: double
#> carb: double
#> 
#> See $metadata for additional Schema metadata

# define new schema automatically
chosen_schema <- schema(
  purrr::map(names(dataset), ~Field$create(name = .x, type = string()))
)

# now opens the dataset with the chosen schema
open_dataset("mtcars", schema = chosen_schema) 
#> FileSystemDataset with 1 Parquet file
#> mpg: string
#> cyl: string
#> disp: string
#> hp: string
#> drat: string
#> wt: string
#> qsec: string
#> vs: string
#> am: string
#> gear: string
#> carb: string
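
The example above uses a Parquet dataset, whereas the question is about csv files, so here is a minimal sketch of the same idea for csv. It is an illustration under stated assumptions, not a tested recipe for your data: the sample file, the column names (id, zip, amount) and the helper names csv_dir and id_cols are hypothetical, and it assumes a reasonably recent {arrow} version in which open_dataset() accepts the readr-style skip argument for csv, plus an installed {dplyr} for collect(). It shows both options mentioned in the question: turning every column into a string, and keeping the detected schema while overriding only specific columns.

```r
library(arrow)

# Hypothetical sample data: a small csv whose identifier columns have leading zeros.
csv_dir <- tempfile("csv_dataset_")
dir.create(csv_dir)
writeLines(
  c("id,zip,amount", "00123,02139,1.5", "00456,10001,2.5"),
  file.path(csv_dir, "part-0.csv")
)

# Step 1: open once with auto-detection, only to learn the column names and types.
detected <- open_dataset(csv_dir, format = "csv")

# Step 2a: build a schema in which every column is a string.
all_string <- schema(
  lapply(names(detected), function(nm) field(nm, string()))
)

# Step 2b: or keep the detected types and override only the chosen columns.
id_cols <- c("id", "zip")
ids_as_string <- schema(
  lapply(detected$schema$fields, function(f) {
    if (f$name %in% id_cols) field(f$name, string()) else f
  })
)

# Step 3: reopen with the explicit schema. Assumption: with a user-supplied
# schema the header line is no longer consumed as column names, so it is
# skipped here with skip = 1 to avoid reading it as a data row.
ds_all <- open_dataset(csv_dir, format = "csv", schema = all_string, skip = 1)
ds_ids <- open_dataset(csv_dir, format = "csv", schema = ids_as_string, skip = 1)

# Pull the identifier column into R to check that it is read as character.
ids <- dplyr::collect(ds_all)$id
```

Because the schema is built from names(detected), this scales to a few hundred columns without listing them by hand. If only a handful of identifier columns are affected, the second variant is probably the better choice, since the remaining columns keep their detected numeric types.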
