Nic*_*123 5 r dplyr data.table
I have this sample dataset, which I would like to convert to the format shown further down:
Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")
Level <- c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2")
Estimate <- c(1.5,1,2,3,1,2,2.5)
df_before <- data.frame(Type, Level, Estimate)
Type Level Estimate
1 AGE 18-25 1.5
2 AGE 26-70 1.0
3 REGION London 2.0
4 REGION Southampton 3.0
5 REGION Newcastle 1.0
6 DRIVERS 1 2.0
7 DRIVERS 2 2.5
Basically, I would like to convert the dataset into the following format. I have tried the dcast() function, but it does not seem to work.
AGE Estimate_AGE REGION Estimate_REGION DRIVERS Estimate_DRIVERS
1 18-25 1.5 London 2 1 2.0
2 26-70 1.0 Southampton 3 2 2.5
3 <NA> NA Newcastle 1 <NA> NA
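library(dplyr)
library(tidyr)

df_before %>%
  group_by(Type) %>%
  # number the rows within each Type; Estimate becomes character so it can share a column with Level
  mutate(id = row_number(), Estimate = as.character(Estimate)) %>%
  pivot_longer(-c(Type, id)) %>%
  # id_cols is named explicitly; passing it by position is deprecated in recent tidyr
  pivot_wider(id_cols = id, names_from = c(Type, name)) %>%
  type.convert(as.is = TRUE)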
# A tibble: 3 x 7
id AGE_Level AGE_Estimate REGION_Level REGION_Estimate DRIVERS_Level DRIVERS_Estimate
<int> <chr> <dbl> <chr> <int> <int> <dbl>
1 1 18-25 1.5 London 2 1 2
2 2 26-70 1 Southampton 3 2 2.5
3 3 NA NA Newcastle 1 NA NA
With data.table:
library(data.table)
setDT(df_before)
dcast(melt(df_before, 'Type'), rowid(Type, variable)~Type + variable)
Note that you will get warnings because the columns being melted have different types. You can use reshape2::melt to avoid this.
In any case, your data frame is not in a standard format.
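As an alternative to switching to reshape2::melt, the type-mismatch warning can be avoided by converting the measure columns to a common type before melting. The following is a minimal sketch under that assumption, not part of the original answer. The names `dt`, `measure_cols`, `long`, `wide` and `est_cols` are made up for illustration, and the row counter is stored in an explicit `id` column rather than computed inside the formula.

```r
library(data.table)

# Same sample data as in the question
df_before <- data.frame(
  Type = c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS"),
  Level = c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2"),
  Estimate = c(1.5, 1, 2, 3, 1, 2, 2.5),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df_before)

# Convert both measure columns to character so melt() has nothing to coerce
measure_cols <- c("Level", "Estimate")
dt[, (measure_cols) := lapply(.SD, as.character), .SDcols = measure_cols]

long <- melt(dt, id.vars = "Type", measure.vars = measure_cols)

# Row counter within each Type/variable pair, used as the row key of the wide table
long[, id := rowid(Type, variable)]

wide <- dcast(long, id ~ Type + variable, value.var = "value")

# Turn the Estimate columns back into numbers
est_cols <- grep("_Estimate$", names(wide), value = TRUE)
wide[, (est_cols) := lapply(.SD, as.numeric), .SDcols = est_cols]
```

The result has one row per `id` and columns such as `AGE_Level` and `AGE_Estimate`; types with fewer levels are padded with NA.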
Base R >= 4.1 (the native pipe |> was introduced in R 4.1.0)
transform(df_before, id = ave(Estimate, Type, FUN = seq_along)) |>
reshape(v.names = c('Level', 'Estimate'), dir = 'wide', timevar = 'Type', sep = "_")
id Level_AGE Estimate_AGE Level_REGION Estimate_REGION Level_DRIVERS Estimate_DRIVERS
1 1 18-25 1.5 London 2 1 2.0
2 2 26-70 1.0 Southampton 3 2 2.5
5 3 <NA> NA Newcastle 1 <NA> NA
In base R < 4.1 (without the native pipe)
reshape(transform(df_before, id = ave(Estimate, Type, FUN = seq_along)),
v.names = c('Level', 'Estimate'), dir = 'wide', timevar = 'Type', sep = "_")
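The layout requested in the question has no id column and names the level columns simply AGE, REGION and DRIVERS. The base R result can be brought to that layout with a small clean-up step. This is a sketch that goes beyond the original answer: the object name `wide` and the renaming via `sub()` are illustrative choices.

```r
# Same sample data as in the question
df_before <- data.frame(
  Type = c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS"),
  Level = c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2"),
  Estimate = c(1.5, 1, 2, 3, 1, 2, 2.5),
  stringsAsFactors = FALSE
)

# Row counter within each Type, used as the id for reshape()
df_before$id <- ave(df_before$Estimate, df_before$Type, FUN = seq_along)

wide <- reshape(df_before,
                v.names = c("Level", "Estimate"),
                idvar = "id", timevar = "Type",
                direction = "wide", sep = "_")

# Drop the helper id and strip the "Level_" prefix so the names match the question
wide$id <- NULL
names(wide) <- sub("^Level_", "", names(wide))
rownames(wide) <- NULL
```

After this step `names(wide)` reads AGE, Estimate_AGE, REGION, Estimate_REGION, DRIVERS, Estimate_DRIVERS.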