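I'm trying to reshape my current dataset so that there is one row per user, and to split all unique values in the Color and Food columns (the number of values varies) into their own columns, each containing "Yes" or "No". Each user has a unique ID.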
Current format:
ID | Name | Color | Food
1 | John | Blue | Pizza
1 | John | Red | Pizza
1 | John | Yellow | Pizza
1 | John | Blue | Ice Cream
1 | John | Red | Ice Cream
1 | John | Yellow | Ice Cream
2 | Kelly | Blue | Pizza
2 | Kelly | Red | Pizza
Desired format:
ID | Name | Color_Blue | Color_Red | Color_Yellow | Food_Pizza | Food_Ice Cream |
1 | John | Yes | Yes | Yes | Yes | Yes |
2 | Kelly | Yes | Yes | No | Yes | No |
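library(dplyr); library(tidyr)
df %>%
  pivot_longer(-c(ID:Name)) %>%      # one row per (user, original column, value)
  unite("col", c(name, value)) %>%   # new column names, e.g. "Color_Blue", "Food_Pizza"
  distinct(ID, Name, col) %>%        # drop repeated combinations
  mutate(val = "Yes") %>%            # every observed combination is a "Yes"
  pivot_wider(names_from = col, values_from = "val", values_fill = "No")  # unobserved -> "No"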
# A tibble: 2 x 7
ID Name Color_Blue Food_Pizza Color_Red Color_Yellow `Food_Ice Cream`
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 John Yes Yes Yes Yes Yes
2 2 Kelly Yes Yes Yes No No
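pivot_wider() creates the new columns in order of first appearance, which is why Food_Pizza lands between Color_Blue and Color_Red above. If the layout from the desired format matters (all Color_* columns before the Food_* columns), one option is to append a relocate() step. This is a sketch that assumes dplyr 1.0.0 or later (where relocate() was added) and tidyr 1.1.0 or later (for a scalar values_fill); the sample data is repeated so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Sample data from the question
df <- tribble(
  ~ID, ~Name,   ~Color,   ~Food,
  "1", "John",  "Blue",   "Pizza",
  "1", "John",  "Red",    "Pizza",
  "1", "John",  "Yellow", "Pizza",
  "1", "John",  "Blue",   "Ice Cream",
  "1", "John",  "Red",    "Ice Cream",
  "1", "John",  "Yellow", "Ice Cream",
  "2", "Kelly", "Blue",   "Pizza",
  "2", "Kelly", "Red",    "Pizza"
)

wide <- df %>%
  pivot_longer(-c(ID:Name)) %>%
  unite("col", c(name, value)) %>%
  distinct(ID, Name, col) %>%
  mutate(val = "Yes") %>%
  pivot_wider(names_from = col, values_from = "val", values_fill = "No") %>%
  # Move all Color_* columns directly after Name, so they precede the Food_* columns
  relocate(starts_with("Color_"), .after = Name)

wide
```

Because the Color and Food column names are generated from the data, starts_with("Color_") keeps working when new colors appear, so the step stays compatible with a dynamic number of values.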
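If you want a base R equivalent, here is an approach that follows the same steps. (Can someone help me figure out how to remove the row names and the "val." prefix added to the final column names?)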
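# Stack Color and Food into a single "Value" column, labelled by "col_name"
df2 <- reshape(df,
               direction = "long",
               varying = c("Color", "Food"),
               v.names = "Value",
               timevar = "col_name",
               times = c("Color", "Food"))
# Build the new column names, e.g. "Color_Blue"
df2$col <- paste(df2$col_name, df2$Value, sep = "_")
# Keep one row per (ID, Name, col) and mark it "Yes"
df3 <- unique(df2[c("ID", "Name", "col")])
df3$val <- "Yes"
# Spread back to one row per user; combinations that never occur become NA
df4 <- reshape(df3,
               direction = "wide",
               idvar = c("ID", "Name"),
               timevar = "col")
df4[is.na(df4)] <- "No"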
> df4
ID Name val.Color_Blue val.Color_Red val.Color_Yellow val.Food_Pizza val.Food_Ice Cream
1.Color 1 John Yes Yes Yes Yes Yes
7.Color 2 Kelly Yes Yes No Yes No
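For the question in parentheses: with direction = "wide", reshape() builds each new column name from v.names (here the inferred "val"), the sep argument (default ".") and the time value, so the "val." prefix is expected. One simple base R fix, sketched here, is to strip it afterwards with sub() and reset the row names with rownames(df4) <- NULL. The sketch assumes df is a plain data.frame (rebuilt below from the sample data with data.frame() instead of tribble()).

```r
# Sample data from the question, as a plain data.frame
df <- data.frame(
  ID    = c("1", "1", "1", "1", "1", "1", "2", "2"),
  Name  = c(rep("John", 6), "Kelly", "Kelly"),
  Color = c("Blue", "Red", "Yellow", "Blue", "Red", "Yellow", "Blue", "Red"),
  Food  = c(rep("Pizza", 3), rep("Ice Cream", 3), "Pizza", "Pizza"),
  stringsAsFactors = FALSE
)

# Same steps as above
df2 <- reshape(df, direction = "long", varying = c("Color", "Food"),
               v.names = "Value", timevar = "col_name", times = c("Color", "Food"))
df2$col <- paste(df2$col_name, df2$Value, sep = "_")
df3 <- unique(df2[c("ID", "Name", "col")])
df3$val <- "Yes"
df4 <- reshape(df3, direction = "wide", idvar = c("ID", "Name"), timevar = "col")
df4[is.na(df4)] <- "No"

# Remove the leading "val." that reshape() put on every spread column
names(df4) <- sub("^val\\.", "", names(df4))
# Replace the "1.Color" / "7.Color" row names with a plain 1..n sequence
rownames(df4) <- NULL

df4
```

The regular expression is anchored with ^ and the dot is escaped, so only a literal "val." at the start of a name is removed; ID and Name are left unchanged.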
Sample data
df <- tribble(~ID, ~Name,   ~Color,   ~Food,
              "1", "John",  "Blue",   "Pizza",
              "1", "John",  "Red",    "Pizza",
              "1", "John",  "Yellow", "Pizza",
              "1", "John",  "Blue",   "Ice Cream",
              "1", "John",  "Red",    "Ice Cream",
              "1", "John",  "Yellow", "Ice Cream",
              "2", "Kelly", "Blue",   "Pizza",
              "2", "Kelly", "Red",    "Pizza")