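I know about the spread function in the tidyr package, but I can't get it to do what I need. I have a data.frame with 2 columns, as shown below. I need to convert the Subject column into binary columns of 1s and 0s.
Here is the data.frame: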
studentInfo <- data.frame(StudentID = c(1,1,1,2,3,3),
                          Subject = c("Maths", "Science", "English", "Maths", "History", "History"))
> studentInfo
  StudentID Subject
1         1   Maths
2         1 Science
3         1 English
4         2   Maths
5         3 History
6         3 History
The output I expect is:
  StudentID Maths Science English History
1         1     1       1       1       0
2         2     1       0       0       0
3         3     0       0       0       1
Please help me do this with the spread function or any other function. Thanks.
Sym*_*xAU 10
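Using reshape2, we can dcast from long to wide.
Since you only want a binary result, we can take the unique rows of the data first, so that a repeated StudentID/Subject pair is counted once.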
library(reshape2)
si <- unique(studentInfo)
dcast(si, formula = StudentID ~ Subject, fun.aggregate = length)
#  StudentID English History Maths Science
#1         1       1       0     1       1
#2         2       0       0     1       0
#3         3       0       1     0       0
Another approach, using tidyr and dplyr, is
library(tidyr)
library(dplyr)
studentInfo %>%
  mutate(yesno = 1) %>%
  distinct %>%
  spread(Subject, yesno, fill = 0)
#  StudentID English History Maths Science
#1         1       1       0     1       1
#2         2       0       0     1       0
#3         3       0       1     0       0
Although I'm not a fan of the tidyr syntax...
We can use table from base R:
+(table(studentInfo)!=0)
#         Subject
#StudentID English History Maths Science
#        1       1       0     1       1
#        2       0       0     1       0
#        3       0       1     0       0
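The table approach returns a table object rather than a data.frame with a StudentID column like the one in the question. If a data.frame is needed, something along these lines should work. This is only a minimal sketch: `flags` and `wide` are names made up here for illustration, and the subject columns come out in alphabetical order rather than the order shown in the question.

```r
# Same sample data as in the question.
studentInfo <- data.frame(
  StudentID = c(1, 1, 1, 2, 3, 3),
  Subject = c("Maths", "Science", "English", "Maths", "History", "History")
)

# Count each StudentID/Subject pair, then collapse the counts to 0/1.
flags <- +(table(studentInfo) != 0)

# Drop the "table" class so as.data.frame() keeps the wide matrix layout;
# calling as.data.frame() on a table directly would return long format.
wide <- as.data.frame(unclass(flags))

# Move the row names into a StudentID column and reset the row names.
wide <- cbind(StudentID = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```

This uses base R only, so no extra packages have to be installed.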
Using tidyr:
library(tidyr)
studentInfo <- data.frame(
StudentID = c(1,1,1,2,3,3),
Subject = c("Maths", "Science", "English", "Maths", "History", "History"))
pivot_wider(studentInfo,
            names_from = "Subject",
            values_from = 'Subject',
            values_fill = 0,
            values_fn = function(x) 1)
#> # A tibble: 3 x 5
#>   StudentID Maths Science English History
#>       <dbl> <int>   <int>   <int>   <int>
#> 1         1     1       1       1       0
#> 2         2     1       0       0       0
#> 3         3     0       0       0       1
Created on 2019-09-19 by the reprex package (v0.3.0)