ALS*_*yer 6 r multiple-columns reshape
I have the following data frame, in which the rows contain different numbers of names:
myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
c("Walter","NA","NA","NA","NA"),
c("Walter","Jesse","NA","NA","NA"),
c("Gus","Tuco","Mike","NA","NA"),
c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))
data.frame(ID,myvar)
ID V1 V2 V3 V4 V5
1 Walter NA NA NA NA
2 Walter NA NA NA NA
3 Walter Jesse NA NA NA
4 Gus Tuco Mike NA NA
5 Gus Mike Hank Saul Flynn
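My goal is to convert this data frame into a two-column data frame. The first column is the ID and the second is the character name. Note that each ID must correspond to the row in which the name originally appeared. I expect the following result: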
ID V
1 Walter
2 Walter
3 Walter
3 Jesse
4 Gus
4 Tuco
4 Mike
5 Gus
5 Mike
5 Hank
5 Saul
5 Flynn
I tried dcast {reshape2}, but it did not return what I need. Note that my original data frame is quite large. Any tips? Cheers.
myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
c("Walter","NA","NA","NA","NA"),
c("Walter","Jesse","NA","NA","NA"),
c("Gus","Tuco","Mike","NA","NA"),
c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))
df <- data.frame(ID, myvar)
Using base reshape. (I am converting the "NA" strings to real NA values, which you may not need to do; it is only necessary because of how you created this example.)
df[df == 'NA'] <- NA
na.omit(reshape(df, direction = 'long', varying = list(2:6))[, c('ID','V1')])
# ID V1
# 1.1 1 Walter
# 2.1 2 Walter
# 3.1 3 Walter
# 4.1 4 Gus
# 5.1 5 Gus
# 3.2 3 Jesse
# 4.2 4 Tuco
# 5.2 5 Mike
# 4.3 4 Mike
# 5.3 5 Hank
# 5.4 5 Saul
# 5.5 5 Flynn
Or using reshape2:
library('reshape2')
## na.omit(melt(df, id.vars = 'ID')[, c('ID','value')])
## or better yet as ananda suggests:
melt(df, id.vars = 'ID', na.rm = TRUE)[, c('ID','value')]
# ID value
# 1 1 Walter
# 2 2 Walter
# 3 3 Walter
# 4 4 Gus
# 5 5 Gus
# 8 3 Jesse
# 9 4 Tuco
# 10 5 Mike
# 14 4 Mike
# 15 5 Hank
# 20 5 Saul
# 25 5 Flynn
You will get a warning that the factor levels are not the same across columns, but that is fine.
You could use unlist:
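# myvar must already include the ID column here, i.e. myvar <- data.frame(ID, myvar);
# myvar[-1] then drops ID and unlist() stacks V1..V5 column by column
res <- subset(data.frame(ID, value = unlist(myvar[-1], use.names = FALSE)),
              value != 'NA')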
res
# ID value
#1 1 Walter
#2 2 Walter
#3 3 Walter
#4 4 Gus
#5 5 Gus
#6 3 Jesse
#7 4 Tuco
#8 5 Mike
#9 4 Mike
#10 5 Hank
#11 5 Saul
#12 5 Flynn
Note: the NAs in the dataset are character strings ("NA"). It is better to create them without quotes so that they are real NA values, which can then be removed with na.omit, is.na, complete.cases, etc.
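To illustrate the note above, here is a minimal sketch that rebuilds the example with unquoted NA values. The column-wise construction and the names `n_missing`, `full_rows`, and `long` are assumptions made for illustration; they are not part of the original post.

```r
# Hypothetical rebuild of the example data with real NA values (no quotes).
ID <- factor(1:5)
myvar <- data.frame(
  V1 = c("Walter", "Walter", "Walter", "Gus", "Gus"),
  V2 = c(NA, NA, "Jesse", "Tuco", "Mike"),
  V3 = c(NA, NA, NA, "Mike", "Hank"),
  V4 = c(NA, NA, NA, NA, "Saul"),
  V5 = c(NA, NA, NA, NA, "Flynn"),
  stringsAsFactors = FALSE
)

# Real NAs are visible to the usual missing-value helpers.
n_missing <- sum(is.na(myvar))      # number of missing cells
full_rows <- complete.cases(myvar)  # TRUE only for rows with no NA

# Stack the columns, repeat ID once per column, then drop the NA rows.
long <- na.omit(data.frame(ID = rep(ID, times = ncol(myvar)),
                           value = unlist(myvar, use.names = FALSE)))
```

With real NA values, `na.omit` replaces the string comparison `value != 'NA'` used above.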
mydf <- data.frame(ID, myvar)
First, turn your "NA" strings into real NA values:
mydf[mydf == "NA"] <- NA
Use some subsetting to do it all in one go:
data.frame(ID=mydf$ID[row(mydf[-1])[!is.na(mydf[-1])]], V=mydf[-1][!is.na(mydf[-1])])
# ID V
#1 1 Walter
#2 2 Walter
#3 3 Walter
#4 4 Gus
#5 5 Gus
#6 3 Jesse
#7 4 Tuco
#8 5 Mike
#9 4 Mike
#10 5 Hank
#11 5 Saul
#12 5 Flynn
Or, more readably, in base R:
sel <- which(!is.na(mydf[-1]), arr.ind=TRUE)
data.frame(ID=mydf$ID[sel[,1]], V=mydf[-1][sel])
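The approaches so far return the names column by column (all of V1, then V2, and so on), whereas the expected output in the question is grouped by ID. A minimal sketch of one way to get that order is to sort the index matrix by row and then by column before subsetting. The data construction and the name `out` are assumptions made for illustration.

```r
# Hypothetical copy of the example data, already using real NA values.
mydf <- data.frame(
  ID = factor(1:5),
  V1 = c("Walter", "Walter", "Walter", "Gus", "Gus"),
  V2 = c(NA, NA, "Jesse", "Tuco", "Mike"),
  V3 = c(NA, NA, NA, "Mike", "Hank"),
  V4 = c(NA, NA, NA, NA, "Saul"),
  V5 = c(NA, NA, NA, NA, "Flynn"),
  stringsAsFactors = FALSE
)

# Row/column positions of every non-missing name.
sel <- which(!is.na(mydf[-1]), arr.ind = TRUE)

# Reorder the positions by original row first, then by column.
sel <- sel[order(sel[, "row"], sel[, "col"]), , drop = FALSE]

# Index the name columns as a matrix with the two-column position matrix.
out <- data.frame(ID = mydf$ID[sel[, "row"]],
                  V  = as.matrix(mydf[-1])[sel])
```

Here `out` lists the names row by row, so each ID's names stay together in their original left-to-right order.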
Using tidyr:
library("tidyr")
library("dplyr")  # select() and filter() come from dplyr
myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
c("Walter","NA","NA","NA","NA"),
c("Walter","Jesse","NA","NA","NA"),
c("Gus","Tuco","Mike","NA","NA"),
c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))
myvar <- data.frame(ID,myvar)
myvar %>%
  gather(key, value, V1:V5) %>%   # stack V1..V5 into key/value pairs, keeping ID
  select(ID, value) %>%
  filter(value != "NA")
If your NAs are coded as real NA rather than "NA", you can use the na.rm = TRUE option of gather. For example:
myvar[myvar == "NA"] <- NA
myvar %>%
  gather(key, value, V1:V5, na.rm = TRUE) %>%
  select(ID, value)
which gives:
ID value
1 1 Walter
2 2 Walter
3 3 Walter
4 4 Gus
5 5 Gus
6 3 Jesse
7 4 Tuco
8 5 Mike
9 4 Mike
10 5 Hank
11 5 Saul
12 5 Flynn
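In current tidyr releases (1.0.0 and later), `gather` is superseded by `pivot_longer`. The following is a sketch of the equivalent call, assuming such a tidyr version is installed and the data already uses real NA values. The helper column name `column` and the result name `long` are made up for illustration.

```r
library(tidyr)

# Hypothetical copy of the example data with real NA values.
myvar <- data.frame(
  ID = factor(1:5),
  V1 = c("Walter", "Walter", "Walter", "Gus", "Gus"),
  V2 = c(NA, NA, "Jesse", "Tuco", "Mike"),
  V3 = c(NA, NA, NA, "Mike", "Hank"),
  V4 = c(NA, NA, NA, NA, "Saul"),
  V5 = c(NA, NA, NA, NA, "Flynn"),
  stringsAsFactors = FALSE
)

# Stack V1..V5 into a single Name column and drop missing values on the way.
long <- pivot_longer(myvar, cols = V1:V5,
                     names_to = "column", values_to = "Name",
                     values_drop_na = TRUE)

# Keep only the two columns asked for in the question.
long <- long[, c("ID", "Name")]
```

Unlike `gather`, `pivot_longer` emits the rows grouped by the original row, so the result follows the ID order requested in the question.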