Reshape a data frame with rows of different lengths into two columns, repeating the ID

ALS*_*yer 6 r multiple-columns reshape

I have the following data frame, whose rows have different lengths:

myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                             c("Walter","NA","NA","NA","NA"),
                             c("Walter","Jesse","NA","NA","NA"),
                             c("Gus","Tuco","Mike","NA","NA"), 
                             c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))   
data.frame(ID,myvar)

ID     V1    V2   V3   V4    V5
 1 Walter    NA   NA   NA    NA
 2 Walter    NA   NA   NA    NA
 3 Walter Jesse   NA   NA    NA
 4    Gus  Tuco Mike   NA    NA
 5    Gus  Mike Hank Saul Flynn

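My goal is to turn this data frame into a two-column data frame. The first column is the ID, the other is the character name. Note that each ID must match the row the character was originally in. I expect the following result: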

ID      V
1  Walter    
2  Walter
3  Walter
3  Jesse
4  Gus
4  Tuco
4  Mike
5  Gus
5  Mike
5  Hank
5  Saul
5  Flynn

I tried dcast {reshape2}, but it didn't return what I need. Worth noting: my original data frame is quite large. Any tips? Cheers.
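Note that dcast() in reshape2 casts long data to wide, the opposite of what is needed here; this is a wide-to-long reshape. As a minimal base-R sketch of that direction (not from the thread, and assuming the empty cells are real NA values rather than the string "NA"), stack() does the column stacking and the IDs only need repeating:

```r
# Assumption: the empty cells are real NA values, not the string "NA".
ID <- as.factor(1:5)
myvar <- as.data.frame(rbind(c("Walter", NA, NA, NA, NA),
                             c("Walter", NA, NA, NA, NA),
                             c("Walter", "Jesse", NA, NA, NA),
                             c("Gus", "Tuco", "Mike", NA, NA),
                             c("Gus", "Mike", "Hank", "Saul", "Flynn")),
                       stringsAsFactors = FALSE)

# stack() puts the columns one under another (all of V1, then V2, ...),
# returning a 'values' column and an 'ind' factor naming the source column.
long <- stack(myvar)

# Every column contributes one value per original row, so the IDs repeat
# once per column, in the same order.
long$ID <- rep(ID, times = ncol(myvar))

# Drop the empty cells and keep the two requested columns.
long <- long[!is.na(long$values), c("ID", "values")]
long
```

The rows come out grouped by source column (all V1 names first), the same order most answers below produce.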

raw*_*awr 7

myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                             c("Walter","NA","NA","NA","NA"),
                             c("Walter","Jesse","NA","NA","NA"),
                             c("Gus","Tuco","Mike","NA","NA"), 
                             c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))   
df <- data.frame(ID, myvar)

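Using base reshape. (I'm converting the "NA" strings to real NA values, which you may not need to do; it's only necessary because of how you created the example.)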

df[df == 'NA'] <- NA
na.omit(reshape(df, direction = 'long', varying = list(2:6))[, c('ID','V1')])

#     ID     V1
# 1.1  1 Walter
# 2.1  2 Walter
# 3.1  3 Walter
# 4.1  4    Gus
# 5.1  5    Gus
# 3.2  3  Jesse
# 4.2  4   Tuco
# 5.2  5   Mike
# 4.3  4   Mike
# 5.3  5   Hank
# 5.4  5   Saul
# 5.5  5  Flynn
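The answers here return rows grouped by original column rather than by ID. If you want the exact order shown in the question, one option (a sketch, assuming real NA values as after the conversion above) is to sort afterwards; order() is stable, so the names within one ID keep their left-to-right column order:

```r
# Assumption: empty cells are real NA values, not the string "NA".
ID <- as.factor(1:5)
myvar <- as.data.frame(rbind(c("Walter", NA, NA, NA, NA),
                             c("Walter", NA, NA, NA, NA),
                             c("Walter", "Jesse", NA, NA, NA),
                             c("Gus", "Tuco", "Mike", NA, NA),
                             c("Gus", "Mike", "Hank", "Saul", "Flynn")),
                       stringsAsFactors = FALSE)
df <- data.frame(ID, myvar, stringsAsFactors = FALSE)

# Same base-R reshape as in the answer above.
long <- na.omit(reshape(df, direction = "long", varying = list(2:6))[, c("ID", "V1")])

# order() is stable: rows with the same ID keep their current relative order,
# which is the column order (V1 before V2, and so on).
long <- long[order(long$ID), ]
rownames(long) <- NULL        # drop the "id.time" row names such as "3.2"
names(long) <- c("ID", "V")   # match the column names in the question
long
```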

Or using reshape2:

library('reshape2')
## na.omit(melt(df, id.vars = 'ID')[, c('ID','value')])

## or better yet as ananda suggests:
melt(df, id.vars = 'ID', na.rm = TRUE)[, c('ID','value')]

#    ID  value
# 1   1 Walter
# 2   2 Walter
# 3   3 Walter
# 4   4    Gus
# 5   5    Gus
# 8   3  Jesse
# 9   4   Tuco
# 10  5   Mike
# 14  4   Mike
# 15  5   Hank
# 20  5   Saul
# 25  5  Flynn

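If the columns are factors (the default before R 4.0.0), you'll get a warning that attributes are not identical across measure variables, because the factor levels differ between columns. That's fine here.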
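One way to sidestep that warning, sketched here with a small hypothetical data frame rather than the thread's data, is to convert the measure columns to character before calling melt(); the ID column can stay a factor:

```r
# Hypothetical data: factor columns whose levels differ, like the question's
# data frame produced before R 4.0.0.
df <- data.frame(ID = factor(1:2),
                 V1 = factor(c("Walter", "Gus")),
                 V2 = factor(c(NA, "Tuco")))

# Convert only the measure columns (everything except ID) to character.
df[-1] <- lapply(df[-1], as.character)
sapply(df, class)
```

With plain character measure columns there are no differing level attributes, so melt(df, id.vars = 'ID', na.rm = TRUE) should no longer need to drop anything.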


akr*_*run 7

You can use unlist:

 res <- subset(data.frame(ID,value=unlist(myvar[-1], 
                              use.names=FALSE)), value!='NA')
 res
 #   ID  value
 #1   1 Walter
 #2   2 Walter
 #3   3 Walter
 #4   4    Gus
 #5   5    Gus
 #6   3  Jesse
 #7   4   Tuco
 #8   5   Mike
 #9   4   Mike
 #10  5   Hank
 #11  5   Saul
 #12  5  Flynn

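Note: the NAs in your dataset are "character" elements. It is better to create them without quotes so that they are actual NAs; we can then remove them with na.omit, is.na, complete.cases, etc.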

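Data (run this first: the code above assumes myvar already contains the ID column, which myvar[-1] then drops):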

myvar <- data.frame(ID,myvar)
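To see why the quoted "NA" matters: is.na() only recognizes real missing values. A small sketch with hypothetical values, converting the literal string first and then filtering with complete.cases():

```r
# Hypothetical values: a name, the string "NA", and a real missing value.
x <- c("Walter", "NA", NA)
is.na(x)   # FALSE FALSE TRUE: only the unquoted NA counts as missing

long <- data.frame(ID = factor(c(1, 1, 2)), value = x, stringsAsFactors = FALSE)

# Turn the literal "NA" string into a real NA. %in% is used because
# long$value == "NA" would itself be NA for the already-missing row.
long$value[long$value %in% "NA"] <- NA

kept <- long[complete.cases(long), ]
kept       # only the "Walter" row is left
```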


the*_*ail 6

First, fix your "NA"s so that they really are NA (here mydf is the question's data frame including the ID column):

mydf <- data.frame(ID, myvar)
mydf[mydf == "NA"] <- NA

Then use some subsetting to do it all in one go:

data.frame(ID=mydf$ID[row(mydf[-1])[!is.na(mydf[-1])]], V=mydf[-1][!is.na(mydf[-1])])

#   ID      V
#1   1 Walter
#2   2 Walter
#3   3 Walter
#4   4    Gus
#5   5    Gus
#6   3  Jesse
#7   4   Tuco
#8   5   Mike
#9   4   Mike
#10  5   Hank
#11  5   Saul
#12  5  Flynn

Or, more readably, in base R:

sel <- which(!is.na(mydf[-1]), arr.ind=TRUE)
data.frame(ID=mydf$ID[sel[,1]], V=mydf[-1][sel])
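To show what which(..., arr.ind = TRUE) returns, here is a tiny hypothetical two-row example: each non-missing cell becomes a (row, col) pair, the row number picks the ID, and the same matrix indexes the values (a data frame indexed by a matrix goes through as.matrix() first):

```r
# Hypothetical data: two rows, one missing cell.
m <- data.frame(V1 = c("Walter", "Gus"), V2 = c(NA, "Tuco"),
                stringsAsFactors = FALSE)
ids <- c(1, 2)   # hypothetical IDs for the two rows

# One row per non-missing cell, with columns "row" and "col",
# listed in column-major order.
sel <- which(!is.na(m), arr.ind = TRUE)
sel

res <- data.frame(ID = ids[sel[, "row"]], V = m[sel])
res
```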


Ale*_*lex 5

Using tidyr:

library("tidyr")
library("dplyr")  # provides select() and filter(); tidyr alone does not

myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                             c("Walter","NA","NA","NA","NA"),
                             c("Walter","Jesse","NA","NA","NA"),
                             c("Gus","Tuco","Mike","NA","NA"), 
                             c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))   

myvar <- data.frame(ID,myvar)

# gather() stacks V1:V5 into a key column ("column") and a value column ("value")
myvar %>% 
    gather(key = "column", value = "value", V1:V5) %>%
    select(ID, value) %>%
    filter(value != "NA")

If your NAs are encoded as NA rather than "NA", we can use gather's na.rm = TRUE option instead. For example:

myvar[myvar == "NA"] <- NA
myvar %>% 
    gather(key = "column", value = "value", V1:V5, na.rm = TRUE) %>%
    select(ID, value)

   ID  value
1   1 Walter
2   2 Walter
3   3 Walter
4   4    Gus
5   5    Gus
6   3  Jesse
7   4   Tuco
8   5   Mike
9   4   Mike
10  5   Hank
11  5   Saul
12  5  Flynn
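In current tidyr, gather() is superseded by pivot_longer() (tidyr 1.0.0 and later). A sketch of the same reshape with it, assuming real NA values: values_drop_na = TRUE plays the role of na.rm = TRUE, and because pivot_longer() lengthens one input row at a time, the result already comes out in ID order, as in the question:

```r
library(tidyr)

# Assumption: empty cells are real NA values, not the string "NA".
ID <- as.factor(1:5)
myvar <- as.data.frame(rbind(c("Walter", NA, NA, NA, NA),
                             c("Walter", NA, NA, NA, NA),
                             c("Walter", "Jesse", NA, NA, NA),
                             c("Gus", "Tuco", "Mike", NA, NA),
                             c("Gus", "Mike", "Hank", "Saul", "Flynn")),
                       stringsAsFactors = FALSE)
myvar <- data.frame(ID, myvar, stringsAsFactors = FALSE)

# Lengthen V1:V5 into a column-name column and a value column,
# dropping the cells that are NA.
res <- pivot_longer(myvar, cols = V1:V5, names_to = "column",
                    values_to = "value", values_drop_na = TRUE)
res <- res[, c("ID", "value")]
res
```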