Question by Chr*_*ris (tags: r, vector, dataframe)
In R, I have two vectors of key=value pairs, like this:
x <- c("A=5", "B=1", "D=1", "E=1", "F=2", "G=1")
y <- c("A=2", "B=1", "C=3", "D=1", "H=4")
I want to turn them into a data.frame like this:
A B C D E F G H
x 5 1 0 1 1 2 1 0
y 2 1 3 1 0 0 0 4
Every key that appears in x or y should become a column, and a key that is missing from x or y should get the value zero.
Here is an environment-based approach: evaluate the name = val pairs in separate environments, then merge them:
xe <- new.env()
ye <- new.env()
with(xe, eval(parse(text=x)))
with(ye, eval(parse(text=y)))
# > ls(envir = ye)
# [1] "A" "B" "C" "D" "H"
# edit: as.list makes this even more compact!
df1 <- merge(as.list(xe), as.list(ye), all=TRUE, sort=FALSE)
# sort=FALSE keeps the input row order, with x on top:
#   A B D  E  F  G  C  H
# 1 5 1 1  1  2  1 NA NA
# 2 2 1 1 NA NA NA  3  4
df1[is.na(df1)] <- 0
df1
#   A B D E F G C H
# 1 5 1 1 1 2 1 0 0
# 2 2 1 1 0 0 0 3 4
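A minimal sketch of what the environment step does, using only base R: `parse(text = x)` turns each string into an assignment expression (`A = 5`, `B = 1`, ...), and evaluating those with `envir = xe` creates the bindings inside `xe` rather than in the global environment. `mget()` then reads them back by name; the name `vals` is only illustrative. Note that this relies on every key being a syntactic R name: a key such as `my key` would make `parse()` fail.

```r
x <- c("A=5", "B=1", "D=1", "E=1", "F=2", "G=1")

xe <- new.env()
# parse() yields one expression per string: A = 5, B = 1, ...
# eval() runs them as assignments inside xe, not in the global environment
eval(parse(text = x), envir = xe)

# ls() returns the bound names sorted; mget() looks each one up in xe
vals <- unlist(mget(ls(envir = xe), envir = xe))
vals
# A B D E F G
# 5 1 1 1 2 1
```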
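Using reshape::rbind.fill instead fixes a problem with merge: when the two arguments are equal, merge treats them as the same record and one row is lost.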
library(reshape)  # rbind.fill is also available in plyr
df1 <- rbind.fill(as.data.frame(as.list(xe)), as.data.frame(as.list(ye)))
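To see the lost-row problem concretely, here is a small base-R sketch. When both inputs agree on every shared column, `merge(..., all = TRUE)` returns a single row. `fill_rbind()` is a hypothetical helper written for illustration, not part of reshape or plyr; it only mimics the idea behind `rbind.fill` by padding missing columns with `NA` before calling `rbind()`.

```r
# Two one-row data frames with identical keys and values
a <- data.frame(A = 2, B = 1)
b <- data.frame(A = 2, B = 1)

# merge() joins on all shared columns, so the identical rows collapse into one
merged <- merge(a, b, all = TRUE)
nrow(merged)  # 1

# Hypothetical stand-in for rbind.fill: pad missing columns with NA, then rbind
fill_rbind <- function(...) {
  dfs <- list(...)
  cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    missing_cols <- setdiff(cols, names(d))
    if (length(missing_cols) > 0) d[missing_cols] <- NA
    d[cols]  # give every data frame the same column order
  })
  do.call(rbind, padded)
}

stacked <- fill_rbind(a, b)
nrow(stacked)  # 2

# Data frames with different keys get NA in the columns they lack
fill_rbind(data.frame(A = 5, E = 1), data.frame(A = 2, C = 3))
```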
Not the prettiest solution, but easy to follow:
1) Parse the strings into a data frame:
df1 <- as.data.frame(sapply(strsplit(x, '='), rbind), stringsAsFactors=FALSE)
Result:
> as.data.frame(sapply(strsplit(x, '='), rbind), stringsAsFactors=FALSE)
V1 V2 V3 V4 V5 V6
1 A B D E F G
2 5 1 1 1 2 1
2) Use the first row as the column names:
names(df1) <- df1[1,]
df1 <- df1[-1,]
Result:
> df1
A B D E F G
2 5 1 1 1 2 1
3) Do the same for your other vector:
df2 <- as.data.frame(sapply(strsplit(y, '='), rbind), stringsAsFactors=FALSE)
names(df2) <- df2[1,]
df2 <- df2[-1,]
4) Merge them:
df <- merge(df1, df2, all=TRUE, sort=TRUE)
Result:
> df
A B D E F G C H
1 2 1 1 <NA> <NA> <NA> 3 4
2 5 1 1 1 2 1 <NA> <NA>
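Steps 1 and 2 can be folded into one small function. This is a sketch, and `pairs_to_row()` is a hypothetical name; it also converts the values to numbers, whereas the `strsplit` steps above keep them as character strings (which is why the merged result prints `<NA>`).

```r
# Hypothetical helper: turn c("A=5", "B=1", ...) into a one-row numeric data frame
pairs_to_row <- function(v) {
  # n x 2 character matrix: column 1 holds the keys, column 2 the values
  kv <- do.call(rbind, strsplit(v, "=", fixed = TRUE))
  vals <- setNames(as.numeric(kv[, 2]), kv[, 1])
  as.data.frame(as.list(vals))
}

x <- c("A=5", "B=1", "D=1", "E=1", "F=2", "G=1")
y <- c("A=2", "B=1", "C=3", "D=1", "H=4")
df1 <- pairs_to_row(x)
df2 <- pairs_to_row(y)

# Same merge as step 4, but the columns are now numeric
merge(df1, df2, all = TRUE)
```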
Update: based on the comments, here is everything above in one piece:
> df1 <- as.data.frame(sapply(strsplit(x, '='), rbind), stringsAsFactors=FALSE)
> names(df1) <- df1[1,]
> df1 <- df1[-1,]
>
> df2 <- as.data.frame(sapply(strsplit(y, '='), rbind), stringsAsFactors=FALSE)
> names(df2) <- df2[1,]
> df2 <- df2[-1,]
>
> library(reshape)
> df <- rbind.fill(df1,df2)
> df[is.na(df)] <- 0
> df <- df[, order(names(df))]
> df
A B C D E F G H
1 5 1 0 1 1 2 1 0
2 2 1 3 1 0 0 0 4
Here is another variant:
x <- c("A=5", "B=1","D=1", "E=1", "F=2", "G=1")
y <- c("A=2", "B=1", "C=3", "D=1","H=4")
# Extract names & values
m <- do.call('cbind', strsplit(x, '='))
xn <- m[1,]
xv <- as.numeric(m[2,])
m <- do.call('cbind', strsplit(y, '='))
yn <- m[1,]
yv <- as.numeric(m[2,])
# Merge names
an <- sort(union(xn,yn))
# Assemble result
r <- matrix(0, 2, length(an), dimnames=list(NULL, an))
r[1,xn] <- xv
r[2,yn] <- yv
# Inspect result:
r
# A B C D E F G H
#[1,] 5 1 0 1 1 2 1 0
#[2,] 2 1 3 1 0 0 0 4
# ...if you want a data.frame instead of a matrix:
as.data.frame(r)
# A B C D E F G H
#1 5 1 0 1 1 2 1 0
#2 2 1 3 1 0 0 0 4
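The matrix-indexing idea generalizes to any number of vectors. A sketch, assuming every element has the form `key=value` with a numeric value; `pairs_to_df()` and its `fill` argument are hypothetical names. Named arguments become the row names, which reproduces the `x`/`y` layout from the question.

```r
# Hypothetical helper: one row per input vector, one column per key
pairs_to_df <- function(..., fill = 0) {
  vecs <- list(...)
  # each element becomes a 2-row matrix: keys in row 1, values in row 2
  parsed <- lapply(vecs, function(v) do.call(cbind, strsplit(v, "=", fixed = TRUE)))
  keys <- sort(unique(unlist(lapply(parsed, function(m) m[1, ]))))
  # start with every cell set to the fill value
  r <- matrix(fill, length(vecs), length(keys),
              dimnames = list(names(vecs), keys))
  for (i in seq_along(parsed)) {
    # index columns by key name, exactly as r[1, xn] <- xv does above
    r[i, parsed[[i]][1, ]] <- as.numeric(parsed[[i]][2, ])
  }
  as.data.frame(r)
}

x <- c("A=5", "B=1", "D=1", "E=1", "F=2", "G=1")
y <- c("A=2", "B=1", "C=3", "D=1", "H=4")
pairs_to_df(x = x, y = y)
#   A B C D E F G H
# x 5 1 0 1 1 2 1 0
# y 2 1 3 1 0 0 0 4
```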