Values from multiple data frame columns into one vector

Question

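I have a data frame df with many columns and, say, 100 rows.

How can I get the values of all levels from the columns named "alpha", "gamma" and "zeta", and store all 300 of them in a single vector?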

Answer 1

I find that converting to a matrix first makes it easier to get at the levels.

as.vector(as.matrix(df[,c("alpha", "gamma", "zeta")]))

Of course, you could also have set stringsAsFactors = FALSE when first reading in the data, so that the columns never become factors.
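
As a rough illustration of the matrix approach on data shaped like the question, here is a minimal sketch. The data frame contents, the extra beta column and the seed are made up for the example; only the column names "alpha", "gamma" and "zeta" come from the question.

```r
# Hypothetical data: 100 rows, the three columns named in the question,
# plus one extra column (beta) that should be ignored.
set.seed(42)
df <- data.frame(
  alpha = sample(c("a", "b", "c"), 100, replace = TRUE),
  beta  = rnorm(100),
  gamma = sample(c("d", "e"), 100, replace = TRUE),
  zeta  = sample(c("f", "g", "h"), 100, replace = TRUE),
  stringsAsFactors = TRUE  # assumption: factor columns, as in the question
)

# as.matrix() turns the selected factor columns into a character matrix;
# as.vector() then flattens that matrix column by column.
vals <- as.vector(as.matrix(df[, c("alpha", "gamma", "zeta")]))

length(vals)  # 300
class(vals)   # "character"
```

The first 100 elements of vals come from alpha, the next 100 from gamma, and the last 100 from zeta.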

Answer 2

A5C*_*2T1 6

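You already have an accepted answer, but here is what I think is happening: you have a mix of factor and character columns. In that case unlist will not work directly, but if the columns are all factor or all character there is no problem:

Some sample data: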

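mydf <- data.frame(A = LETTERS[1:3], B = LETTERS[4:6], C = LETTERS[7:9],
                   D = LETTERS[10:12], E = LETTERS[13:15],
                   stringsAsFactors = TRUE)  # needed on R >= 4.0, where the default is FALSE
df <- mydf
df$E <- as.character(df$E)  # make column E character; the others stay factors
colsOfInterest <- c("A", "B", "E")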

Case 1: all columns are factors

unlist(mydf[colsOfInterest], use.names = FALSE)
# [1] A B C D E F M N O
# Levels: A B C D E F M N O

Case 2: column E is character, the other columns are factors

unlist(df[colsOfInterest], use.names = FALSE)
# [1] "1" "2" "3" "1" "2" "3" "M" "N" "O"

unlist(lapply(df[colsOfInterest], as.character), use.names = FALSE)
# [1] "A" "B" "C" "D" "E" "F" "M" "N" "O"
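
To make the mixed-column situation easier to spot, the check and the conversion can be combined along the following lines. This is only a sketch: the data frame d and the helper name flatten_cols are hypothetical and not part of the original answer.

```r
# Small mixed data frame: A and B are factors, E is character.
d <- data.frame(A = factor(c("A", "B", "C")),
                B = factor(c("D", "E", "F")),
                E = c("M", "N", "O"),
                stringsAsFactors = FALSE)
cols <- c("A", "B", "E")

# Inspect the first class of each selected column before flattening.
col_classes <- vapply(d[cols], function(x) class(x)[1], character(1))
mixed <- length(unique(col_classes)) > 1  # TRUE here: factor and character

# Hypothetical helper: coerce every selected column to character first,
# so factor codes never leak into the result, then flatten.
flatten_cols <- function(data, cols) {
  unlist(lapply(data[cols], as.character), use.names = FALSE)
}

res <- flatten_cols(d, cols)
res
# [1] "A" "B" "C" "D" "E" "F" "M" "N" "O"
```

Converting unconditionally costs little and avoids having to branch on the column classes at all.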

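For a problem of the size described here, benchmarking shows that converting to character first and then using unlist (fun2() below) is actually the fastest approach, provided you do not care about keeping factors. Note that the result of fun1() will be incorrect if some columns are factors and some are characters. Here is the benchmark on a 100-row data.frame (the three functions are defined in the reference code at the end of this answer):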

library(microbenchmark)    
microbenchmark(fun1(), fun2(), fun3())
# Unit: microseconds
#    expr      min        lq    median       uq      max neval
#  fun1()  572.606  587.3595  595.4845  606.175 3439.055   100
#  fun2()  327.570  334.6265  341.2550  350.449 3443.758   100
#  fun3() 1037.020 1055.6215 1064.1745 1086.197 3929.981   100

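Of course, we are talking about microseconds here, but the results hold at larger scales as well.

For reference, here is what was used for the benchmark. Change "nRow" and "nCol" if you want to test on a data.frame of a different size, extracting a different number of columns.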

nRow <- 100
nCol <- 30
set.seed(1)
mydf <- data.frame(matrix(sample(LETTERS, nRow*nCol, replace = TRUE), nrow = nRow),
                   stringsAsFactors = TRUE)  # keep factor columns on R >= 4.0
colsOfInterest <- sample(nCol, sample(nCol*.7, 1))
length(colsOfInterest)
# [1] 17

library(microbenchmark)    
fun1 <- function() unlist(mydf[colsOfInterest], use.names = FALSE)
fun2 <- function() unlist(lapply(mydf[colsOfInterest], as.character), use.names = FALSE)
fun3 <- function() as.vector(as.matrix(mydf[colsOfInterest]))
microbenchmark(fun1(), fun2(), fun3())
