Unlist a data frame list column while retaining information from another column

Cpt*_*emo 15 r list dataframe

I have a data frame that consists of two columns: a character vector, col1, and a list column, col2.

myVector <- c("A","B","C","D")

myList <- list()
myList[[1]] <- c(1, 4, 6, 7)
myList[[2]] <- c(2, 7, 3)
myList[[3]] <- c(5, 5, 3, 9, 6)
myList[[4]] <- c(7, 9)

myDataFrame <- data.frame(row = c(1,2,3,4))

myDataFrame$col1 <- myVector
myDataFrame$col2 <- myList

myDataFrame
# row col1          col2
# 1   1    A    1, 4, 6, 7
# 2   2    B       2, 7, 3
# 3   3    C 5, 5, 3, 9, 6
# 4   4    D          7, 9

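I would like to unlist col2 while still retaining, for each element of the vectors in the list, the information stored in col1. In commonly used data frame reshaping terms: the "wide" list column should be converted to a "long" format.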

In the end, I want two vectors whose length equals length(unlist(myDataFrame$col2)). In code:

# unlist myList
unlist.col2 <- unlist(myDataFrame$col2)
unlist.col2
# [1] 1 4 6 7 2 7 3 5 5 3 9 6 7 9

# unlist myVector to obtain
# unlist.col1 <- ???
# unlist.col1
# [1] A A A A B B B C C C C C D D

I can't come up with any straightforward way to get it.

Hen*_*rik 23

You can also use unnest from the tidyr package:

library(tidyr)
unnest(myDataFrame, col2)

#      row  col1  col2
#    (dbl) (chr) (dbl)
# 1      1     A     1
# 2      1     A     4
# 3      1     A     6
# 4      1     A     7
# 5      2     B     2
# 6      2     B     7
# 7      2     B     3
# 8      3     C     5
# 9      3     C     5
# 10     3     C     3
# 11     3     C     9
# 12     3     C     6
# 13     4     D     7
# 14     4     D     9


A5C*_*2T1 5

You can use "data.table" to expand the whole data.frame and then extract the column of interest.

library(data.table)
## expand the entire data.frame (uncomment to see)
# as.data.table(myDataFrame)[, unlist(col2), by = list(row, col1)]

## expand and select the column of interest:
as.data.table(myDataFrame)[, unlist(col2), by = list(row, col1)]$col1
#  [1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"

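In newer versions of R (3.2.0 and later), you can also use the lengths function instead of the sapply(list, length) approach. The lengths function is considerably faster.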

with(myDataFrame, rep(col1, lengths(col2)))
#  [1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"
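
The same rep/lengths idea extends to the full "long" layout the question asks for. The following is a minimal base R sketch, not part of the original answer; the names n and long are made up for illustration, and stringsAsFactors = FALSE is set only so that col1 stays a character vector on R versions before 4.0.

```r
# Rebuild the example data from the question
myDataFrame <- data.frame(row = c(1, 2, 3, 4))
myDataFrame$col1 <- c("A", "B", "C", "D")
myDataFrame$col2 <- list(c(1, 4, 6, 7), c(2, 7, 3), c(5, 5, 3, 9, 6), c(7, 9))

# Number of elements in each list entry (illustrative name)
n <- lengths(myDataFrame$col2)

# Repeat every ordinary column n times and flatten the list column
long <- data.frame(
  row  = rep(myDataFrame$row, n),
  col1 = rep(myDataFrame$col1, n),
  col2 = unlist(myDataFrame$col2),
  stringsAsFactors = FALSE
)
```

Here long$col1 and long$col2 are the two equal-length vectors described in the question.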


akr*_*run 4

The idea here is to first get the length of each list element using sapply, and then use rep to replicate col1 by those lengths:

l1 <- sapply(myDataFrame$col2, length)
unlist.col1 <- rep(myDataFrame$col1, l1)
unlist.col1
# [1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"

Alternatively, as suggested by @Ananda Mahto, the above can also be done with vapply:

with(myDataFrame, rep(col1, vapply(col2, length, 1L)))
# [1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"
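
As a side note that is not in the original answers: one practical difference between these variants shows up on an empty list. The sketch below uses made-up names (empty, s, v, l) and only compares the return types; sapply falls back to returning a list when there is nothing to simplify, whereas vapply and lengths always return an integer vector.

```r
empty <- list()

s <- sapply(empty, length)      # nothing to simplify, so a list comes back
v <- vapply(empty, length, 1L)  # the 1L template forces an integer vector
l <- lengths(empty)             # also an integer vector

is.list(s)     # TRUE
is.integer(v)  # TRUE
is.integer(l)  # TRUE
```

For the four-row example in the question all three give the same counts, so the choice mainly matters when the list column can be empty.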