How do I split comma-separated values into new rows in R?

Question

I have a dataset like this:

col1    col2
a        1,2,3
b        ["1","2"]
c        4

I want the output to be:

col1     col2
a         1
a         2
a         3
b         1
b         2
c         4

Is this possible in R? If so, how?

Answer 1

A5C*_*2T1 11

You can try cSplit from my "splitstackshape" package:

library(splitstackshape)
cSplit(as.data.table(mydf)[, col2 := gsub("[][\"]", "", col2)], 
       "col2", ",", "long")
#    col1 col2
# 1:    a    1
# 2:    a    2
# 3:    a    3
# 4:    b    1
# 5:    b    2
# 6:    c    4

Of course, I'm very partial to cSplit, but you can also use "dplyr" together with unnest from "tidyr":

library(dplyr)
library(tidyr)

mydf %>%
  mutate(col2 = strsplit(gsub("[][\"]", "", col2), ",")) %>%
  unnest(col2)

Or just use "data.table":

library(data.table)
as.data.table(mydf)[, list(
  col2 = unlist(strsplit(gsub("[][\"]", "", col2), ","))), 
  by = col1]

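For readers who would rather avoid extra packages, the same reshaping can be sketched in base R. This is a minimal sketch and not part of the original answer: it assumes the data frame is named mydf as above and that col2 is a character column, and the names parts and long are purely illustrative.

```r
# Example data matching the question (mydf is the name used in the answer above)
mydf <- data.frame(
  col1 = c("a", "b", "c"),
  col2 = c("1,2,3", "[\"1\",\"2\"]", "4"),
  stringsAsFactors = FALSE
)

# Strip square brackets and double quotes, then split each value on commas
parts <- strsplit(gsub("[][\"]", "", mydf$col2), ",", fixed = TRUE)

# Repeat each col1 value once per piece and flatten the pieces into one column
long <- data.frame(
  col1 = rep(mydf$col1, lengths(parts)),
  col2 = as.integer(unlist(parts)),
  stringsAsFactors = FALSE
)
long
```

The key step is rep(mydf$col1, lengths(parts)), which lines up each key with the number of values it produced.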

Answer 2

gja*_*bel 5

The separate_rows() function in tidyr is the boss for observations with multiple delimited values. As you have a mix of integer and character strings (but just want integers in the final result), set convert = TRUE and use drop_na() (also in tidyr) to filter out the empty rows that the square brackets would otherwise leave behind.

# create data 
library(tidyverse)
d <- tibble(  # tibble() replaces the deprecated data_frame()
  col1 = c("a", "b", "c"), 
  col2 = c("1,2,3", "[\"1\",\"2\"]", 4)
)
d
# # A tibble: 3 x 2
#    col1            col2
#   <chr>           <chr>
# 1     a           1,2,3
# 2     b "[\"1\",\"2\"]"
# 3     c               4

# tidy data
d %>%
  separate_rows(col2, convert = TRUE) %>%
  drop_na()
# # A tibble: 6 x 2
#    col1  col2
#   <chr> <int>
# 1     a     1
# 2     a     2
# 3     a     3
# 4     b     1
# 5     b     2
# 6     c     4

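Note that separate_rows() as used above relies on its default separator (a regular expression matching runs of non-alphanumeric characters), and that drop_na() with no arguments removes every row containing an NA, including rows that were already missing before the split. A more explicit variant, offered as a sketch rather than the answerer's method, strips the brackets and quotes first and then splits on commas only. It assumes tidyr 1.3.0 or later, where separate_longer_delim() is available; res is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same example data as above, with every value written as a string
d <- tibble(
  col1 = c("a", "b", "c"),
  col2 = c("1,2,3", "[\"1\",\"2\"]", "4")
)

res <- d %>%
  # Remove square brackets and double quotes so only digits and commas remain
  mutate(col2 = gsub("[][\"]", "", col2)) %>%
  # Split on literal commas, producing one row per value
  separate_longer_delim(col2, delim = ",") %>%
  # Convert the pieces to integers explicitly
  mutate(col2 = as.integer(col2))
res
```

Because no empty pieces are created, there is nothing to drop afterwards, so genuinely missing values in the input would be kept.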
