I have a dataset like this:
col1 col2
a 1,2,3
b ["1","2"]
c 4
I would like the output to be:
col1 col2
a 1
a 2
a 3
b 1
b 2
c 4
Is it possible to do this in R? If so, how?
A5C*_*2T1 11
You can try cSplit from my "splitstackshape" package:
library(splitstackshape)
library(data.table)  # provides as.data.table() and := used below
cSplit(as.data.table(mydf)[, col2 := gsub("[][\"]", "", col2)],
"col2", ",", "long")
# col1 col2
# 1: a 1
# 2: a 2
# 3: a 3
# 4: b 1
# 5: b 2
# 6: c 4
Of course, I'm quite partial to cSplit, but you can also use "dplyr" together with unnest from "tidyr":
library(dplyr)
library(tidyr)
mydf %>%
mutate(col2 = strsplit(gsub("[][\"]", "", col2), ",")) %>%
unnest(col2)
Or just use "data.table":
library(data.table)
as.data.table(mydf)[, list(
col2 = unlist(strsplit(gsub("[][\"]", "", col2), ","))),
by = col1]
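The snippets above refer to mydf without defining it, and they all rely on the same two steps: strip the brackets and quotes with gsub, then split on commas. Here is a minimal base R sketch of those steps. The definition of mydf is an assumption reconstructed from the question (both columns taken to be character), and the names cleaned, parts and result are purely illustrative.

```r
# Hypothetical reconstruction of the question's data;
# both columns are assumed to be character vectors.
mydf <- data.frame(
  col1 = c("a", "b", "c"),
  col2 = c("1,2,3", "[\"1\",\"2\"]", "4"),
  stringsAsFactors = FALSE
)

# Step 1: the character class [][\"] matches "]", "[" and the double quote,
# so gsub() strips the JSON-like decoration and leaves plain "1,2".
cleaned <- gsub("[][\"]", "", mydf$col2)

# Step 2: split every cleaned string on commas; the result is a list
# with one character vector per original row.
parts <- strsplit(cleaned, ",")

# Step 3: repeat each col1 value once per piece and flatten the pieces.
result <- data.frame(
  col1 = rep(mydf$col1, lengths(parts)),
  col2 = as.integer(unlist(parts)),
  stringsAsFactors = FALSE
)

print(result)
```

This needs no packages. The cSplit, tidyr and data.table versions above do the same reshaping with less code, and they also carry along any other columns the data might have.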
The separate_rows() function in tidyr is well suited to observations with multiple delimited values. Its default separator splits on any run of characters that are neither alphanumeric nor a period, so the brackets and quotes are treated as delimiters too. As you have a mix of integers and character strings (but just want integers in the final result), set convert = TRUE and use drop_na() (also in tidyr) to filter out the empty rows that would otherwise appear where the square brackets were.
# create data
library(tidyverse)
d <- tibble(
col1 = c("a", "b", "c"),
col2 = c("1,2,3", "[\"1\",\"2\"]", "4")
)
d
# # A tibble: 3 x 2
# col1 col2
# <chr> <chr>
# 1 a 1,2,3
# 2 b "[\"1\",\"2\"]"
# 3 c 4
# tidy data
d %>%
separate_rows(col2, convert = TRUE) %>%
drop_na()
# # A tibble: 6 x 2
# col1 col2
# <chr> <int>
# 1 a 1
# 2 a 2
# 3 a 3
# 4 b 1
# 5 b 2
# 6 c 4