Eri*_*ail 4 r strsplit dataframe
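I'm struggling with strsplit again. I'm converting some strings to a data frame, but a forward slash, /, and some spaces in my strings are tripping me up. I could work around it, but I'm keen to learn whether I can use some fancy either/or pattern in strsplit. My working example below should illustrate the problem.
The strsplit function I'm using: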
str_to_df <- function(string){
t(sapply(1:length(string), function(x) strsplit(string, "\\s+")[[x]])) }
One kind of string I get:
string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
str_to_df(string1)
#> [,1] [,2]
#> [1,] "One" "58/2"
#> [2,] "Two" "22/3"
#> [3,] "Three" "15/5"
Another kind I get in the same place:
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')
str_to_df(string2)
#> [,1] [,2] [,3] [,4]
#> [1,] "One" "58" "/" "2"
#> [2,] "Two" "22" "/" "3"
#> [3,] "Three" "15" "/" "5"
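They obviously produce different output, and I can't figure out how to write one solution that works for both. Below is my desired outcome. Thanks in advance!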
desired_outcome <- structure(c("One", "Two", "Three", "58", "22",
"15", "2", "3", "5"), .Dim = c(3L, 3L))
desired_outcome
#> [,1] [,2] [,3]
#> [1,] "One" "58" "2"
#> [2,] "Two" "22" "3"
#> [3,] "Three" "15" "5"
This works:
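str_to_df <- function(string){
  # Same function as in the question; only the split pattern changes:
  # one or more slashes and/or whitespace characters count as a single separator
  t(sapply(1:length(string), function(x) strsplit(string, "[/[:space:]]+")[[x]]))
}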
string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')
str_to_df(string1)
# [,1] [,2] [,3]
# [1,] "One" "58" "2"
# [2,] "Two" "22" "3"
# [3,] "Three" "15" "5"
str_to_df(string2)
# [,1] [,2] [,3]
# [1,] "One" "58" "2"
# [2,] "Two" "22" "3"
# [3,] "Three" "15" "5"
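To see why the character class handles both kinds of input, it can help to look at the raw strsplit result for a single element of each kind. This is a minimal sketch; the variable names `pattern`, `parts1` and `parts2` are introduced here only for illustration.

```r
# Hypothetical names, used only for this illustration.
pattern <- "[/[:space:]]+"

# The "+" makes a whole run of separators (such as " / ") act as one split
# point, so no empty strings and no stray "/" tokens are produced.
parts1 <- strsplit("One\t58/2", pattern)[[1]]
parts2 <- strsplit("One 58 / 2", pattern)[[1]]

parts1
# [1] "One" "58"  "2"
parts2
# [1] "One" "58"  "2"
```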
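Another approach, using tidyr, could be:
library(tidyr)  # for separate(); as_tibble() and %>% are available once tidyr is attached
string1 %>%
  as_tibble() %>%
  separate(value, into = c("Col1", "Col2", "Col3"), sep = "[/[:space:]]+")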
# A tibble: 3 x 3
# Col1 Col2 Col3
# <chr> <chr> <chr>
# 1 One 58 2
# 2 Two 22 3
# 3 Three 15 5
We can write a function that splits on one or more spaces, tabs, or forward slashes:
f1 <- function(str1) do.call(rbind, strsplit(str1, "[/\t ]+"))
f1(string1)
# [,1] [,2] [,3]
#[1,] "One" "58" "2"
#[2,] "Two" "22" "3"
#[3,] "Three" "15" "5"
f1(string2)
# [,1] [,2] [,3]
#[1,] "One" "58" "2"
#[2,] "Two" "22" "3"
#[3,] "Three" "15" "5"
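One thing to keep in mind with do.call(rbind, ...) is that rbind recycles shorter vectors (with a warning) when the rows do not all split into the same number of pieces. If that matters for your data, a guarded variant might look like the sketch below. The helper name `f1_checked` and its `pattern` argument are assumptions made for this example, not part of the answer above.

```r
# Hypothetical helper: same split as f1, but stops when the rows are ragged
# instead of letting rbind() recycle values.
f1_checked <- function(str1, pattern = "[/\t ]+") {
  parts <- strsplit(str1, pattern)
  n <- lengths(parts)  # number of fields in each input string
  if (length(unique(n)) > 1L) {
    stop("inconsistent number of fields: ", paste(n, collapse = ", "))
  }
  do.call(rbind, parts)
}

m <- f1_checked(c("One\t58/2", "Two 22 / 3"))
```

Here `m` is a character matrix with two rows and three columns; an input such as c("a 1/2", "b 3") would raise an error rather than produce a silently recycled row.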
Or we can use read.csv after replacing each run of separators (spaces, tabs, or slashes) with a common delimiter:
read.csv(text=gsub("[\t/ ]+", ",", string1), header = FALSE)
# V1 V2 V3
#1 One 58 2
#2 Two 22 3
#3 Three 15 5
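Unlike the matrix-based approaches, read.csv returns a data frame and guesses column types, so the numbers come back as numeric rather than character. If you prefer the strsplit route but want a similar result, one possible sketch is to convert the matrix and let type.convert pick the types. The names `m` and `df` are assumptions introduced for this example.

```r
# Hypothetical names `m` and `df`, used only for this illustration.
m <- do.call(rbind, strsplit(c("One\t58/2", "Two 22/3", "Three\t15/5"), "[/\t ]+"))

# A matrix without column names becomes a data frame with columns V1, V2, V3.
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Convert each column to the most specific type that fits; as.is = TRUE keeps
# text columns as character instead of turning them into factors.
df[] <- lapply(df, type.convert, as.is = TRUE)

str(df)
```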