Create a data frame from a character vector of comma-joined pairs separated by spaces

And*_*eas 2 r dataframe data.table

I have the following data.frame:

b<-structure(list(b = c("47.83006,11.71699 47.83004,11.71691 47.83002,11.7168 47.83001,11.71662", 
"47.83001,11.71662 47.82993,11.71628 47.82991,11.7162 47.82988,11.71614 47.82983,11.71609 47.8295,11.71588 47.82919,11.71566 47.82898,11.71549 47.82845,11.71504 47.82832,11.715 47.82821,11.715 47.82712,11.71531 47.82639,11.71549 47.82606,11.71561 47.8257,11.71567 47.82548,11.71574 47.82433,11.71613", 
"47.82433,11.71613 47.82436,11.7165 47.8244,11.71715 47.82442,11.71742 47.82453,11.71823 47.82459,11.71856 47.82492,11.7199", 
"47.82492,11.7199 47.82495,11.72005 47.82503,11.72034 47.82515,11.72066 47.82526,11.72093 47.82556,11.72172 47.82559,11.72182 47.82561,11.72191 47.82562,11.72201", 
"47.85051,12.11965 47.85092,12.11997", "48.10034,11.75948 48.10021,11.75938"
)), row.names = c(NA, 6L), class = "data.frame")

Each string consists of lat,lon coordinate pairs separated by spaces.

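How can I create a data.frame or data.table from this structure as efficiently as possible, with each lat,lon pair on its own row and the lat and lon values in separate columns?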

Lat       lon
47.83006  11.71699
47.83004  11.71691
47.83002  11.7168
…

Update: Thanks for your solutions. I will go with @GKi's proposal because it is faster:

Unit: milliseconds
                                                                                                                         expr
 c <- b %>% separate_rows(b, sep = " ") %>% separate(b, into = c("Lat",      "Lon"), sep = ",", convert = T) %>% data.frame()
                                     d <- read.csv(text = unlist(strsplit(b$b, " ", TRUE)), col.names = c("Lat",      "Lon"))
       min        lq      mean    median        uq       max neval
 12.363628 13.031700 14.027860 13.408883 13.703157 28.922909   100
  1.020622  1.050315  1.119533  1.117269  1.170826  1.348833   100

GKi*_*GKi 6

You can use strsplit to split at the spaces between the values and then read.csv to get a data.frame.

# header = FALSE: otherwise read.csv consumes the first pair as a header row
read.csv(text=unlist(strsplit(b$b, " ", TRUE)), header = FALSE, col.names = c("Lat", "Lon"))
#        Lat      Lon
#1  47.83006 11.71699
#2  47.83004 11.71691
#3  47.83002 11.71680
#4  47.83001 11.71662
#5  47.83001 11.71662
#6  47.82993 11.71628
#7  47.82991 11.71620
#...
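As a quick check of the row count, here is a minimal sketch on a hypothetical three-pair sample (the vector `x` is made up for illustration and is not the question's data). It relies on the fact that read.csv defaults to header = TRUE, so without header = FALSE the first pair is read as a header line and then overwritten by col.names.

```r
# Hypothetical sample in the same "lat,lon lat,lon" format as b$b
x <- c("47.83006,11.71699 47.83004,11.71691", "47.85051,12.11965")

# Split on spaces to get one "lat,lon" string per element
pairs <- unlist(strsplit(x, " ", fixed = TRUE))

# header = FALSE keeps the first pair as data
res <- read.csv(text = pairs, header = FALSE, col.names = c("Lat", "Lon"))

# Default header = TRUE, for comparison: the first pair is lost
res_default <- read.csv(text = pairs, col.names = c("Lat", "Lon"))

nrow(res)          # 3
nrow(res_default)  # 2
```

Comparing the number of rows with length(pairs) is a cheap way to confirm that no coordinate was dropped.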

Or, since R 4.1.0, in base R using the forward pipe operator |> and the function shorthand \():

strsplit(b$b, " ", TRUE) |> unlist() |> (\(d) read.csv(text=d, header = FALSE, col.names = c("Lat", "Lon")))()
#        Lat      Lon
#1  47.83006 11.71699
#2  47.83004 11.71691
#3  47.83002 11.71680
#...

Or use the bizarro pipe ->.; instead of defining a function:

strsplit(b$b, " ", TRUE) |> unlist() ->.; read.csv(text=., header = FALSE, col.names = c("Lat", "Lon"))

A faster way, if you skip setting column names and converting to numeric and are content with a character matrix, would be:

do.call(rbind, strsplit(unlist(strsplit(b$b, " ", TRUE)), ",", TRUE))

Or, also converting it to numeric:

matrix(as.numeric(unlist(strsplit(unlist(strsplit(b$b, " ", TRUE)), ",", TRUE))), ncol=2, byrow=TRUE)
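If a data.frame with named columns is still wanted, the numeric matrix can be given column names and converted afterwards. The following is only a sketch on a hypothetical sample vector `x` (not the question's data); it has not been benchmarked against the variants in this answer.

```r
# Hypothetical sample in the same "lat,lon lat,lon" format as b$b
x <- c("47.83006,11.71699 47.83004,11.71691", "47.85051,12.11965")

# Split on spaces, then on commas, giving a flat vector lat1, lon1, lat2, lon2, ...
parts <- unlist(strsplit(unlist(strsplit(x, " ", fixed = TRUE)), ",", fixed = TRUE))

# Fill a two-column numeric matrix row by row and name the columns
m <- matrix(as.numeric(parts), ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("Lat", "Lon")))

# Convert to a data.frame; the column names carry over from the matrix
df <- as.data.frame(m)
```

Naming the columns through dimnames keeps the splitting step unchanged and adds only one conversion at the end.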

Comparison with the data.table solution from @mt1022:

library(data.table)
microbenchmark::microbenchmark(
  base = do.call(rbind, strsplit(unlist(strsplit(b$b, " ", TRUE)), ",", TRUE))
, baseNum = matrix(as.numeric(unlist(strsplit(unlist(strsplit(b$b, " ", TRUE)), ",", TRUE))), ncol=2, byrow=TRUE)
, data.table = as.data.table(tstrsplit(unlist(strsplit(b$b, ' ', T)), ',', T))
)
#Unit: microseconds
#       expr     min       lq      mean   median       uq     max neval cld
#       base  28.829  30.2965  33.08313  31.5705  33.0475  85.880   100  a 
#    baseNum  29.832  31.3030  33.51445  32.3635  34.5395  56.851   100  a 
# data.table 143.745 147.9900 155.41194 150.9960 157.2420 278.190   100   b