Split a string at the last space in a data.table

Die*_*ego 3 string split r data.table

I have a data.table with 2 columns:

             term  freq
1:    a arena tour    1
2: a available why    1
3:     a backup in    1
4:       a bad ass    1
5:     a bad chick    1

I want to split the "term" column at the last space, like this:

         termA  termB freq
1:     a arena   tour    1
2: a available    why    1
3:    a backup     in    1
4:       a bad  chick    1

I tried using substr (code below). It works for a single string, but not for the data.table (it seems to use the same index for all rows):

data.table (termA = substr(dt_n3$term, 1, rev(gregexpr("\\ ", dt_n3$term)[[1]])[1]-1),
                         termB = substr(dt_n3$term, rev(gregexpr("\\ ", dt_n3$term)[[1]])[1], 1000),
                         freq = dt_n3$freq)

In any case, I don't think this is the best approach. Can anyone help? Thanks.

akr*_*run 7

You can try the tstrsplit function, available in data.table from v1.9.5:

DT[, paste0('term', LETTERS[1:2]) := tstrsplit(term, ' (?=[^ ]*$)',
                                     perl=TRUE)][, term:=NULL][]
#   freq       termA termB
#1:    1     a arena  tour
#2:    1 a available   why
#3:    1    a backup    in
#4:    1       a bad   ass
#5:    1       a bad chick
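
The pattern ' (?=[^ ]*$)' matches a space only when everything after it, up to the end of the string, contains no further space, so only the last space qualifies. As a minimal sketch in base R (the vector x is just illustrative sample data), the same pattern can be checked with strsplit, which also accepts perl=TRUE:

```r
# Illustrative sample strings (hypothetical data, not the asker's table)
x <- c("a arena tour", "a bad ass")

# Split on a space that is followed only by non-space characters
# up to the end of the string, i.e. the last space
parts <- strsplit(x, " (?=[^ ]*$)", perl = TRUE)

print(parts[[1]])  # "a arena" "tour"
print(parts[[2]])  # "a bad"   "ass"
```

Each element is split into exactly two pieces, which is why tstrsplit can return them as two columns.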

Data

DT <- data.table(term= c("a arena tour", "a available why", 
      "a backup in", "a bad ass", "a bad chick"), freq=1)

A slightly modified version, in which you can assign and delete in the same statement:

cols = c("term", paste0("term", LETTERS[1:2]))
DT[, (cols) := c(list(NULL), tstrsplit(term, ' (?=[^ ]*$)', perl=TRUE))]

Assigning NULL to term removes that column.
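
For comparison, the substr attempt in the question appears to fail because gregexpr(...)[[1]] takes the match positions of the first row only and reuses them for every row. A rough base R sketch of a per-row version follows; the names term, last_space, termA and termB are illustrative, and it assumes every string contains at least one space:

```r
# Illustrative copy of the "term" column from the question
term <- c("a arena tour", "a available why", "a backup in",
          "a bad ass", "a bad chick")

# gregexpr returns one vector of match positions per string;
# keep the last position from each one (assumes a space exists)
last_space <- sapply(gregexpr(" ", term, fixed = TRUE),
                     function(m) m[length(m)])

# substr is vectorised over start and stop, so each row uses its own index
termA <- substr(term, 1, last_space - 1)
termB <- substr(term, last_space + 1, nchar(term))

print(data.frame(termA, termB))
```

The tstrsplit solution above is shorter and avoids the index bookkeeping, so this is mainly useful for understanding what went wrong.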