Splitting a string in R with tidyr::separate

Question

Suppose I have a data frame like this:

df<-data.frame(a=c("AA","BB"),b=c("short string","this is the longer string"))

I want to split each string at its last space using a regular expression. I tried:

library(dplyr)
library(tidyr)
df%>%
  separate(b,c("partA","partB"),sep=" [^ ]*$")

But this drops the second part of the string from the output. The output I want looks like this:

   a              partA  partB
1 AA              short string
2 BB this is the longer string

How can I do this? Ideally I would like a solution that uses tidyr and dplyr.

Answer 1

akr*_*run 16

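We can use extract from tidyr with capture groups ((...)). We match zero or more characters (.*) and wrap them in parentheses ((.*)) as the first capture group, followed by one or more whitespace characters (\\s+), followed by the second capture group, which contains only non-space characters ([^ ]+) up to the end ($) of the string.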

library(tidyr)
extract(df, b, into = c('partA', 'partB'), '(.*)\\s+([^ ]+)$')
#   a              partA  partB
#1 AA              short string
#2 BB this is the longer string
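
As a quick way to check what each capture group picks up, here is a minimal base R sketch that applies the same pattern with sub(). It is only an illustration, not the method used above: the vector x and the names pattern, partA and partB are introduced here for the example.

```r
# Illustration only: apply the answer's pattern with base R's sub()
x <- c("short string", "this is the longer string")
pattern <- "(.*)\\s+([^ ]+)$"

# \\1 is everything before the last run of whitespace,
# \\2 is the final run of non-space characters
partA <- sub(pattern, "\\1", x)
partB <- sub(pattern, "\\2", x)

print(data.frame(partA, partB))
```

Because (.*) is greedy, it takes as much as it can, which leaves only the last word for the second group.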

Answer 2

Wik*_*żew 10

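You can turn the [^ ]*$ part of your regex into a non-consuming pattern, the positive lookahead (?=[^ ]*$). A lookahead does not consume the non-space characters at the end of the string: they are not included in the match value, so they are kept in the output: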

df%>%
  separate(b,c("partA","partB"),sep=" (?=[^ ]*$)")
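
The difference between the consuming and the non-consuming pattern can be seen in isolation with base R's strsplit(). This is a minimal sketch that assumes strsplit() with perl = TRUE behaves comparably to the separator handling in separate; the names x, consumed and kept are introduced here for illustration.

```r
# Illustration only: compare the two separators with base R's strsplit()
x <- c("short string", "this is the longer string")

# Consuming pattern: the last word is part of the separator, so it is lost
consumed <- strsplit(x, " [^ ]*$", perl = TRUE)

# Lookahead pattern: only the space is matched, so the last word survives
kept <- strsplit(x, " (?=[^ ]*$)", perl = TRUE)

print(consumed)
print(kept)
```

With the first pattern each element has a single piece, while with the lookahead each element has two.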

Or, slightly more generally, since it matches any whitespace character:

df %>%
  separate(b,c("partA","partB"),sep="\\s+(?=\\S*$)")

See the regex demo and its diagram.

Output:

   a              partA  partB
1 AA              short string
2 BB this is the longer string
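
To illustrate where the two separators differ, here is a small base R sketch with a hypothetical input whose last word is preceded by a tab instead of a space. The input string and the names space_only and any_ws are assumptions made for this example, and strsplit() with perl = TRUE stands in for separate.

```r
# Illustration only: hypothetical input with a tab before the last word
x <- "this is the longer\tstring"

# Splits at the last literal space, so the tab stays inside the second piece
space_only <- strsplit(x, " (?=[^ ]*$)", perl = TRUE)[[1]]

# Splits at the last run of any whitespace, including the tab
any_ws <- strsplit(x, "\\s+(?=\\S*$)", perl = TRUE)[[1]]

print(space_only)
print(any_ws)
```

For data that only ever contains single spaces, as in the question, both separators give the same result.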
