How do I split a string with tidyr::separate in R and keep the value of the separator string?

TDo*_*Dog 6 r stringr tidyr

I have a dataset:

crimes<-data.frame(x=c("Smith", "Jones"), charges=c("murder, first degree-G, manslaughter-NG", "assault-NG, larceny, second degree-G"))

I am using tidyr::separate to split the charges column at the match for "G":

crimes<-separate(crimes, charges, into=c("v1","v2"), sep="G,")

This splits my column, but it drops the separator "G". I want to keep the "G" in the resulting columns.

My desired output is:

 x         v1                       v2
 Smith     murder, first degree-G   manslaughter-NG
 Jones     assault-NG               larceny, second degree-G

Any suggestions are welcome.

Cam*_*ron 7

Replace <yourRegexPattern> with your regular expression.

If you want the "sep" kept in the left column (lookbehind):

dataframe %>% separate(column_to_sep, into = c("newCol1", "newCol2"), sep="(?<=<yourRegexPattern>)")

If you want the "sep" kept in the right column (lookahead):

dataframe %>% separate(column_to_sep, into = c("newCol1", "newCol2"), sep="(?=<yourRegexPattern>)")
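To make the two templates concrete, here is a minimal sketch that applies both to the crimes data from the question, using "G," as the pattern. The names left and right are only illustrative.

```r
library(tidyr)

crimes <- data.frame(
  x = c("Smith", "Jones"),
  charges = c("murder, first degree-G, manslaughter-NG",
              "assault-NG, larceny, second degree-G")
)

# Lookbehind: split at the position right after "G,",
# so the matched text stays at the end of the left column.
left <- separate(crimes, charges, into = c("v1", "v2"), sep = "(?<=G,)")
left$v1
# "murder, first degree-G," "assault-NG,"

# Lookahead: split at the position right before "G,",
# so the matched text moves to the start of the right column.
right <- separate(crimes, charges, into = c("v1", "v2"), sep = "(?=G,)")
right$v2
# "G, manslaughter-NG" "G, larceny, second degree-G"
```

Both patterns match a position rather than any characters, so nothing is removed from the string; the space after the comma therefore stays at the start of v2 in the lookbehind case.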

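Also note that when you separate a word from a run of digits (e.g. August1990 into August and 1990), the lookahead matches before every digit, so you need to make sure the whole number is kept together, for instance with extra="merge".

Example: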

dataframe %>% separate(column_to_sep, into = c("newCol1", "newCol2"), sep="(?=[[:digit:]])", extra="merge")
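As a small worked version of that case, the sketch below uses a made-up data frame (dates, with a raw column; these names and values are assumptions, not from the question) to show extra = "merge" at work.

```r
library(tidyr)

# Hypothetical data: a month name fused to a four-digit year.
dates <- data.frame(raw = c("August1990", "March2004"))

# "(?=[[:digit:]])" matches before every digit. With extra = "merge",
# separate() splits only as many times as needed to fill `into`,
# so only the first match is used and the digits stay together.
split_dates <- separate(dates, raw, into = c("month", "year"),
                        sep = "(?=[[:digit:]])", extra = "merge")
split_dates
#    month year
# 1 August 1990
# 2  March 2004
```

The year column is still character here; passing convert = TRUE to separate() would turn it into an integer column.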

  • Unfortunately, lookbehind does not work with patterns that are not of fixed length. (3 upvotes)
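One possible way around the fixed-length restriction is to stop splitting and capture both sides instead. The sketch below swaps in tidyr::extract, which is not what the answer above uses, and assumes the left piece should end in a verdict of either -G or -NG (a context like -N?G is variable-length, so it would not fit in a lookbehind).

```r
library(tidyr)

crimes <- data.frame(
  x = c("Smith", "Jones"),
  charges = c("murder, first degree-G, manslaughter-NG",
              "assault-NG, larceny, second degree-G")
)

# Group 1: shortest prefix ending in "-G" or "-NG" that is followed by ", ".
# Group 2: everything after that ", ".
out <- tidyr::extract(crimes, charges, into = c("v1", "v2"),
                      regex = "^(.*?-N?G), (.*)$")
out$v1
# "murder, first degree-G" "assault-NG"
out$v2
# "manslaughter-NG" "larceny, second degree-G"
```

The ", " sits between the two groups, so it is dropped while the verdict is kept. Rows that do not match the regex would come back as NA.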

Mat*_*ina 5

UPDATE

This is what you asked for. Keep in mind that your data is not tidy (V1 and V2 each hold more than one variable per column).

A<-separate(crimes,charges,into=c("V1","V2"),sep = "(?<=G,)")
A
      x                      V1                        V2
1 Smith murder, first degree-G,           manslaughter-NG
2 Jones             assault-NG,  larceny, second degree-G
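The output above keeps the trailing comma in V1 and a leading space in V2. If the exact layout from the question is wanted, a small variation, sketched here as an assumption rather than part of the original answer, is to look behind only for "G" and let the pattern consume the ", " that follows:

```r
library(tidyr)

crimes <- data.frame(
  x = c("Smith", "Jones"),
  charges = c("murder, first degree-G, manslaughter-NG",
              "assault-NG, larceny, second degree-G")
)

# "(?<=G)" only asserts that a "G" precedes the split point,
# while ", " is actually matched and therefore removed.
A <- separate(crimes, charges, into = c("v1", "v2"), sep = "(?<=G), ")
A
#       x                     v1                       v2
# 1 Smith murder, first degree-G          manslaughter-NG
# 2 Jones             assault-NG larceny, second degree-G
```

The comma inside "murder, first degree" is left alone because it is not preceded by "G".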

A simpler way to keep the "G" or "NG" is to use sep=", ", as alistaire said.

A<-separate(crimes, charges, into=c("v1","v2"), sep = ', ')

For charges without internal commas (for example "murder-G, manslaughter-NG"), this gives

      x         v1              v2
1 Smith   murder-G manslaughter-NG
2 Jones assault-NG       larceny-G

If you want to keep separating the data.frame (on "-"):

separate(A, v1, into = c("v3","v4"), sep = "-")

This gives

      x      v3 v4              v2
1 Smith  murder  G manslaughter-NG
2 Jones assault NG       larceny-G

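You would need to do this again for the v2 column. I don't know whether you want to keep separating; please post your expected output so that I can make my answer more specific.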
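For completeness, here is a sketch of the full chain with both columns separated. It assumes the simpler input that the outputs above reflect (one "-G" or "-NG" verdict per charge and no commas inside a charge). The data frame crimes_simple and the column names charge1, verdict1, charge2 and verdict2 are made up for illustration.

```r
library(tidyr)

# Assumed input: charges with no internal commas.
crimes_simple <- data.frame(
  x = c("Smith", "Jones"),
  charges = c("murder-G, manslaughter-NG", "assault-NG, larceny-G")
)

# Step 1: one column per charge.
A <- separate(crimes_simple, charges, into = c("v1", "v2"), sep = ", ")

# Step 2: split each charge into the offence and its verdict.
B <- separate(A, v1, into = c("charge1", "verdict1"), sep = "-")
B <- separate(B, v2, into = c("charge2", "verdict2"), sep = "-")
B
#       x charge1 verdict1      charge2 verdict2
# 1 Smith  murder        G manslaughter       NG
# 2 Jones assault       NG      larceny        G
```

If a charge name could itself contain a hyphen, sep = "-" would split in the wrong place, and a pattern anchored on the verdict would be a safer choice.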