sca*_*der 3 regex r dplyr tidyverse
I have the following data frame:
df <- structure(list(X2 = c("BB_137.HVMSC", "BB_138.combined.HVMSC",
"BB_139.combined.HVMSC", "BB_140.combined.HVMSC", "BB_141.HVMSC",
"BB_142.combined.HMSC-bm")), .Names = "X2", row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
It looks like this:
> df
# A tibble: 6 x 1
X2
<chr>
1 BB_137.HVMSC
2 BB_138.combined.HVMSC
3 BB_139.combined.HVMSC
4 BB_140.combined.HVMSC
5 BB_141.HVMSC
6 BB_142.combined.HMSC-bm
What I want to do is split it into two columns, using `.` as the separator and keeping only the last field as the second column:
col1 col2
BB_137 HVMSC
BB_138.combined HVMSC
BB_139.combined HVMSC
BB_140.combined HVMSC
BB_141 HVMSC
BB_142.combined HMSC-bm
What is the right way to do this?
My attempt was this:
> df %>% separate(X2, into = c("sid","status", "tiss"), sep = "[.]")
# A tibble: 6 x 3
sid status tiss
* <chr> <chr> <chr>
1 BB_137 HVMSC <NA>
2 BB_138 combined HVMSC
3 BB_139 combined HVMSC
4 BB_140 combined HVMSC
5 BB_141 HVMSC <NA>
6 BB_142 combined HMSC-bm
Warning message: Too few values at 2 locations: 1, 5
Ron*_*hah 10
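We can use a negative lookahead as the separator in the `separate` function, so that only the last `.` in each string is treated as the split point.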
library(tidyr)
separate(data = df, col = X2, into = c("col1", "col2"), sep = "(\\.)(?!.*\\.)")
# col1 col2
# <chr> <chr>
#1 BB_137 HVMSC
#2 BB_138.combined HVMSC
#3 BB_139.combined HVMSC
#4 BB_140.combined HVMSC
#5 BB_141 HVMSC
#6 BB_142.combined HMSC-bm
The regex is taken from this answer.
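To see why the pattern picks only the last dot, here is a minimal base R sketch (not part of the original answer). It assumes a plain character vector `x` holding a few of the values from the question; the names `x`, `pos`, `col1` and `col2` are illustrative only. The lookahead `(?!.*\\.)` rejects any dot that still has another dot somewhere after it, so the match can only land on the final one.

```r
# A few of the values from the question, as a plain character vector
x <- c("BB_137.HVMSC", "BB_138.combined.HVMSC", "BB_142.combined.HMSC-bm")

# Find the position of the dot that is NOT followed by any later dot.
# perl = TRUE is needed because lookaheads are a PCRE feature.
pos <- as.integer(regexpr("\\.(?!.*\\.)", x, perl = TRUE))

# Everything before that dot goes to col1, everything after it to col2
col1 <- substr(x, 1, pos - 1)
col2 <- substr(x, pos + 1, nchar(x))

data.frame(col1, col2, stringsAsFactors = FALSE)
```

This is only meant to show what the separator matches; with a data frame, the `separate()` call above is the more direct route.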