我在表单中提取ID时遇到一些困难:
27da12ce-85fe-3f28-92f9-e5235a5cf6ac
Run Code Online (Sandbox Code Playgroud)
来自数据框:
a<-c("NAME_27da12ce-85fe-3f28-92f9-e5235a5cf6ac_THOMAS_MYR",
"NAME_94773a8c-b71d-3be6-b57e-db9d8740bb98_THIMO",
"NAME_1ed571b4-1aef-3fe2-8f85-b757da2436ee_ALEX",
"NAME_9fbeda37-0e4f-37aa-86ef-11f907812397_JOHN_TYA",
"NAME_83ef784f-3128-35a1-8ff9-daab1c5f944b_BISHOP",
"NAME_39de28ca-5eca-3e6c-b5ea-5b82784cc6f4_DUE_TO",
"NAME_0a52a024-9305-3bf1-a0a6-84b009cc5af4_WIS_MICHAL",
"NAME_2520ebbb-7900-32c9-9f2d-178cf04f7efc_Sarah_Lu_Van_Gar/Thomas")
Run Code Online (Sandbox Code Playgroud)
基本上它是第一个和第二个下划线之间的东西.
通常我接近:
library(tidyr)
df$a<-as.character(df$a)
df<-df[grep("_", df$a), ]
df<- separate(df, a, c("ID","Name") , sep = "_")
df$a<-as.numeric(df$ID)
Run Code Online (Sandbox Code Playgroud)
然而这次有许多下划线......我的方法失败了.有没有办法提取该ID?
我认为你应该用extract而不是separate.您需要指定要捕获的模式.我在这里假设ID总是从一个数字开始,所以我在第一个数字之后捕获所有内容直到下一个数字_然后在它之后的所有内容
df <- data.frame(a)
df <- df[grep("_", df$a),, drop = FALSE]
extract(df, a, c("ID", "NAME"), "[A-Za-z].*?(\\d.*?)_(.*)")
# ID NAME
# 1 27da12ce-85fe-3f28-92f9-e5235a5cf6ac THOMAS_MYR
# 2 94773a8c-b71d-3be6-b57e-db9d8740bb98 THIMO
# 3 1ed571b4-1aef-3fe2-8f85-b757da2436ee ALEX
# 4 9fbeda37-0e4f-37aa-86ef-11f907812397 JOHN_TYA
# 5 83ef784f-3128-35a1-8ff9-daab1c5f944b BISHOP
# 6 39de28ca-5eca-3e6c-b5ea-5b82784cc6f4 DUE_TO
# 7 0a52a024-9305-3bf1-a0a6-84b009cc5af4 WIS_MICHAL
# 8 2520ebbb-7900-32c9-9f2d-178cf04f7efc Sarah_Lu_Van_Gar/Thomas
Run Code Online (Sandbox Code Playgroud)