Regular expression matching in dplyr

Cla*_*lke 4 regex r stringr dplyr

While answering this question, I wrote the following code:

df <- data.frame(Call_Num = c("HV5822.H4 C47 Circulating Collection, 3rd Floor", "QE511.4 .G53 1982 Circulating Collection, 3rd Floor", "TL515 .M63 Circulating Collection, 3rd Floor", "D753 .F4 Circulating Collection, 3rd Floor", "DB89.F7 D4 Circulating Collection, 3rd Floor"))

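library(stringr)

# str_match() returns a character matrix: column 1 holds the full match,
# columns 2 and 3 hold the two capture groups.
matches <- str_match(df$Call_Num, "([A-Z]+)(\\d+)\\s*\\.")
df2 <- data.frame(df, letter = matches[, 2], number = matches[, 3])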

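Now my question: is there a simple way to combine the last two lines into a single dplyr call, presumably using mutate()? Alternatively, I would also be interested in a solution using do(). For the mutate() approach, since we are extracting two groups, I would accept a solution that calls str_match() twice with different regular expressions, one for each desired group.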

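Edit: To clarify, the main challenge I see here is that str_match() returns a matrix, and I would like to know how to handle that inside mutate() or do(). I am not interested in solving the original problem with other methods of extracting the information; plenty of such solutions have already been provided here.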

Sam*_*rke 6

You can do this with extract() from the tidyr package:

extract(df, Call_Num, into = c("letter", "number"), regex = "([A-Z]+)(\\d+)\\s*\\.", remove = FALSE)

                                             Call_Num letter number
1     HV5822.H4 C47 Circulating Collection, 3rd Floor     HV   5822
2 QE511.4 .G53 1982 Circulating Collection, 3rd Floor     QE    511
3        TL515 .M63 Circulating Collection, 3rd Floor     TL    515
4          D753 .F4 Circulating Collection, 3rd Floor      D    753
5        DB89.F7 D4 Circulating Collection, 3rd Floor     DB     89

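It is not dplyr, but as the CRAN page linked above states, tidyr is "designed specifically for data tidying (not general reshaping or aggregation) and works well with dplyr data pipelines."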
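To illustrate how extract() slots into a dplyr pipeline, here is a minimal sketch. It reuses the data frame from the question; the `convert = TRUE` argument and the follow-up `filter()` step are additions for illustration only and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Call_Num = c("HV5822.H4 C47 Circulating Collection, 3rd Floor",
               "QE511.4 .G53 1982 Circulating Collection, 3rd Floor",
               "TL515 .M63 Circulating Collection, 3rd Floor",
               "D753 .F4 Circulating Collection, 3rd Floor",
               "DB89.F7 D4 Circulating Collection, 3rd Floor"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # tidyr::extract() is namespaced to avoid a clash with magrittr::extract()
  tidyr::extract(Call_Num, into = c("letter", "number"),
                 regex = "([A-Z]+)(\\d+)\\s*\\.",
                 remove = FALSE,
                 convert = TRUE) %>%   # assumption: we want `number` as an integer
  filter(number > 500)                 # illustrative dplyr step on the new column
```

Because extract() takes a data frame and returns a data frame, it can sit between any two dplyr verbs without special handling of the intermediate match matrix.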


akr*_*run 3

You could try using do():

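df %>% 
  # [, -1] drops the full-match column, keeping only the two capture groups
  do(data.frame(., str_match(.$Call_Num,  "([A-Z]+)(\\d+)\\s*\\.")[,-1],
                              stringsAsFactors=FALSE)) %>%
  # note: rename_() has been deprecated since dplyr 0.7.0
  rename_(.dots=setNames(names(.)[-1],c('letter', 'number')))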
#                                             Call_Num letter number
#1     HV5822.H4 C47 Circulating Collection, 3rd Floor     HV   5822
#2 QE511.4 .G53 1982 Circulating Collection, 3rd Floor     QE    511
#3        TL515 .M63 Circulating Collection, 3rd Floor     TL    515
#4          D753 .F4 Circulating Collection, 3rd Floor      D    753
#5        DB89.F7 D4 Circulating Collection, 3rd Floor     DB     89

Or, as @SamFirke commented, the columns can also be renamed with

  ---                                    %>%
 setNames(., c(names(.)[1], "letter", "number"))
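
For the mutate() route the question asks about, one possible sketch is to index the matrix returned by str_match() directly inside mutate(), so that each new column receives a plain character vector. This is not taken from either answer above; the variable name `pattern` is introduced here for illustration, and the same regular expression is evaluated twice, which the question says is acceptable.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  Call_Num = c("HV5822.H4 C47 Circulating Collection, 3rd Floor",
               "QE511.4 .G53 1982 Circulating Collection, 3rd Floor",
               "TL515 .M63 Circulating Collection, 3rd Floor",
               "D753 .F4 Circulating Collection, 3rd Floor",
               "DB89.F7 D4 Circulating Collection, 3rd Floor"),
  stringsAsFactors = FALSE
)

# Hypothetical helper variable: the regex from the question, stored once
pattern <- "([A-Z]+)(\\d+)\\s*\\."

df2 <- df %>%
  mutate(
    # column 2 of the match matrix is the first capture group (letters)
    letter = str_match(Call_Num, pattern)[, 2],
    # column 3 is the second capture group (digits), still as character
    number = str_match(Call_Num, pattern)[, 3]
  )
```

Subsetting with `[, 2]` and `[, 3]` turns the matrix into vectors of the same length as the input column, which is what mutate() expects. The cost is running the match twice; the do() and extract() approaches above avoid that.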