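Problem description: I am currently extracting names from a book series. Many characters go by nicknames, partial names, or titles. I have a list of names that I use as the patterns for all of my data. The problem is that I get multiple matches: one for the full name and more for the parts of that name. In total I am running 3000 names and name variants over a large amount of text. Currently, the names are extracted in order from the longest string to the shortest.
Question:
How can I make sure that, once a pattern has been extracted, any text it matched is removed from the string?
What I get: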
str_extract("Mr Bean and friends", pattern = fixed(c("Mr Bean", "Bean", "Mr")))
[1] "Mr Bean" "Bean" "Mr"
What I want: (I realize this may not be achievable with str_extract() alone or in a single line of code)
str_extract("Mr Bean and friends", pattern = fixed(c("Mr Bean", "Bean", "Mr")))
[1] "Mr Bean" NA NA
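One possible way to get this behaviour is to loop over the patterns, longest first, and blank out each match before trying the next pattern. The sketch below is only an illustration: `extract_consuming` is a hypothetical helper name, not a stringr function, and replacing each match with a single space is an assumption made so that neighbouring words do not fuse together.

```r
library(stringr)

# Hypothetical helper (not part of stringr): returns one entry per pattern,
# NA when the pattern no longer matches after earlier matches were removed.
extract_consuming <- function(text, patterns) {
  # Try longer patterns first so a full name wins over its parts
  patterns <- patterns[order(nchar(patterns), decreasing = TRUE)]
  found <- rep(NA_character_, length(patterns))
  for (i in seq_along(patterns)) {
    if (str_detect(text, fixed(patterns[i]))) {
      found[i] <- patterns[i]
      # Assumption: overwrite the matched text with a space so that
      # shorter patterns cannot match inside it again
      text <- str_replace_all(text, fixed(patterns[i]), " ")
    }
  }
  found
}

res <- extract_consuming("Mr Bean and friends", c("Mr Bean", "Bean", "Mr"))
# res holds "Mr Bean", NA, NA
```

With 3000 patterns and a large amount of text, this loop runs once per pattern for every string, so it may be slow; it is meant to show the idea rather than a tuned solution.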
require(magrittr)
require(purrr)
is.out.same <- function(.call, ...) {
  ## Checks if args in .call will produce identical output in other functions
  call <- substitute(.call) # Captures function call
  f_names <- eval(substitute(alist(...))) # Makes list of f_names
  map2(rep(list(call), length(f_names)), # Creates list of new function calls
       f_names,
       function(.x, .y, i) {.x[[1]] <- .y; return(.x)}
  ) %>%
    map(eval) %>% # Evaluates function calls
    map_lgl(identical, x = .call) %>% # Checks output of new calls against output of original call
    all() # Returns TRUE …