Subset comma-delimited strings from a list

Jas*_*ler 1 r dplyr

This seems like it should be a simple operation, but I'm stuck and looking for pointers.

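I have a data frame of authors and their associated publications. In the author column, an article often has multiple authors, given as a semicolon-delimited list. Here is a small sample: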

structure(list(author = c("Moscatelli, Adriana; Nishina, Adrienne", 
"Asangba, Abigail", "Stewart, Abigail", "Redmond-Sanogo, Adrienne; Lee, Ahlam", 
"Purnamasari, Agustina; Lee, Ahlam; Moscatelli, Adriana", 
"Nishina, Adrienne", "Lee, Ahlam", 
"Lee, Ahlam; Cloutier, Aimee", "Kleihauer, Jay; Stephens, Roy; Hart, William", 
"Foor, Ryan M.; Cano, Jamie"), pubtitle = c("AIP Conference Proceedings", 
"Journal of Case Studies in Accreditation and Assessment", "173rd Meeting of Acoustical Society of America", 
"Journal of Research in Gender Studies", "Journal of Research in Gender Studies", 
"Scientometrics", "Journal of Agricultural Education", "Journal of Agricultural Education", 
"Journal of Agricultural Education", "Journal of Agricultural Education"
)), class = c("rowwise_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-10L))

I also have a second data frame that contains only author names. For reproducibility, here is a subset of those names:

structure(list(author = c("Asangba, Abigail", "Stewart, Abigail", 
"Moscatelli, Adriana", "Nishina, Adrienne", "Redmond-Sanogo, Adrienne", 
"Purnamasari, Agustina", "Lee, Ahlam", "Aliyeva, Aida", "Belanger, Aimee", 
"Cloutier, Aimee")), row.names = c(NA, 10L), class = "data.frame")

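I am trying to use the second data frame to extract a subset of the original data frame, and the semicolon-delimited names are giving me trouble.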

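I thought this would get me there, but no luck so far. I tried turning the delimited strings into vectors and then matching against the author list, but it only returns names that appear on their own (in other words, names that appear inside a delimited string are not matched).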

list_authors_female <- data %>% 
  select(author, pubtitle) %>% 
  filter(author %in% female_authors_all)

Here I tried to split the author column into a vector, but I get an error.

list_authors_female <- data %>%  
  rowwise() %>% 
  mutate(author_list = str_split(author, pattern = ";")) %>% 
  filter(author_list %in% female_authors_all)

Any pointers? Thanks!

G. *_*eck 5

Create a regular expression pat of the form author1|author2|...|authorN and apply it to pubs. With this approach, no splitting is needed.

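library(dplyr)

# `authors` is the data frame of names; `pubs` is the publications data frame.
# Collapse all names into a single alternation: "name1|name2|...|nameN".
pat <- authors %>% 
  rowwise %>% 
  mutate(author = toString(author)) %>%
  ungroup %>%
  { paste(.$author, collapse = "|") }

# Keep the rows whose author string matches at least one of the names.
pubs %>% filter(grepl(pat, author))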

giving:

# A tibble: 8 x 2
  author                                 pubtitle                               
  <chr>                                  <chr>                                  
1 Moscatelli, Adriana; Nishina, Adrienne AIP Conference Proceedings             
2 Asangba, Abigail                       Journal of Case Studies in Accreditati~
3 Stewart, Abigail                       173rd Meeting of Acoustical Society of~
4 Redmond-Sanogo, Adrienne; Lee, Ahlam   Journal of Research in Gender Studies  
5 Purnamasari, Agustina; Lee, Ahlam; Mo~ Journal of Research in Gender Studies  
6 Nishina, Adrienne                      Scientometrics                         
7 Lee, Ahlam                             Journal of Agricultural Education      
8 Lee, Ahlam; Cloutier, Aimee            Journal of Agricultural Education  
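
One caveat with the regex route: each name is treated as a pattern, so a character such as the "." in "Foor, Ryan M." matches any character, and a short name can also match inside a longer one. If exact matches are wanted, a split-based check is one alternative. The following is only a minimal sketch in base R, not the answer's method: has_listed_author() is a hypothetical helper written for illustration, and the two small data frames are stand-ins built from a few rows of the question's data.

```r
# Small stand-ins for the question's two data frames (values copied from the post).
pubs <- data.frame(
  author = c("Moscatelli, Adriana; Nishina, Adrienne",
             "Lee, Ahlam; Cloutier, Aimee",
             "Kleihauer, Jay; Stephens, Roy; Hart, William",
             "Foor, Ryan M.; Cano, Jamie"),
  pubtitle = c("AIP Conference Proceedings",
               "Journal of Agricultural Education",
               "Journal of Agricultural Education",
               "Journal of Agricultural Education"),
  stringsAsFactors = FALSE
)
authors <- data.frame(
  author = c("Moscatelli, Adriana", "Nishina, Adrienne",
             "Lee, Ahlam", "Cloutier, Aimee"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from any package): split each "A; B; C" string
# on ";", trim surrounding whitespace, and return TRUE when at least one
# piece is exactly equal to a name in `wanted`.
has_listed_author <- function(author_strings, wanted) {
  pieces <- strsplit(author_strings, ";", fixed = TRUE)
  vapply(pieces, function(p) any(trimws(p) %in% wanted), logical(1))
}

keep <- has_listed_author(pubs$author, authors$author)
matched <- pubs[keep, ]
print(matched)
```

On this stand-in data the first two rows are kept and the last two are dropped. With dplyr loaded, the same helper should also work inside the pipeline, as in pubs %>% filter(has_listed_author(author, authors$author)).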