Data frame 1:
df1 <- structure(list(trial_id = c(2022L, 2023L, 2123L, 2184L, 3883L, 
4434L), ctri_number = c("CTRI/2018/02/011794 ", "CTRI/2017/08/009517 ", 
"CTRI/2019/05/019036 ", "CTRI/2017/12/010935 ", "CTRI/2017/09/009746 ", 
"CTRI/2016/06/007055 "), name = c("National Institute of Allergy and Infectious Diseases NIAIDMaryland USA", 
"Jawaharlal Nehru Medical College", "KLEU Ayurveda Pharmacy", 
"Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION"), type_of_sponsor = c("' Government funding agency '", 
"' Government medical college '", "' Research institution '", 
"' Pharmaceutical industry-Global '", " Other [Self sponsored] '", 
"' Private hospital/clinic '"), address = c("' USA '", "' Jawaharlal Nehru Medical College, Aligarh Muslim University, Aligarh-202001 '", 
"' KLEU Ayurveda Pharmacy, Khasbhag, Belgaum, Karnataka '", "' One Amgen Center Drive\n\n\nThousand Oaks, CA USA\n\n\n91320 '", 
"' Room no 32 ,Department of Periodontics , Government Dental college , Trivandrum '", 
"' ALVAS EDUCATION FOUNDATION ALVAS COLLEGE OF PHYSIOTHERAPY\n\n\nMoodabidri - 574227\n\n\nSouth Canara District\n\n\nKarnataka '"
)), row.names = c(NA, 6L), class = "data.frame")
Data frame 2:
df2 <- structure(list(distinctOrganizations = c("A AMMU", "A and U tibbia college and hospital", 
"A Arumuga kani", "A KIREETI", "AAMIR ZUBAIR SHAIKH", "Aansu Susan Varghese"
)), row.names = c(NA, 6L), class = "data.frame")
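For every value in data frame 2 (distinctOrganizations), I need to extract the rows of data frame 1 whose name column matches that value.
However, each value should produce its own .csv file.
How can I achieve this?
Expected result: CSV files like the one shown in the image.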
First: your sample data doesn't match any rows (df2 contains none of the names that appear in df1).
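A quick way to confirm this is to intersect the two name columns. The sketch below uses cut-down copies of the sample data (only the name columns are kept) and base R's intersect and %in%:

```r
# Cut-down copies of the sample data: only the columns needed for the check
df1 <- data.frame(
  name = c("National Institute of Allergy and Infectious Diseases NIAIDMaryland USA",
           "Jawaharlal Nehru Medical College", "KLEU Ayurveda Pharmacy",
           "Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION")
)
df2 <- data.frame(
  distinctOrganizations = c("A AMMU", "A and U tibbia college and hospital",
                            "A Arumuga kani", "A KIREETI",
                            "AAMIR ZUBAIR SHAIKH", "Aansu Susan Varghese")
)

# Names present in both data frames
common <- intersect(df1$name, df2$distinctOrganizations)
print(common)

# Number of df1 rows that have a match in df2
n_matched <- sum(df1$name %in% df2$distinctOrganizations)
print(n_matched)
```

With the sample data both checks come back empty, which is why the join below would write no files until df2 is changed.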
If I understood your question correctly, you could use:
library(dplyr)
library(purrr)
library(readr)
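df1 %>%
  # keep only the rows of df1 whose name also appears in df2
  inner_join(df2, by = c("name" = "distinctOrganizations")) %>%
  # one data.frame per organization
  split(f = .$name) %>%
  # write each piece to "<organization name>.csv"
  walk(~write_csv(.x, paste0(unique(.x$name), ".csv")))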
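- inner_join drops every row of df1 that has no match in df2.
- split divides the resulting data.frame by name, creating a new data.frame for each (distinct) organization.
- purrr's walk function then writes one .csv file per organization.

This produces .csv files such as "Amgen Inc.csv" or "ALVAS EDUCATION FOUNDATION.csv".

Note: the address column contains some line breaks (\n). You should consider removing them, as they may cause trouble in your .csv files and in any later processing steps. You may also want to remove the leading and trailing whitespace in the type_of_sponsor column.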
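One way to do this cleanup before the export is a small base R helper. This is a sketch under assumptions: clean_field is a hypothetical name, it also strips the single quotes that wrap each value (otherwise trimws cannot reach the spaces inside them), and it replaces each run of line breaks with ", " rather than deleting them outright:

```r
# Two rows copied from the sample df1 (only the affected columns)
df1 <- data.frame(
  name = c("Amgen Inc", "ALVAS EDUCATION FOUNDATION"),
  type_of_sponsor = c("' Pharmaceutical industry-Global '",
                      "' Private hospital/clinic '"),
  address = c("' One Amgen Center Drive\n\n\nThousand Oaks, CA USA\n\n\n91320 '",
              "' ALVAS EDUCATION FOUNDATION ALVAS COLLEGE OF PHYSIOTHERAPY\n\n\nMoodabidri - 574227\n\n\nSouth Canara District\n\n\nKarnataka '")
)

# Hypothetical helper (assumption: the wrapping quotes are not wanted either)
clean_field <- function(x) {
  x <- gsub("\n+", ", ", x)   # each run of line breaks becomes ", "
  x <- gsub("^'|'$", "", x)   # drop a leading and/or trailing single quote
  trimws(x)                   # drop leading and trailing whitespace
}

df1$type_of_sponsor <- clean_field(df1$type_of_sponsor)
df1$address <- clean_field(df1$address)
print(df1)
```

Running this on df1 before the inner_join keeps every row of each exported .csv file on a single line.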
I modified df2 to get two matches:
df2 <- structure(list(distinctOrganizations = c("Amgen Inc", "A and U tibbia college and hospital", 
"ALVAS EDUCATION FOUNDATION", "A KIREETI", "AAMIR ZUBAIR SHAIKH", 
"Aansu Susan Varghese")), row.names = c(NA, 6L), class = "data.frame")
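With this df2, only "Amgen Inc" and "ALVAS EDUCATION FOUNDATION" match. If you would rather avoid dplyr, purrr and readr, the same filter, split and write steps can be sketched in base R. This swaps inner_join for %in% (equivalent here because df2 holds only distinct key values and no other columns) and write_csv for write.csv. The files go to tempdir() so the example leaves nothing in your working directory:

```r
# Cut-down df1 (two columns) and the modified df2 from above
df1 <- data.frame(
  trial_id = c(2022L, 2023L, 2123L, 2184L, 3883L, 4434L),
  name = c("National Institute of Allergy and Infectious Diseases NIAIDMaryland USA",
           "Jawaharlal Nehru Medical College", "KLEU Ayurveda Pharmacy",
           "Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION")
)
df2 <- data.frame(
  distinctOrganizations = c("Amgen Inc", "A and U tibbia college and hospital",
                            "ALVAS EDUCATION FOUNDATION", "A KIREETI",
                            "AAMIR ZUBAIR SHAIKH", "Aansu Susan Varghese")
)

# Keep only the df1 rows whose name appears in df2 (same rows as the inner_join)
matched <- df1[df1$name %in% df2$distinctOrganizations, ]

# One data.frame per organization, then one .csv file per piece
out_dir <- tempdir()
pieces <- split(matched, matched$name)
for (org in names(pieces)) {
  write.csv(pieces[[org]], file.path(out_dir, paste0(org, ".csv")),
            row.names = FALSE)
}

written <- file.path(out_dir, paste0(names(pieces), ".csv"))
print(basename(written))
```

To write into the working directory instead, as the answer's code does, drop file.path(out_dir, ...) and pass paste0(org, ".csv") directly.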