How do I extract specific rows using two data frames of different dimensions and write them to multiple .csv files?

Asked by cla*_*INK (score 5) · tags: sqlite, r

Data frame 1:

df1 <- structure(list(trial_id = c(2022L, 2023L, 2123L, 2184L, 3883L, 
4434L), ctri_number = c("CTRI/2018/02/011794 ", "CTRI/2017/08/009517 ", 
"CTRI/2019/05/019036 ", "CTRI/2017/12/010935 ", "CTRI/2017/09/009746 ", 
"CTRI/2016/06/007055 "), name = c("National Institute of Allergy and Infectious Diseases NIAIDMaryland USA", 
"Jawaharlal Nehru Medical College", "KLEU Ayurveda Pharmacy", 
"Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION"), type_of_sponsor = c("' Government funding agency '", 
"' Government medical college '", "' Research institution '", 
"' Pharmaceutical industry-Global '", " Other [Self sponsored] '", 
"' Private hospital/clinic '"), address = c("' USA '", "' Jawaharlal Nehru Medical College, Aligarh Muslim University, Aligarh-202001 '", 
"' KLEU Ayurveda Pharmacy, Khasbhag, Belgaum, Karnataka '", "' One Amgen Center Drive\n\n\nThousand Oaks, CA USA\n\n\n91320 '", 
"' Room no 32 ,Department of Periodontics , Government Dental college , Trivandrum '", 
"' ALVAS EDUCATION FOUNDATION ALVAS COLLEGE OF PHYSIOTHERAPY\n\n\nMoodabidri - 574227\n\n\nSouth Canara District\n\n\nKarnataka '"
)), row.names = c(NA, 6L), class = "data.frame")

Data frame 2:

df2 <- structure(list(distinctOrganizations = c("A AMMU", "A and U tibbia college and hospital", 
"A Arumuga kani", "A KIREETI", "AAMIR ZUBAIR SHAIKH", "Aansu Susan Varghese"
)), row.names = c(NA, 6L), class = "data.frame")

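For every value in data frame 2 (distinctOrganizations), I need to extract the rows of data frame 1 whose name column matches that value.

However, each value should produce its own .csv file.

How can I achieve this?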


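Expected result: CSV files like the one in the image.

The image shows a CSV file that contains only the rows related to AIIMS and its variants. I need a separate CSV file for each such name.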

Answer by Mar*_*Gal (score 2)

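First of all: your sample data doesn't produce any matching rows (none of the names in df2 appear in the name column of df1).

If I understood your question correctly, you could use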

library(dplyr)
library(purrr)
library(readr)

df1 %>% 
  inner_join(df2, by = c("name" = "distinctOrganizations")) %>% 
  split(f = .$name) %>% 
  walk(~write_csv(.x, paste0(unique(.x$name), ".csv")))
  1. We use an inner_join to drop every row of df1 that has no match in df2.
  2. Then we split the resulting data.frame by name, creating a new data.frame for each (distinct) organization.
  3. Finally, we use purrr's walk function to write a .csv file for each organization. This produces files such as Amgen Inc.csv or ALVAS EDUCATION FOUNDATION.csv.
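If you'd rather not depend on dplyr, purrr and readr, the same three steps can be sketched in base R. This is a swapped-in alternative, not the code above: it filters with %in% instead of a join (so duplicate names in df2 will not duplicate rows), uses a trimmed three-row subset of the question's df1, and writes to tempdir() instead of the working directory.

```r
# Trimmed subset of the question's df1, plus a small df2 with two matching names
df1 <- data.frame(
  trial_id = c(2184L, 3883L, 4434L),
  name = c("Amgen Inc", "Dr Arunkumar", "ALVAS EDUCATION FOUNDATION"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  distinctOrganizations = c("Amgen Inc", "ALVAS EDUCATION FOUNDATION", "A KIREETI"),
  stringsAsFactors = FALSE
)

# Step 1: keep only rows of df1 whose name appears in df2
matched <- df1[df1$name %in% df2$distinctOrganizations, , drop = FALSE]

# Step 2: one data frame per distinct organization
by_org <- split(matched, matched$name)

# Step 3: write each group to "<name>.csv"; tempdir() keeps the sketch side-effect free
out_dir <- tempdir()
paths <- file.path(out_dir, paste0(names(by_org), ".csv"))
invisible(Map(function(d, p) write.csv(d, p, row.names = FALSE), by_org, paths))
```

Swap out_dir for "." (or any folder) to write the files next to your script.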

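Note: the address column contains some line breaks (\n). You should consider removing them, since they can cause trouble in your .csv files and in any later processing steps. You might also want to remove the leading and trailing whitespace in the type_of_sponsor column.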
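One possible way to do that cleanup in base R, sketched on two rows copied from the question's df1 (Amgen Inc and Dr Arunkumar). The helper names strip_quotes and flatten_newlines are made up for this example, stripping the stray single quotes goes slightly beyond the note above, and replacing each run of line breaks with ", " is an assumption about how you want addresses to read. You would run this on df1 before joining and writing.

```r
# Two rows copied from the question's df1 (type_of_sponsor and address only)
df1 <- data.frame(
  type_of_sponsor = c("' Pharmaceutical industry-Global '", " Other [Self sponsored] '"),
  address = c("' One Amgen Center Drive\n\n\nThousand Oaks, CA USA\n\n\n91320 '",
              "' Room no 32 ,Department of Periodontics , Government Dental college , Trivandrum '"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop a leading/trailing single quote, then trim spaces
strip_quotes <- function(x) trimws(gsub("^\\s*'|'\\s*$", "", x, perl = TRUE))

# Hypothetical helper: collapse each run of newlines (and surrounding spaces) into ", "
flatten_newlines <- function(x) gsub("\\s*\n+\\s*", ", ", x, perl = TRUE)

df1$type_of_sponsor <- strip_quotes(df1$type_of_sponsor)
df1$address <- flatten_newlines(strip_quotes(df1$address))
```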


Data

I modified df2 to get two matches:

df2 <- structure(list(distinctOrganizations = c("Amgen Inc", "A and U tibbia college and hospital", 
"ALVAS EDUCATION FOUNDATION", "A KIREETI", "AAMIR ZUBAIR SHAIKH", 
"Aansu Susan Varghese")), row.names = c(NA, 6L), class = "data.frame")
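The exact join only catches names that are spelled identically, while the question mentions rows for "AIIMS and its variants". A hedged base-R sketch of case-insensitive substring matching, not part of the answer above; the rows and search terms are hypothetical. A row that matches several terms ends up in several files, and spelled-out variants (such as the full institute name below) are only caught if you add them as extra terms.

```r
# Hypothetical data: several spellings of the same institution
df1 <- data.frame(
  trial_id = 1:4,
  name = c("AIIMS New Delhi", "All India Institute of Medical Sciences",
           "aiims jodhpur", "Amgen Inc"),
  stringsAsFactors = FALSE
)
orgs <- c("AIIMS", "Amgen")  # hypothetical search terms, one output file each

# For each term, keep rows whose name contains it, ignoring case
matches <- lapply(orgs, function(org) {
  hit <- grepl(tolower(org), tolower(df1$name), fixed = TRUE)
  df1[hit, , drop = FALSE]
})
names(matches) <- orgs

# Write one CSV per term that matched at least one row
out_dir <- tempdir()
for (org in names(matches)) {
  if (nrow(matches[[org]]) > 0) {
    write.csv(matches[[org]], file.path(out_dir, paste0(org, ".csv")),
              row.names = FALSE)
  }
}
```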