How to filter a table's rows based on an external list?

Question

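(1) I have read a large table with more than 10,000 rows and 10 columns into R.

(2) The third column of the table contains hospital names. Some of them appear twice or more.

(3) I have a list of hospital names, for example 10 of them, that need further study.

(4) Could you show me how to extract all rows of the table from step 1 whose hospital name appears in the list from step 3?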

Here is a shorter example of my input file:

Patients Treatment Hospital Response 
1        A         YYY      Good 
2        B         YYY      Dead 
3        A         ZZZ      Good 
4        A         WWW      Good 
5        C         UUU      Dead

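I have a list of hospitals I am interested in studying further, namely YYY and UUU. How can I use R to produce an output table like the following?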

Patients Treatment Hospital Response 
1        A         YYY      Good 
2        B         YYY      Dead 
5        C         UUU      Dead

Answer 1

Cha*_*ase 25

Use the %in% operator.

#Sample data
dat <- data.frame(patients = 1:5, treatment = letters[1:5],
  hospital = c("yyy", "yyy", "zzz", "www", "uuu"), response = rnorm(5))

#List of hospitals we want to do further analysis on
goodHosp <- c("yyy", "uuu")

You can index directly into the data.frame object:

dat[dat$hospital %in% goodHosp, ]

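To make the indexing step concrete, here is a minimal sketch that applies it to the table from the question rather than to the lowercase sample data. The inline text is only a stand-in for the real input file, and the names `input`, `patients`, `wanted`, and `keep` are illustrative, not from the original post.

```r
# The question's example table, read from inline text (a stand-in for the real file)
input <- "Patients Treatment Hospital Response
1 A YYY Good
2 B YYY Dead
3 A ZZZ Good
4 A WWW Good
5 C UUU Dead"
patients <- read.table(text = input, header = TRUE, stringsAsFactors = FALSE)

# Hospitals of interest, as named in the question
wanted <- c("YYY", "UUU")

# %in% returns one TRUE/FALSE value per row of the table
keep <- patients$Hospital %in% wanted
print(keep)

# A logical vector in the row slot keeps the TRUE rows;
# the empty column slot keeps every column
result <- patients[keep, ]
print(result)

# Same filter, addressing the hospital column by position (the third column)
result_by_position <- patients[patients[[3]] %in% wanted, ]
```

With this input, the rows for patients 1, 2, and 5 should remain, matching the output table the question asks for.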

Or use the subset function:

subset(dat, hospital %in% goodHosp)

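As a side note on why %in% is the right tool here, the sketch below contrasts it with ==, which is a common first attempt. It reuses the answer's hospital names as a plain vector; the names `by_equals`, `by_in`, and `not_in` are illustrative only.

```r
hospital <- c("yyy", "yyy", "zzz", "www", "uuu")
goodHosp <- c("yyy", "uuu")

# == recycles goodHosp along hospital, so it compares element-wise against
# "yyy", "uuu", "yyy", "uuu", "yyy" and warns about the length mismatch
by_equals <- suppressWarnings(hospital == goodHosp)

# %in% asks, for each hospital, whether it appears anywhere in goodHosp
by_in <- hospital %in% goodHosp

# Negate the result to keep the rows that are NOT in the list
not_in <- !(hospital %in% goodHosp)

print(by_equals)
print(by_in)
print(not_in)
```

Here == matches only the first element, while %in% flags all three rows that belong to the listed hospitals.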

Answer 2

RK1*_*RK1 9

Using dplyr

Set up the data, using @Chase's sample data.

#Sample data
df <- data.frame(patients = 1:5, treatment = letters[1:5],
  hospital = c("yyy", "yyy", "zzz", "www", "uuu"), response = rnorm(5))

#List of hospitals we want to do further analysis on
goodHosp <- c("yyy", "uuu")

Now filter the data using dplyr's filter:

library(dplyr)
df %>% filter(hospital %in% goodHosp)

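Since the question describes the hospital list as external, the following sketch assumes it arrives as a text file with one name per line. The temporary file, `list_file`, `hosp_tbl`, `kept`, and `kept_join` are hypothetical names for illustration. It also shows `semi_join` as a possible alternative when the list is held as a table rather than a vector.

```r
library(dplyr)

df <- data.frame(patients = 1:5, treatment = letters[1:5],
  hospital = c("yyy", "yyy", "zzz", "www", "uuu"),
  stringsAsFactors = FALSE)

# Hypothetical external list: written to a temporary file here so the
# example is self-contained, then read back one name per line
list_file <- tempfile(fileext = ".txt")
writeLines(c("yyy", "uuu"), list_file)
goodHosp <- readLines(list_file)

# Same filter as above, driven by the list read from the file
kept <- df %>% filter(hospital %in% goodHosp)

# Alternative: keep the list as a one-column table and use a filtering join,
# which returns the rows of df that have a match in hosp_tbl
hosp_tbl <- data.frame(hospital = goodHosp, stringsAsFactors = FALSE)
kept_join <- semi_join(df, hosp_tbl, by = "hospital")

print(kept)
print(kept_join)
```

Both approaches should keep the same three rows; the join form may be more convenient when the list already carries other columns.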
