For the example data frame:
df <- structure(
  list(
    country = structure(
      c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
      .Label = c("Austria", "France", "UK"),
      class = "factor"
    ),
    id = 1:10,
    region.0 = structure(
      c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
      .Label = c("AT", "FR", "UK"),
      class = "factor"
    ),
    region.1 = structure(
      c(1L, 1L, 2L, 3L, 3L, 3L, 4L, 4L, 6L, 5L),
      .Label = c("AT1", "AT2", "FR1", "UK1", "UK4", "UK6"),
      class = "factor"
    ),
    region.2 = structure(
      c(1L, 1L, 2L, 3L, 4L, 5L, NA, NA, NA, NA),
      .Label = c("AT11", "AT21", "FR12", "FR14", "FR19"),
      class = "factor"
    ),
    region.3 = structure(
      c(NA, NA, NA, 1L, 2L, 3L, NA, NA, NA, NA),
      .Label = c("FR121", "FR142", "FR196"),
      class = "factor"
    )
  ),
  .Names = c("country", "id", "region.0", "region.1", "region.2", "region.3"),
  class = "data.frame",
  row.names = c(NA, -10L)
)
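I would like to build a summary table that shows, for each country, which levels of regional data are available in my data frame df.
Regional data may be available at region.1, region.2, or region.3. Each value is either present or NA. Within a country, the levels at which regional data is available are the same for every id.
The final result I would like, as a data frame, is: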
country region.1 region.2 region.3
1 Austria Yes Yes No
2 France Yes Yes Yes
3 UK Yes No No
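Can anyone suggest a specific package or some code that would help?
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'country', and specify the columns to check in .SDcols. We then loop over those columns with lapply and test whether all of the values in a column are NA within the group: if they are, we return "No", else we return "Yes".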
library(data.table)
setDT(df)[, lapply(.SD, function(x) if (all(is.na(x))) "No" else "Yes"),
          by = country, .SDcols = region.1:region.3]
# country region.1 region.2 region.3
#1: Austria Yes Yes No
#2: France Yes Yes Yes
#3: UK Yes No No
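If installing a package is not an option, the same per-group check can probably be written in base R with aggregate(). This is only a minimal sketch, not part of the original answer: the helper name has_data is made up for illustration, and the data frame is a compact copy of the example that uses character columns instead of factors.

```r
# Compact copy of the example data (character columns instead of factors;
# only the columns needed for the summary are kept).
df <- data.frame(
  country  = factor(rep(c("Austria", "France", "UK"), times = c(3, 3, 4))),
  region.1 = c("AT1", "AT1", "AT2", "FR1", "FR1", "FR1",
               "UK1", "UK1", "UK6", "UK4"),
  region.2 = c("AT11", "AT11", "AT21", "FR12", "FR14", "FR19",
               NA, NA, NA, NA),
  region.3 = c(NA, NA, NA, "FR121", "FR142", "FR196",
               NA, NA, NA, NA)
)

# Hypothetical helper: "No" if every value in the group is NA, else "Yes".
has_data <- function(x) if (all(is.na(x))) "No" else "Yes"

# Apply the helper to each region column within each country.
res <- aggregate(df[c("region.1", "region.2", "region.3")],
                 by = list(country = df$country),
                 FUN = has_data)
print(res)
```

The data frame method of aggregate() is used here rather than the formula interface, because the formula interface drops rows containing NA by default, which would defeat the purpose of the check.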
Or, using dplyr, we can apply the same logic after grouping by 'country' (group_by).
library(dplyr)
df %>%
  group_by(country) %>%
  summarise_each(funs(if (all(is.na(.))) "No" else "Yes"),
                 matches("^region\\.[1-9]"))
#country region.1 region.2 region.3
# (fctr) (chr) (chr) (chr)
#1 Austria Yes Yes No
#2 France Yes Yes Yes
#3 UK Yes No No
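Note that summarise_each() and funs() are deprecated in recent dplyr releases. Assuming dplyr 1.0.0 or later, the same idea can be sketched with summarise() and across(). The data frame below is a compact copy of the example with character columns, and the regular expression is the one from the answer, so region.0 is still excluded.

```r
library(dplyr)

# Compact copy of the example data (character columns instead of factors).
df <- data.frame(
  country  = factor(rep(c("Austria", "France", "UK"), times = c(3, 3, 4))),
  id       = 1:10,
  region.0 = rep(c("AT", "FR", "UK"), times = c(3, 3, 4)),
  region.1 = c("AT1", "AT1", "AT2", "FR1", "FR1", "FR1",
               "UK1", "UK1", "UK6", "UK4"),
  region.2 = c("AT11", "AT11", "AT21", "FR12", "FR14", "FR19",
               NA, NA, NA, NA),
  region.3 = c(NA, NA, NA, "FR121", "FR142", "FR196",
               NA, NA, NA, NA)
)

# For each country, report "No" when a region column is entirely NA,
# otherwise "Yes". matches() selects region.1 to region.9 only.
res <- df %>%
  group_by(country) %>%
  summarise(
    across(matches("^region\\.[1-9]"),
           ~ if (all(is.na(.x))) "No" else "Yes"),
    .groups = "drop"
  )
print(res)
```

The printed tibble will look slightly different from the older output shown above (for example, column types appear as <fct> and <chr>), but the Yes/No values should be the same.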