Summarise a data frame by variable / presence of NA: in R

KT_*_*T_1 2 r dataframe

For an example data frame:

  df <- structure(
  list(
    country = structure(
      c(1L, 1L, 1L, 2L, 2L, 2L,
        3L, 3L, 3L, 3L),
      .Label = c("Austria", "France", "UK"),
      class = "factor"
    ),
    id = 1:10,
    region.0 = structure(
      c(1L, 1L, 1L, 2L, 2L, 2L,
        3L, 3L, 3L, 3L),
      .Label = c("AT", "FR", "UK"),
      class = "factor"
    ),
    region.1 = structure(
      c(1L, 1L, 2L, 3L, 3L, 3L, 4L, 4L, 6L,
        5L),
      .Label = c("AT1", "AT2", "FR1", "UK1", "UK4", "UK6"),
      class = "factor"
    ),
    region.2 = structure(
      c(1L, 1L, 2L, 3L, 4L, 5L, NA, NA, NA,
        NA),
      .Label = c("AT11", "AT21", "FR12", "FR14", "FR19"),
      class = "factor"
    ),
    region.3 = structure(
      c(NA, NA, NA, 1L, 2L, 3L, NA, NA, NA,
        NA),
      .Label = c("FR121", "FR142", "FR196"),
      class = "factor"
    )
  ),
  .Names = c("country",
             "id", "region.0", "region.1", "region.2", "region.3"),
  class = "data.frame",
  row.names = c(NA, -10L)
)

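I want to make a summary table detailing which regional-level data are available for each country in my df data frame.

Regional data can be available for region 1, region 2 or region 3. The data are either available or listed as 'NA'. The level at which regional data are available is the same regardless of the 'id' within a country.

The final result I want, as a data frame, is as follows: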

  country region.1 region.2 region.3
1 Austria      Yes      Yes       No
2  France      Yes      Yes      Yes
3      UK      Yes       No       No

Can anyone suggest a specific package or piece of code to help me?

akr*_*run 5

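We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df), group by 'country', and specify the columns to check in .SDcols. We then loop over those columns with lapply: if all of the values in a column are 'NA' for that group, we get "No" as the output, or else we get "Yes".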

library(data.table)
setDT(df)[, lapply(.SD, function(x) if(all(is.na(x))) 
        "No" else "Yes") , country, .SDcols=region.1:region.3]
#    country region.1 region.2 region.3
#1: Austria      Yes      Yes       No
#2:  France      Yes      Yes      Yes
#3:      UK      Yes       No       No
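
The per-group test that the answer relies on, all(is.na(x)), can be seen in isolation on a single column. The following is only a minimal base R sketch: the `toy` data frame and the helper name `has_data` are made up for illustration and are not part of the original answer.

```r
# Illustrative stand-in for the question's df (hypothetical values, same column naming)
toy <- data.frame(
  country  = factor(c("Austria", "Austria", "UK", "UK")),
  region.1 = c("AT1", "AT2", "UK1", "UK4"),
  region.2 = c("AT11", NA, NA, NA)
)

# Hypothetical helper: "No" when every value in the group is NA, otherwise "Yes"
has_data <- function(x) if (all(is.na(x))) "No" else "Yes"

# split() cuts one column into per-country pieces; vapply() applies the test to each piece
region2 <- vapply(split(toy$region.2, toy$country), has_data, character(1))
print(region2)
```

The data.table call does the same thing for every column listed in .SDcols at once, with the grouping handled by the 'country' argument.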

Or, using dplyr, we can implement the same logic after grouping by 'country' (group_by).

library(dplyr)
df %>%
    group_by(country) %>%
    summarise_each(funs(if(all(is.na(.))) "No" 
           else "Yes"), matches("^region\\.[1-9]"))
#country region.1 region.2 region.3
#   (fctr)    (chr)    (chr)    (chr)
#1 Austria      Yes      Yes       No
#2  France      Yes      Yes      Yes
#3      UK      Yes       No       No
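
summarise_each() and funs() have since been deprecated in dplyr. Assuming dplyr 1.0.0 or later is installed, the same logic might be written with summarise() and across(). This is a sketch on a made-up `toy` data frame, not the answer author's code.

```r
library(dplyr)

# Illustrative stand-in for the question's df (hypothetical values, same column naming)
toy <- data.frame(
  country  = c("Austria", "Austria", "UK", "UK"),
  region.1 = c("AT1", "AT2", "UK1", "UK4"),
  region.2 = c("AT11", NA, NA, NA)
)

# For each country, report "No" when a region column is entirely NA, otherwise "Yes"
res <- toy %>%
  group_by(country) %>%
  summarise(across(matches("^region\\.[1-9]"),
                   ~ if (all(is.na(.x))) "No" else "Yes"))
print(res)
```

The column selection reuses the same regular expression as above, so region.0 is left out of the summary.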