Get the names of columns with zero variance using dplyr

dhb*_*and 3 r lapply dplyr

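I am trying to find any variables in my data that have zero variance (i.e. constant continuous variables). I figured out how to do this with lapply, but I would like to use dplyr because I am trying to follow tidy data principles. I can create a vector of just the variances using dplyr, but the last step, where I find the values not equal to zero and return the variable names, is what confuses me.

Here is the code: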

library(PReMiuM)
library(tidyverse)
#> ── Attaching packages ──────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
#> ✔ tibble  1.4.2     ✔ dplyr   0.7.4
#> ✔ tidyr   0.7.2     ✔ stringr 1.2.0
#> ✔ readr   1.2.0     ✔ forcats 0.2.0
#> ── Conflicts ─────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()


setwd("~/Stapleton_Lab/Projects/Premium/hybridAnalysis/")

# read in data from analysis script
df <- read_csv("./hybrid.csv")
#> Parsed with column specification:
#> cols(
#>   .default = col_double(),
#>   Exp = col_character(),
#>   Pedi = col_character(),
#>   Harvest = col_character()
#> )
#> See spec(...) for full column specifications.

# check for columns with missing values
# df %>%
#     select_if(function(x) any(is.na(x))) %>%
#     summarise_all(funs(sum(is.na(.))))


# grab month for analysis
may <- df %>% 
    filter(Month==5)
june <- df %>% 
    filter(Month==6)
july <- df %>% 
    filter(Month==7)
aug <- df %>% 
    filter(Month==8)
sept <- df %>% 
    filter(Month==9)
oct <- df %>% 
    filter(Month==10)

# check for zero variance in continuous covariates
numericVars <- grep("Min|Max",names(june))

zero <- which(lapply(june[numericVars],var)==0,useNames = TRUE)

noVar <- june %>%
    select(numericVars) %>%
    summarise_all(var) %>%
    filter_if(all, all_vars(. != 0))
#> Warning in .p(.tbl[[vars[[i]]]], ...): coercing argument of type 'double'
#> to logical
#> (the same warning was emitted 20 times in total)

Ben*_*min 5

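With a reproducible example, I think what you are aiming for is below. Note that, as Colin pointed out, I have not dealt with the issue of your selection including character variables. See his answer for details.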

# reproducible data
mtcars2 <- mtcars
mtcars2$mpg <- mtcars2$qsec <- 7

library(dplyr)

mtcars2 %>% 
  summarise_all(var) %>% 
  select_if(function(.) . == 0) %>% 
  names()
# [1] "mpg"  "qsec"
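
To cover the character-column caveat mentioned above, one option is to drop non-numeric columns before computing the variances. This is only a sketch under some assumptions: it needs dplyr 1.0 or later (for across() and where(), which supersede the scoped verbs used above), and the label column and the names cars and zero_var_cols are made up for illustration.

```r
library(dplyr)

# Hypothetical example data: two constant numeric columns plus one character column
cars <- mtcars
cars$mpg <- cars$qsec <- 7
cars$label <- rownames(mtcars)  # character column; a variance is not meaningful for it

zero_var_cols <- cars %>%
  select(where(is.numeric)) %>%                  # keep numeric columns only
  summarise(across(everything(), var)) %>%       # one-row data frame of variances
  select(where(function(v) isTRUE(v == 0))) %>%  # keep columns whose variance is exactly 0
  names()

zero_var_cols
# [1] "mpg"  "qsec"
```

isTRUE() is used so that a column whose variance is NA (for example, because it contains missing values) is skipped rather than causing an error in where().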

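Personally, I think that obfuscates what you are doing. Either of the following, which use the purrr package (if you wish to stay in the tidyverse), would be my preference, along with a well-written comment.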

library(purrr)

# Return a character vector of variable names which have 0 variance
names(mtcars2)[which(map_dbl(mtcars2, var) == 0)]
names(mtcars2)[map_lgl(mtcars2, function(x) var(x) == 0)]

If you want to optimize for speed, stick with base R:

# Return a character vector of variable names which have 0 variance
names(mtcars2)[vapply(mtcars2, function(x) var(x) == 0, logical(1))]
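
If the data also contain non-numeric columns, or you only want to check columns matching a name pattern (like the "Min|Max" selection in the question), the base R idea can be wrapped in a small helper. This is a rough sketch, not part of the original answer: zero_variance_names(), its pattern argument, and the demo data frame are all hypothetical, and missing values are dropped with na.rm = TRUE, which may or may not be what you want.

```r
# Hypothetical helper: names of numeric columns with zero variance,
# optionally restricted to column names matching a regular expression.
zero_variance_names <- function(data, pattern = NULL) {
  is_num <- vapply(data, is.numeric, logical(1))
  cols <- names(data)[is_num]
  if (!is.null(pattern)) {
    cols <- grep(pattern, cols, value = TRUE)
  }
  # isTRUE() treats an NA variance (e.g. a single non-missing value) as "not zero"
  is_zero <- vapply(
    data[cols],
    function(x) isTRUE(var(x, na.rm = TRUE) == 0),
    logical(1)
  )
  cols[is_zero]
}

# Made-up data standing in for one of the monthly subsets in the question
demo <- data.frame(
  Pedi    = c("a", "b", "c"),
  TempMin = c(10, 10, 10),
  TempMax = c(20, 25, 30),
  RainMax = c(0, 0, 0),
  Other   = c(1, 1, 1)
)

zero_variance_names(demo, pattern = "Min|Max")
# [1] "TempMin" "RainMax"
```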