Eli*_*rra 5 if-statement r conditional-statements lapply
I have a list of dataframes for which I want to obtain (in a separate dataframe) the row mean of a specified column which may or may not exist in all dataframes of the list. My problem comes when the specified column does not exist in at least one of the dataframes of the list.
Assume the following example list of dataframes:
df1 <- read.table(text = 'X A B C
name1 1 2 3
name2 5 10 4',
header = TRUE)
df2 <- read.table(text = 'X B C A
name1 8 1 31
name2 9 9 8',
header = TRUE)
df3 <- read.table(text = 'X B A E
name1 9 9 29
name2 5 15 55',
header = TRUE)
mylist_old <- list(df1, df2)
mylist_new <- list(df1, df2, df3)
Suppose I want the row means of column C. The following code works perfectly when the list of dataframes (mylist_old) is composed of df1 and df2:
Mean_C <- rowMeans(do.call(cbind, lapply(mylist_old, "[", "C")))
Mean_C <- as.data.frame(Mean_C)
The trouble comes when the list is composed of at least one dataframe for which column C does not exist, which in my example is the case of df3, that is for list mylist_new:
Mean_C <- rowMeans(do.call(cbind, lapply(mylist_new, "[", "C")))
This leads to: "Error in `[.data.frame`(X[[i]], ...) : undefined columns selected"
One way to circumvent this issue is to exclude df3 from mylist_new. However, my real program has a list of 64 dataframes, and I do not know in advance which of them contain column C. I would like to apply my code only to the dataframes in which column C is detected, that is, run the command over the whole list but skip every dataframe that lacks column C.
I tried this:
if("C" %in% colnames(mylist_new))
{
Mean_C <- rowMeans(do.call(cbind, lapply(mylist_new, "[", "C")))
Mean_C <- as.data.frame(Mean_C)
}
But nothing happens, probably because colnames refers to the list and not to each dataframe of the list. With 64 dataframes, I cannot refer to each "manually" and need an automated procedure.
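To check that suspicion, it may help to compare what colnames returns for the list with what it returns for each element. The following is only a minimal sketch: it rebuilds the example data with data.frame() so it runs on its own, and has_C is a name introduced here for illustration.

```r
# Rebuild the example data so the snippet is self-contained
df1 <- data.frame(X = c("name1", "name2"), A = c(1, 5), B = c(2, 10), C = c(3, 4))
df2 <- data.frame(X = c("name1", "name2"), B = c(8, 9), C = c(1, 9), A = c(31, 8))
df3 <- data.frame(X = c("name1", "name2"), B = c(9, 5), A = c(9, 15), E = c(29, 55))
mylist_new <- list(df1, df2, df3)

# A plain list has no dimnames, so colnames() returns NULL
colnames(mylist_new)
#NULL

# "C" %in% NULL is FALSE, so the body of the if() is silently skipped
"C" %in% colnames(mylist_new)
#[1] FALSE

# Checking each dataframe separately gives one logical value per element
has_C <- sapply(mylist_new, function(d) "C" %in% colnames(d))
has_C
#[1]  TRUE  TRUE FALSE
```

A per-element vector like has_C could then be used to subset the list (mylist_new[has_C]) before running the original rowMeans code.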
Here is an option: Filter the list elements, then apply lapply to the filtered list
rowMeans(do.call(cbind, lapply(Filter(function(x) "C" %in% names(x),
mylist_new), `[[`, "C")))
#[1] 2.0 6.5
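The same idea can be wrapped in a small helper that takes the column name as an argument and returns a dataframe, which is what the question asks for. This is a sketch rather than a definitive implementation: row_mean_of is a hypothetical name, and it assumes every dataframe has the same number of rows in the same order.

```r
# Hypothetical helper: row means of column `col` across the dataframes in
# `dfs` that contain it; dataframes without the column are skipped.
row_mean_of <- function(dfs, col) {
  keep <- Filter(function(d) col %in% names(d), dfs)
  if (length(keep) == 0L) {
    stop("no dataframe contains column '", col, "'")
  }
  # `[[` extracts the column as a plain vector; cbind builds a matrix
  m <- do.call(cbind, lapply(keep, `[[`, col))
  out <- data.frame(rowMeans(m))
  names(out) <- paste0("Mean_", col)
  out
}

# Example data rebuilt here so the snippet is self-contained
df1 <- data.frame(X = c("name1", "name2"), A = c(1, 5), B = c(2, 10), C = c(3, 4))
df2 <- data.frame(X = c("name1", "name2"), B = c(8, 9), C = c(1, 9), A = c(31, 8))
df3 <- data.frame(X = c("name1", "name2"), B = c(9, 5), A = c(9, 15), E = c(29, 55))
mylist_new <- list(df1, df2, df3)

Mean_C <- row_mean_of(mylist_new, "C")
Mean_C
```

Stopping with an error when no dataframe has the column is a design choice made for this sketch; returning NULL would be an equally reasonable alternative.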
Or with tidyverse, without Filtering, using select to ignore the cases where the column does not exist
library(tidyverse)
map(mylist_new, ~ .x %>%
select(one_of("C"))) %>% # gives a warning
bind_cols %>%
rowMeans
#[1] 2.0 6.5
It is better to get some warning that the column does not exist
Or without the warning
map(mylist_new, ~ .x %>%
select(matches("^C$"))) %>%
bind_cols %>%
rowMeans
#[1] 2.0 6.5