Given the following dataset, I want to compute, for each row, the median of the columns M1, M2, and M3. I am looking for a solution that adds the result to the data frame as a new column named 'Median'. The column names (M1:M3) should not be typed out directly, because the original dataset has many more columns than these three.
# A tibble: 8 x 5
     I1    M1    M2    I2    M3
  <int> <int> <int> <int> <int>
1     3     4     5     3     5
2     2     2     2     2     1
3     2     2     2     2     2
4     3     1     3     3     1
5     2     1     3     3     1
6     3     2     4     4     3
7     3     1     3     4     1
8     2     1     3     2     3
You can load the dataset using:
df = structure(list(I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L), M1 = c(4L,
2L, 2L, 1L, 1L, 2L, 1L, 1L), M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L,
3L), I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L), M3 = c(5L, 1L, 2L,
1L, 1L, 3L, 1L, 3L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -8L), .Names = c("I1", "M1", "M2", "I2",
"M3"))
I know that several similar questions have already been asked. However, most solutions posted use rowMeans or rowSums. I'm looking for a solution where:
The reason for (2) is that I am teaching the 'tidyverse' to total beginners.
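We could use rowMedians from the matrixStats package, selecting the columns by pattern rather than by name:
library(matrixStats)
library(dplyr)
# '^M\\d+$' matches only names of the form M<digits>, e.g. M1, M2, M3
df %>%
  mutate(Median = rowMedians(as.matrix(.[grep('^M\\d+$', names(.))])))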
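If installing matrixStats is not an option, the same row-wise median can be sketched with base R's apply(). This is only an illustrative alternative, not part of the answer above; the helper name m_cols is made up for the example, and the data are copied from the question.

```r
# Example data copied from the question
df <- data.frame(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

# Pick the measurement columns by pattern instead of naming them
# (m_cols is a hypothetical helper name used only in this sketch)
m_cols <- grep("^M\\d+$", names(df), value = TRUE)

# apply() with MARGIN = 1 calls median() once per row
df$Median <- apply(df[m_cols], 1, median)
df$Median
# 5 2 2 1 1 3 1 3
```

apply() converts the selected columns to a matrix first, so this assumes all of them are numeric.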
Or, if we need to use only tidyverse functions, convert the data to 'long' format with gather, summarise by row, and take the median of the 'value' column
library(tidyverse)
df %>%
  # a numeric row id keeps the original order; character ids sort "10" before "2"
  mutate(rn = row_number()) %>%
  gather(key, value, starts_with('M')) %>%
  group_by(rn) %>%
  summarise(Median = median(value)) %>%
  select(-rn) %>%
  bind_cols(df, .)
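In tidyr 1.0.0 and later, gather is superseded by pivot_longer. A minimal sketch of the same reshape-and-summarise idea, assuming those versions are installed, could look like the following; the intermediate names rn, medians, and out are chosen for the example only.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
df <- tibble(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

# Long format: one row per (original row, M column) pair, then one median per row id
medians <- df %>%
  mutate(rn = row_number()) %>%
  pivot_longer(matches("^M\\d+$"), names_to = "key", values_to = "value") %>%
  group_by(rn) %>%
  summarise(Median = median(value))

# Join on the row id so the result does not depend on row order
out <- df %>%
  mutate(rn = row_number()) %>%
  left_join(medians, by = "rn") %>%
  select(-rn)
out
```

Joining on rn rather than binding columns is a deliberate choice here: it stays correct even if the summarised rows come back in a different order.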
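Another option is rowwise() from dplyr (provided that row-by-row evaluation is acceptable for your data size)
df %>%
  rowwise() %>%
  # '^M\\d+$' avoids matching other names that merely contain an 'M'
  mutate(Median = median(c(!!! rlang::syms(grep('^M\\d+$', names(.), value = TRUE))))) %>%
  ungroup()  # drop the rowwise grouping so later verbs behave normally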
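With dplyr 1.0.0 or later, the rowwise() idea can be written without rlang by using c_across(), which accepts the usual tidyselect helpers. This is a sketch under that version assumption rather than the code from the answer above; the result name out is chosen for the example.

```r
library(dplyr)

# Example data copied from the question
df <- tibble(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

out <- df %>%
  rowwise() %>%
  # c_across() collects the selected columns of the current row into one vector
  mutate(Median = median(c_across(matches("^M\\d+$")))) %>%
  ungroup()
out
```

For beginners this form may be easier to read, because the column selection uses the same helper (matches) they already know from select().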