I have a dataset containing the following information:
Subject Value1 Value2 Value3 UniqueNumber
001 1 0 1 3
002 0 1 1 2
003 1 1 1 1
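If UniqueNumber is greater than 0, I want to use dplyr to sum, for each subject, the values from the first value column through the column indicated by UniqueNumber, and to compute their mean. So for Subject 001, sum = 2 and mean = .67.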
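total <- numeric(nrow(Data))    # one slot per subject
average <- numeric(nrow(Data))
value_cols <- c("Value1", "Value2", "Value3")
for (i in seq_len(nrow(Data))) {
  n <- Data$UniqueNumber[i]
  if (n > 0) {
    # take only the first n value columns of row i
    vals <- unlist(Data[i, value_cols[seq_len(n)]])
    total[i] <- sum(vals)
    average[i] <- mean(vals)
  }
}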
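Edit: I only want to look at the number of columns given in the "UniqueNumber" column. So the calculation goes through each row and stops at the column indicated by 'UniqueNumber'. Example: row 2, with Subject 002, should sum the values in columns "Value1" and "Value2", while row 3, with Subject 003, should sum only the value in column "Value1".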
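Not a tidyverse fan/expert, but I would try this using long format. Then just filter by row index within each group and run whatever function you want on a single column (it is much easier that way).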
library(tidyr)
library(dplyr)
Data %>%
  gather(variable, value, -Subject, -UniqueNumber) %>% # long format
  group_by(Subject) %>% # group by Subject in order to get row counts
  filter(row_number() <= UniqueNumber) %>% # filter by row index
  summarise(Mean = mean(value), Total = sum(value)) %>% # do the calculations
  ungroup()
## A tibble: 3 x 3
# Subject Mean Total
# <int> <dbl> <int>
# 1 1 0.667 2
# 2 2 0.5 1
# 3 3 1 1
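As a side note, `gather()` is marked as superseded in current tidyr releases in favour of `pivot_longer()`. The following is a minimal sketch of the same idea with `pivot_longer()`, not part of the original answer. The `Data` frame is rebuilt here from the question's table as an assumption (with `Subject` stored as an integer, matching the output above), and `result` is a name chosen only for illustration.

```r
library(dplyr)
library(tidyr)

# Assumed reconstruction of the question's data (Subject stored as integer)
Data <- data.frame(
  Subject = 1:3,
  Value1 = c(1, 0, 1),
  Value2 = c(0, 1, 1),
  Value3 = c(1, 1, 1),
  UniqueNumber = c(3L, 2L, 1L)
)

result <- Data %>%
  # long format: one row per Subject/ValueN pair, in column order within a Subject
  pivot_longer(starts_with("Value"), names_to = "variable", values_to = "value") %>%
  group_by(Subject) %>%
  # keep only the first UniqueNumber rows of each Subject
  filter(row_number() <= UniqueNumber) %>%
  summarise(Mean = mean(value), Total = sum(value))
```

This relies on the long rows of each subject appearing in the order Value1, Value2, Value3, which holds when the value columns are in that order in `Data`.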
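A very similar way to achieve this is to filter on the integer in the column names. The filter step comes before the group_by, so it might improve performance (or not?), but it is less robust because it assumes the columns of interest are named "Value#".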
Data %>%
  gather(variable, value, -Subject, -UniqueNumber) %>% # long format
  filter(as.numeric(gsub("Value", "", variable, fixed = TRUE)) <= UniqueNumber) %>% # filter
  group_by(Subject) %>% # group by Subject
  summarise(Mean = mean(value), Total = sum(value)) %>% # do the calculations
  ungroup()
## A tibble: 3 x 3
# Subject Mean Total
# <int> <dbl> <int>
# 1 1 0.667 2
# 2 2 0.5 1
# 3 3 1 1
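To make the robustness caveat concrete, here is a small base R sketch of what the name-parsing step does on its own. The vectors `variable`, `col_index` and `bad` are illustrative names, and "Score1" is a made-up column name used only to show the failure case.

```r
# Column names as they would appear in the long-format `variable` column
variable <- c("Value1", "Value2", "Value3", "Value10")

# Strip the literal prefix and convert the remainder to a number
col_index <- as.numeric(gsub("Value", "", variable, fixed = TRUE))
col_index
# 1 2 3 10

# A name that does not follow the "Value#" pattern cannot be converted;
# as.numeric() returns NA (with a warning, suppressed here)
bad <- suppressWarnings(as.numeric(gsub("Value", "", "Score1", fixed = TRUE)))
is.na(bad)
# TRUE
```

A row whose index is NA is dropped by `filter()`, so a column that is not named "Value#" would silently fall out of the result.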
Just for fun, adding a data.table solution
library(data.table)
library(magrittr) # provides %>%; not needed if dplyr is already loaded
data.table(Data) %>%
  melt(id = c("Subject", "UniqueNumber")) %>% # long format
  .[as.numeric(gsub("Value", "", variable, fixed = TRUE)) <= UniqueNumber,
    .(Mean = round(mean(value), 3), Total = sum(value)),
    by = Subject]
# Subject Mean Total
# 1: 1 0.667 2
# 2: 2 0.500 1
# 3: 3 1.000 1
Check this solution:
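df %>%
  gather(key, val, Value1:Value3) %>% # long format
  group_by(Subject) %>%
  mutate(
    # seq_len() picks the first UniqueNumber values of each Subject
    Sum = sum(val[seq_len(UniqueNumber[1])]),
    Mean = mean(val[seq_len(UniqueNumber[1])])
  ) %>%
  spread(key, val) # back to wide format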
Output:
Subject UniqueNumber Sum Mean Value1 Value2 Value3
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 001 3 2 0.667 1 0 1
2 002 2 1 0.5 0 1 1
3 003 1 1 1 1 1 1
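The Sum and Mean columns can be cross-checked without any packages. The sketch below is not from the original answers: it rebuilds `df` from the question's table as an assumption and uses a logical mask over the value columns, with `vals`, `keep`, `Sum` and `Mean` as illustrative names. It also assumes UniqueNumber never exceeds the number of value columns.

```r
# Assumed reconstruction of the question's data
df <- data.frame(
  Subject = c("001", "002", "003"),
  Value1 = c(1, 0, 1),
  Value2 = c(0, 1, 1),
  Value3 = c(1, 1, 1),
  UniqueNumber = c(3L, 2L, 1L),
  stringsAsFactors = FALSE
)

vals <- as.matrix(df[, c("Value1", "Value2", "Value3")])

# TRUE where the column position is within the row's UniqueNumber;
# the vector is recycled down the rows, so each row uses its own limit
keep <- col(vals) <= df$UniqueNumber

Sum <- rowSums(vals * keep)
# rows with UniqueNumber == 0 get NA instead of a division by zero
Mean <- ifelse(df$UniqueNumber > 0, Sum / df$UniqueNumber, NA_real_)

unname(Sum)
# 2 1 1
```

The masked values agree with the Sum column above, and `Mean` rounds to 0.667, 0.5 and 1.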