R: Adjusting values for cases that occur multiple times

Fab*_*ian 5 r

I have a problem that will hopefully not be a big obstacle for advanced R users...

test.data <- data.frame(case = c(1, 1, 1, 2, 2, 2, 3),
                        year = c(2006, 2007, 2008, 2007, 2006, 2008, 2006),
                        level = c(10, 20, 20, 12, 20, 20, 20))

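As you can see, each case occurs multiple times, distinguished by year. The value of level differs within a case, and I would like to correct that by setting every value of level to the minimum level of the given case. In this example, every value of level for case == 1 should be 10, and every value of level for case == 2 should be 12. For any single case I can do the following: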

test.data$level[test.data$case==1] <- min(test.data$level[test.data$case==1])

But since I have several hundred cases, doing this one case at a time would take a long time. Is there a faster solution?

akr*_*run 5

You could try

 library(data.table)
 setDT(test.data)[, level:= min(level, na.rm=TRUE), case]
 #    case year level
 #1:    1 2006    10
 #2:    1 2007    10
 #3:    1 2008    10
 #4:    2 2007    12
 #5:    2 2006    12
 #6:    2 2008    12
 #7:    3 2006    20

Or using dplyr

 library(dplyr)
 test.data %>% 
        group_by(case) %>% 
        mutate(level= min(level, na.rm=TRUE))
 #   case year level
 #1    1 2006    10
 #2    1 2007    10
 #3    1 2008    10
 #4    2 2007    12
 #5    2 2006    12
 #6    2 2008    12
 #7    3 2006    20

Or using sqldf/dplyr

  library(sqldf)
  library(dplyr)
  sqldf('select * from "test.data"
            left join(select "case", 
              min(level) as Level
              from "test.data" 
              group by "case")
            using ("case")') %>%
                         select(-level)
  #   case year Level
  #1    1 2006    10
  #2    1 2007    10
  #3    1 2008    10
  #4    2 2007    12
  #5    2 2006    12
  #6    2 2008    12
  #7    3 2006    20

Or using sqldf alone, with the modification suggested by @G.Grothendieck

  sqldf('select "case", year, "min(level)" as Level 
            from "test.data" 
               left join(select "case", min(level)
                         from "test.data" 
                         group by "case") 
                     using ("case")')

  #   case year Level
  #1    1 2006    10
  #2    1 2007    10
  #3    1 2008    10
  #4    2 2007    12
  #5    2 2006    12
  #6    2 2008    12
  #7    3 2006    20

Or using base R

 test.data$level <- with(test.data, ave(level, case, FUN=min))
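
The answer does not show output for the `ave()` call, so here is a minimal sketch of what it returns on the question's sample data. The names `fixed` and `fixed.na` are introduced only for illustration, and the `na.rm` variant assumes `level` might contain missing values, which the sample data does not.

```r
# Same sample data as in the question
test.data <- data.frame(case = c(1, 1, 1, 2, 2, 2, 3),
                        year = c(2006, 2007, 2008, 2007, 2006, 2008, 2006),
                        level = c(10, 20, 20, 12, 20, 20, 20))

# ave() applies FUN within each group of `case` and returns a vector
# aligned with the original rows, so it can be assigned back directly
fixed <- with(test.data, ave(level, case, FUN = min))
fixed

# Assumption: if `level` could contain NA, an anonymous function
# lets min() skip missing values within each group
fixed.na <- with(test.data,
                 ave(level, case, FUN = function(x) min(x, na.rm = TRUE)))
```

Because the result keeps the original row order, no merge or re-sorting step is needed afterwards.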


Rom*_*rik 5

Here is a classic approach using base R functions.

# may not be optimal for larger datasets due to merge
min.lvl <- aggregate(level ~ case, data = test.data, FUN = min)
merge(x = test.data, y = min.lvl, by = "case", all.x = TRUE, sort = FALSE)

  case year level.x level.y
1    1 2006      10      10
2    1 2007      20      10
3    1 2008      20      10
4    2 2007      12      12
5    2 2006      20      12
6    2 2008      20      12
7    3 2006      20      20
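
The merged result above keeps both the original values (`level.x`) and the per-case minimum (`level.y`). One way to end up with a single corrected `level` column might look like the sketch below; the column name `min.level` and the object `merged` are hypothetical names chosen for this illustration, not part of the answer.

```r
# Same sample data as in the question
test.data <- data.frame(case = c(1, 1, 1, 2, 2, 2, 3),
                        year = c(2006, 2007, 2008, 2007, 2006, 2008, 2006),
                        level = c(10, 20, 20, 12, 20, 20, 20))

# Per-case minimum, as in the answer
min.lvl <- aggregate(level ~ case, data = test.data, FUN = min)

# Rename before merging so merge() does not create .x/.y suffixes
# ("min.level" is a hypothetical name)
names(min.lvl)[names(min.lvl) == "level"] <- "min.level"

merged <- merge(x = test.data, y = min.lvl, by = "case",
                all.x = TRUE, sort = FALSE)

# Overwrite level with the per-case minimum and drop the helper column
merged$level <- merged$min.level
merged$min.level <- NULL
merged
```

Note that `merge()` does not guarantee the original row order, so it is safer not to rely on it when comparing against `test.data`.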

A second vanilla (base R) option would be

new.data <- by(data = test.data, INDICES = test.data$case, FUN = function(x) {
  x$level <- min(x$level)
  x
})

do.call("rbind", new.data)

    case year level
1.1    1 2006    10
1.2    1 2007    10
1.3    1 2008    10
2.4    2 2007    12
2.5    2 2006    12
2.6    2 2008    12
3      3 2006    20
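
The row names in this output (`1.1`, `2.4`, ...) come from `rbind()` combining the group label with the original row name. If plain sequential row names are preferred, a small follow-up step could reset them. This is a sketch on the question's sample data; `result` is a name introduced here for illustration.

```r
# Same sample data as in the question
test.data <- data.frame(case = c(1, 1, 1, 2, 2, 2, 3),
                        year = c(2006, 2007, 2008, 2007, 2006, 2008, 2006),
                        level = c(10, 20, 20, 12, 20, 20, 20))

# Split by case, set level to the group minimum, as in the answer
new.data <- by(data = test.data, INDICES = test.data$case, FUN = function(x) {
  x$level <- min(x$level)
  x
})

result <- do.call("rbind", new.data)

# Drop the composite row names so rows are numbered 1..n again
rownames(result) <- NULL
result
```

The rows come back grouped by `case`, which matches the input order here only because the sample data is already sorted by case.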