difftime with the previous non-NA value from another column

ats*_*kov 5 r difftime

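I have a data frame with 3 variables: a POSIXct object time, a numeric RRR and a factor he (in the dput() output below it is actually stored as numeric). RRR is the amount of liquid precipitation and he is the hydrological event number, where the time corresponds to the start of a flood event.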

df <- structure(list(time = structure(c(1396879200, 1396922400, 1396976400, 
                                        1397008800, 1397095200, 1397332800, 1397354400, 1397397600, 1397451600, 
                                        1397484000, 1397527200, 1397786400, 1397959200, 1398002400, 1398024000, 
                                        1398132000, 1398175200, 1398218400, 1398261600, 1398369600, 1398466800, 
                                        1398477600, 1398520800, 1398564000, 1398607200, 1398747600, 1398780000, 
                                        1398909600, 1398952800, 1398974400, 1398996000),
                                      class = c("POSIXct", "POSIXt"),
                                      tzone = ""),
                     RRR = c(NA, 2, NA, 4, NA, NA, 0.9, 3, 
                             NA, 0.4, 11, NA, 0.5, 1, NA, 13, 4, 0.8, 0.3, NA, NA, 8, 4, 11, 
                             1, NA, 7, 1, 0.4, NA, 4),
                     he = c(1, NA, 2, NA, 3, 4, NA, NA, 
                            5, NA, NA, 6, NA, NA, 7, NA, NA, NA, NA, 8, 9, NA, NA, NA, NA, 
                            10, NA, NA, NA, 11, NA)), 
                class = "data.frame", 
                row.names = c(NA, -31L))

The head of my data frame looks like this:

> df
                  time  RRR he
1  2014-04-07 18:00:00   NA  1
2  2014-04-08 06:00:00  2.0 NA
3  2014-04-08 21:00:00   NA  2
4  2014-04-09 06:00:00  4.0 NA
5  2014-04-10 06:00:00   NA  3
6  2014-04-13 00:00:00   NA  4
7  2014-04-13 06:00:00  0.9 NA
8  2014-04-13 18:00:00  3.0 NA
9  2014-04-14 09:00:00   NA  5

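I need to calculate the time difference between each he value and the last non-NA RRR value before it. For example, for he = 2 the desired difference would be difftime(df$time[3], df$time[2]), while for he = 4 the time difference should be difftime(df$time[6], df$time[4]). So in the end I want to get a data frame like this, where diff is the time difference in hours: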

> df
                  time  RRR he  diff
1  2014-04-07 18:00:00   NA  1  NA
2  2014-04-08 06:00:00  2.0 NA  NA
3  2014-04-08 21:00:00   NA  2  15
4  2014-04-09 06:00:00  4.0 NA  NA
5  2014-04-10 06:00:00   NA  3  24
6  2014-04-13 00:00:00   NA  4  90
7  2014-04-13 06:00:00  0.9 NA  NA
8  2014-04-13 18:00:00  3.0 NA  NA
9  2014-04-14 09:00:00   NA  5  15
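A minimal base-R sketch of the requested calculation, assuming the rows are sorted by time (as in the data above). The idea: carry forward the row index of the most recent non-NA RRR reading with cummax(), shift it down one row so each event only looks at strictly earlier readings, and take the difference in hours. The helper name hours_since_last_rrr() is hypothetical, and the sample rebuilds only the 9 printed rows with times entered in UTC (an assumption; the differences do not depend on the time zone).

```r
# Sample data: the 9 printed rows from the question.
# Assumption: times entered in UTC; the hour differences are the same in any zone here.
df <- data.frame(
  time = as.POSIXct(c("2014-04-07 18:00:00", "2014-04-08 06:00:00",
                      "2014-04-08 21:00:00", "2014-04-09 06:00:00",
                      "2014-04-10 06:00:00", "2014-04-13 00:00:00",
                      "2014-04-13 06:00:00", "2014-04-13 18:00:00",
                      "2014-04-14 09:00:00"), tz = "UTC"),
  RRR = c(NA, 2, NA, 4, NA, NA, 0.9, 3, NA),
  he  = c(1, NA, 2, NA, 3, 4, NA, NA, 5)
)

# Hypothetical helper: hours from the last earlier non-NA RRR row to each non-NA he row
hours_since_last_rrr <- function(time, RRR, he) {
  n <- length(time)
  # Row index of the most recent non-NA RRR at or before each row (0 = none yet)
  last_idx <- cummax(seq_len(n) * !is.na(RRR))
  # Shift down one row so each row only sees strictly earlier rows
  prev_idx <- c(0L, head(last_idx, -1))
  prev_idx[prev_idx == 0L] <- NA  # no earlier RRR reading -> NA
  hours <- as.numeric(difftime(time, time[prev_idx], units = "hours"))
  hours[is.na(he)] <- NA          # only report a difference on event rows
  hours
}

df$diff <- hours_since_last_rrr(df$time, df$RRR, df$he)
df
```

Because the lookup is done by row position, it relies on the data frame being ordered by time; sort it first (df[order(df$time), ]) if that is not guaranteed.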

tmf*_*mnk 1

I'm sure there must be a simpler way, but with tidyverse and data.table you can do it like this:

library(dplyr)       # mutate(), group_by(), select(), left_join(), %>%
library(tidyr)       # fill()
library(tibble)      # rowid_to_column()
library(data.table)  # rleid()

df %>%
 mutate(time = as.POSIXct(time, format = "%Y-%m-%d %H:%M:%S")) %>% # Make sure "time" is a datetime object (it already is in the dput above)
 fill(RRR) %>% # Fill the NA values in "RRR" with the last non-NA value
 group_by(temp = rleid(RRR)) %>% # Group by run ID of the filled "RRR"
 mutate(temp2 = seq_along(temp)) %>% # Row position within each run
 group_by(RRR, temp) %>% # Group by "RRR" and its run ID
 mutate(diff = ifelse(!is.na(he), difftime(time, time[temp2 == 1], units = "hours"), NA)) %>% # Hours between the first row of the run (the non-NA "RRR" reading) and each non-NA "he" row
 ungroup() %>%
 select(-temp, -temp2, -RRR) %>% # Drop the helper columns and the filled "RRR"
 rowid_to_column() %>% # Add row IDs
 left_join(df %>%
            rowid_to_column() %>%
            select(RRR, rowid), by = "rowid") %>% # Join back the original (unfilled) "RRR" values
 select(-rowid) # Drop the row ID

   time                   he  diff    RRR
   <dttm>              <dbl> <dbl>  <dbl>
 1 2014-04-07 16:00:00    1.    0. NA    
 2 2014-04-08 04:00:00   NA    NA   2.00 
 3 2014-04-08 19:00:00    2.   15. NA    
 4 2014-04-09 04:00:00   NA    NA   4.00 
 5 2014-04-10 04:00:00    3.   24. NA    
 6 2014-04-12 22:00:00    4.   90. NA    
 7 2014-04-13 04:00:00   NA    NA   0.900
 8 2014-04-13 16:00:00   NA    NA   3.00 
 9 2014-04-14 07:00:00    5.   15. NA    
10 2014-04-14 16:00:00   NA    NA   0.400
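Two details of the answer above are worth noting. Its printed times are 2 hours earlier than the question's because tzone = "" makes POSIXct print in each user's local time zone; the differences are unaffected. Also, the first event (he = 1) gets 0 instead of the NA shown in the desired output, and because runs are built with rleid() on the filled RRR, two consecutive readings with the same RRR value would merge into one run and measure from the older reading. An alternative that looks up by time instead of by runs is findInterval() with left.open = TRUE, which returns for each event the number of RRR reading times strictly earlier than it. This is a sketch on the same 9 sample rows (times entered in UTC as an assumption), not the answerer's method.

```r
# Same 9 sample rows as in the question (assumption: times entered in UTC)
df <- data.frame(
  time = as.POSIXct(c("2014-04-07 18:00:00", "2014-04-08 06:00:00",
                      "2014-04-08 21:00:00", "2014-04-09 06:00:00",
                      "2014-04-10 06:00:00", "2014-04-13 00:00:00",
                      "2014-04-13 06:00:00", "2014-04-13 18:00:00",
                      "2014-04-14 09:00:00"), tz = "UTC"),
  RRR = c(NA, 2, NA, 4, NA, NA, 0.9, 3, NA),
  he  = c(1, NA, 2, NA, 3, 4, NA, NA, 5)
)

t_num     <- as.numeric(df$time)       # seconds since the epoch
rrr_times <- t_num[!is.na(df$RRR)]     # times of the non-NA RRR readings
stopifnot(!is.unsorted(rrr_times))     # findInterval() needs sorted breakpoints
he_rows   <- which(!is.na(df$he))      # rows that start a hydrological event

# k = number of RRR readings strictly earlier than each event,
# i.e. the position of the most recent earlier reading in rrr_times (0 = none)
k <- findInterval(t_num[he_rows], rrr_times, left.open = TRUE)

df$diff <- NA_real_
df$diff[he_rows] <- ifelse(k > 0,
                           (t_num[he_rows] - rrr_times[pmax(k, 1L)]) / 3600,  # seconds -> hours
                           NA_real_)                                           # no earlier reading
df
```

Since the search is done on the time values themselves, repeated RRR values do not matter; the RRR reading times only need to be in increasing order, which the stopifnot() check enforces.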