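The raw data looks like this. I want to sort it by visitor and time, compute the time difference between consecutive rows, and then save the result to a new file.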
  visitor         v_time payment items
1    Jack 1/2/2018 16:07      35     3
2    Jack 1/2/2018 16:09     160     1
3   David 1/2/2018 16:12      25     2
4    Kate 1/2/2018 16:16       3     3
5   David 1/2/2018 16:21      25     5
6    Jack 1/2/2018 16:32      85     5
7    Kate 1/2/2018 16:33     639     3
8    Jack 1/2/2018 16:55       6     2
The grouping and sorting work fine, but the time difference is not computed and the result is not saved to the file.
visitor <- c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack")
v_time <- c("1/2/2018 16:07","1/2/2018 16:09","1/2/2018 16:12","1/2/2018 16:16","1/2/2018 16:21","1/2/2018 16:32","1/2/2018 16:33", "1/2/2018 16:55")
payment <- c(35,160,25,3,25,85,639,6)
items <- c(3,1,2,3,5,5,3,2)
df <- data.frame(visitor, v_time, payment, items)
df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M")), diff_secs = as.numeric(diff, units = 'secs'))
write.csv(df,"C:/output.csv", row.names = F)
What is my mistake, and what is the correct way to do this?
# A tibble: 8 x 6
# Groups: visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12   25.0   2.00 NA            NA
2 David   1/2/2018 16:21   25.0   5.00 NA            NA
3 Jack    1/2/2018 16:07   35.0   3.00 NA            NA
4 Jack    1/2/2018 16:09  160     1.00 NA            NA
5 Jack    1/2/2018 16:32   85.0   5.00 NA            NA
6 Jack    1/2/2018 16:55    6.00  2.00 NA            NA
7 Kate    1/2/2018 16:16    3.00  3.00 NA            NA
8 Kate    1/2/2018 16:33  639     3.00 NA            NA
If you add default = strptime(v_time, "%d/%m/%Y %H:%M")[1] to the lag part and assign the result of the pipeline back to df:
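library(dplyr)  # provides %>%, arrange, group_by, mutate and lag

# Assign the pipeline result back to df so the new columns are kept
df <- df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]),
         diff_secs = as.numeric(diff, units = 'secs'))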
you get the result you expect:
> df
# A tibble: 8 x 6
# Groups: visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12     25.    2. 0             0.
2 David   1/2/2018 16:21     25.    5. 540         540.
3 Jack    1/2/2018 16:07     35.    3. 0             0.
4 Jack    1/2/2018 16:09    160.    1. 120         120.
5 Jack    1/2/2018 16:32     85.    5. 1380       1380.
6 Jack    1/2/2018 16:55      6.    2. 1380       1380.
7 Kate    1/2/2018 16:16      3.    3. 0             0.
8 Kate    1/2/2018 16:33    639.    3. 1020       1020.
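One reason the original write.csv call did not contain the new columns is that a dplyr pipeline returns a new data frame rather than changing df in place. A minimal sketch of that behaviour, assuming dplyr is installed (the two-row data frame and the column name double_payment are made up for illustration):

```r
library(dplyr)

# Small illustrative data frame (values taken from the question's first rows)
df <- data.frame(visitor = c("Jack", "David"), payment = c(35, 25))

# Without assignment the result is only printed; df itself is unchanged
df %>% mutate(double_payment = payment * 2)
names(df)

# Assigning the result back to df makes the new column persist
df <- df %>% mutate(double_payment = payment * 2)
names(df)
```

Only after the second call does df carry the extra column, so anything written out with write.csv before that point still has the original columns.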
Another option is to use difftime:
df <- df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = difftime(strptime(v_time, "%d/%m/%Y %H:%M"), lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]), units = 'mins'),
         diff_secs = as.numeric(diff, units = 'secs'))
Now the diff column is in minutes and the diff_secs column is in seconds:
> df
# A tibble: 8 x 6
# Groups: visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12     25.    2. 0             0.
2 David   1/2/2018 16:21     25.    5. 9           540.
3 Jack    1/2/2018 16:07     35.    3. 0             0.
4 Jack    1/2/2018 16:09    160.    1. 2           120.
5 Jack    1/2/2018 16:32     85.    5. 23         1380.
6 Jack    1/2/2018 16:55      6.    2. 23         1380.
7 Kate    1/2/2018 16:16      3.    3. 0             0.
8 Kate    1/2/2018 16:33    639.    3. 17         1020.
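The unit handling can be checked in isolation with base R. The sketch below takes David's two timestamps from the data; tz = "UTC" is an assumption added here so the result does not depend on the local time zone:

```r
# Parse two timestamps with the same format string used in the pipeline
t1 <- as.POSIXct("1/2/2018 16:12", format = "%d/%m/%Y %H:%M", tz = "UTC")
t2 <- as.POSIXct("1/2/2018 16:21", format = "%d/%m/%Y %H:%M", tz = "UTC")

# difftime stores the difference in the requested unit
d <- difftime(t2, t1, units = "mins")

mins <- as.numeric(d)                  # 9, in the stored unit (minutes)
secs <- as.numeric(d, units = "secs")  # 540, converted to seconds
```

Passing units to as.numeric converts the value, which is why diff_secs stays in seconds even when diff is stored in minutes.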
Because df now holds the new columns, you can save the result again with write.csv(df, "C:/output.csv", row.names = FALSE)
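For reference, here is a self-contained variant of the whole workflow. It is a sketch rather than the method above: it parses the timestamps once with as.POSIXct instead of calling strptime repeatedly, sorts on the parsed time rather than on the text, and writes to a temporary file. The names v_parsed, result and out_file, the tz = "UTC" setting, and the use of tempfile() are assumptions made for illustration.

```r
library(dplyr)

visitor <- c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack")
v_time <- c("1/2/2018 16:07", "1/2/2018 16:09", "1/2/2018 16:12", "1/2/2018 16:16",
            "1/2/2018 16:21", "1/2/2018 16:32", "1/2/2018 16:33", "1/2/2018 16:55")
payment <- c(35, 160, 25, 3, 25, 85, 639, 6)
items <- c(3, 1, 2, 3, 5, 5, 3, 2)
df <- data.frame(visitor, v_time, payment, items, stringsAsFactors = FALSE)

result <- df %>%
  # Parse once into a date-time column (v_parsed is a hypothetical name)
  mutate(v_parsed = as.POSIXct(v_time, format = "%d/%m/%Y %H:%M", tz = "UTC")) %>%
  # Sort on the parsed time so ordering does not rely on string comparison
  arrange(visitor, v_parsed) %>%
  group_by(visitor) %>%
  # First row of each group gets a difference of 0 via the default
  mutate(diff_secs = as.numeric(difftime(v_parsed,
                                         lag(v_parsed, default = first(v_parsed)),
                                         units = "secs"))) %>%
  ungroup()

# Write to a temporary file; replace with a real path such as "C:/output.csv"
out_file <- tempfile(fileext = ".csv")
write.csv(result, out_file, row.names = FALSE)
```

Storing the result under a new name keeps the original df available for comparison, and ungroup() avoids carrying the grouping into later steps.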