I am studying the price of a product over time. I have daily data, but some days are missing at random.
Here is a minimal example in which the data for 4 January is missing:
library(lubridate)
library(data.table)
mockData <- data.table(timeStamp = c(ymd("20180101"), ymd("20180102"), ymd("20180103"), ymd("20180105")),
                       price = c(10, 15, 12, 11))
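A quick way to confirm which calendar days have no row is to build the full daily sequence between the first and last dates and keep the dates that are absent from the data. This is a small sketch; it rebuilds the same mock data with base `as.Date()` instead of `lubridate::ymd()` so that it only needs data.table, and the names `allDays` and `missingDays` are just for illustration.

```r
library(data.table)

# Same mock data as above: 2018-01-04 has no row
mockData <- data.table(
  timeStamp = as.Date(c("2018-01-01", "2018-01-02", "2018-01-03", "2018-01-05")),
  price = c(10, 15, 12, 11)
)

# Complete daily calendar between the first and last observed dates
allDays <- seq(min(mockData$timeStamp), max(mockData$timeStamp), by = "day")

# Calendar days that have no row in the data (indexing keeps the Date class)
missingDays <- allDays[!allDays %in% mockData$timeStamp]
print(missingDays)
```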
I want to add the lagged price to my data.table, but if the previous day is missing, I want NA instead of the price from the closest earlier day that has data.
To illustrate, if I use the shift function:
mockData[, lag_price:=shift(price,type="lag")]
I get:
structure(list(timeStamp = structure(c(17532, 17533, 17534, 17536
), class = "Date"), price = c(10, 15, 12, 11), lag_price = c(NA,
10, 15, 12)), row.names = c(NA, -4L), class = c("data.table",
"data.frame"))
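The reason is that `shift()` works on row position, not on dates: each row simply receives the value from the row above it. A small diagnostic sketch makes this visible by also computing how many days separate each row from the previous one (the column name `gap_days` is hypothetical). Dates are converted with `as.numeric()`, which gives days since 1970-01-01, so the difference is a plain number of days.

```r
library(data.table)

mockData <- data.table(
  timeStamp = as.Date(c("2018-01-01", "2018-01-02", "2018-01-03", "2018-01-05")),
  price = c(10, 15, 12, 11)
)

# Positional lag: the value from the previous row, whatever its date
mockData[, lag_price := shift(price, type = "lag")]

# Days between each row and the previous row; anything other than 1
# means the positional lag spans a gap in the calendar
mockData[, gap_days := as.numeric(timeStamp) - shift(as.numeric(timeStamp))]
print(mockData)
```

The last row has `gap_days` equal to 2, which is exactly the row where the positional lag (12, from 3 January) is not the previous day's price.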
But what I really want is this:
structure(list(timeStamp = structure(c(17532, 17533, 17534, 17536
), class = "Date"), price = c(10, 15, 12, 11), lag_price = c(NA,
10, 15, NA)), row.names = c(NA, -4L), class = c("data.table",
"data.frame"))
I feel more comfortable using data.table, but I will work with data.frame, dplyr or the tidyverse if required.
You could add an ifelse statement that checks whether the previous row is from the previous calendar day:
# keep the previous price only when the previous row is exactly one day earlier
mockData[, lag_price := ifelse(timeStamp - shift(timeStamp) == 1, shift(price), NA)]
# timeStamp price lag_price
#1: 2018-01-01 10 NA
#2: 2018-01-02 15 10
#3: 2018-01-03 12 15
#4: 2018-01-05 11 NA
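If your data.table version has `fifelse()` (added in 1.12.4, as far as I know), the same idea can be written with it; unlike base `ifelse()`, it checks that both branches have the same type, so `NA_real_` is used to match the double `price` column. This sketch also sorts by date first, because the check assumes the rows are in chronological order.

```r
library(data.table)

mockData <- data.table(
  timeStamp = as.Date(c("2018-01-01", "2018-01-02", "2018-01-03", "2018-01-05")),
  price = c(10, 15, 12, 11)
)

# The "previous row" logic only makes sense if rows are in date order
setorder(mockData, timeStamp)

# Previous row's price if it is exactly one day earlier, otherwise NA.
# For the first row the test is NA, so fifelse() returns NA as well.
mockData[, lag_price := fifelse(
  as.numeric(timeStamp) - shift(as.numeric(timeStamp)) == 1,
  shift(price),
  NA_real_
)]
print(mockData)
```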
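A different technique, which does not depend on row order, is a data.table update join: move every observation forward by one day and join it back on the date, so each row only picks up a price that was recorded exactly one day earlier. This is a sketch of that alternative, not the approach above; the helper table name `lookup` is hypothetical.

```r
library(data.table)

mockData <- data.table(
  timeStamp = as.Date(c("2018-01-01", "2018-01-02", "2018-01-03", "2018-01-05")),
  price = c(10, 15, 12, 11)
)

# The price observed on day d becomes the candidate lagged price for day d + 1
lookup <- mockData[, .(timeStamp = timeStamp + 1, lag_price = price)]

# Update join on the date: matched rows receive the shifted price,
# unmatched rows (the first day, the day after a gap) are left as NA
mockData[lookup, lag_price := i.lag_price, on = "timeStamp"]
print(mockData)
```

Because the match is on the date itself, the same pattern works for other calendar lags by changing the offset (for example `timeStamp + 7` for a weekly lag).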