How to get interpolated slopes at regular time points on a cumulative sum plot?

Far*_*rel · 10 votes · tags: interpolation, r, spline

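On Cross Validated I asked a question about analysing data by date without the spurious spikes and troughs that come from binning the data by month. For example, suppose someone pays a bill on the last day of every month but one time pays a few days late. One month will then show zero expense and the next month will show double the usual expense. All spurious junk.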

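One of the answers to my question explained the idea of interpolating the cumulative sum with linear spline smoothing to get around these hiccups in the binning. I am intrigued and would like to implement it in R, but I cannot find any examples online. I don't just want to print a plot. I want the instantaneous slope at each time point (perhaps every day). That slope should come from a spline that takes its input from a few days (or perhaps a few weeks or months) before the time point to a few days after it. In other words, at the end of the day I want a data frame in which one column is money per day, or patients per week, and that column is not subject to vagaries such as whether I paid a few days late or whether the month happened to have 5 working days (rather than the usual 4).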
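Here is a minimal sketch of that idea, offered under some assumptions. It interpolates the cumulative sum linearly between payment dates using base R's `approxfun()`. It then takes the average slope of that piecewise-linear curve over a centred window. `windowed.slope()` is a hypothetical helper, and the 45-day half-window is an arbitrary choice. A payment that slips a few days only moves the curve slightly inside a wide window, so the slope barely changes. Note that `rule = 2` holds the curve flat beyond the first and last payments. As a result, slopes within `half.window` days of either end are biased low.

```r
# Simulated ledger from the question: $50 on the last day of each month,
# with the 5th payment made 3 days late
dates <- seq(as.Date("2010-02-01"), length = 24, by = "1 month") - 1
dates[5] <- dates[5] + 3
amounts <- rep(50, 24)

# Piecewise-linear interpolation of the cumulative sum (a linear spline);
# rule = 2 holds the end values constant outside the data range
cum.fun <- approxfun(as.numeric(dates), cumsum(amounts), rule = 2)

# Hypothetical helper: average slope of f over [day - half.window, day + half.window]
windowed.slope <- function(f, days, half.window) {
  x <- as.numeric(days)
  (f(x + half.window) - f(x - half.window)) / (2 * half.window)
}

# One row per day, with a smoothed "money per day" column
days <- seq(min(dates), max(dates), by = "day")
daily <- data.frame(date = days,
                    money.per.day = windowed.slope(cum.fun, days, half.window = 45))
summary(daily$money.per.day)
```

Away from the ends, the values stay close to $50 / 30.4 ≈ $1.64 per day, including around the late payment in May 2010. Widening or narrowing `half.window` trades smoothness against responsiveness to real changes in the rate.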

Here are some simplified simulations and plots to illustrate my objection.

library(lubridate)
library(ggplot2)
library(reshape2)
library(plyr) # needed for ddply() below
dates <- seq(as.Date("2010-02-01"), length=24, by="1 month") - 1
dates[5] <- dates[5]+3 #we are making one payment date that is 3 days late
dates #look how the payment date is the last day of every month except for
#2010-05, where it takes place on 2010-06-03 - naughty boy!
amounts <- rep(50,each=24)# pay $50 every month
register <- data.frame(dates,amounts)#this is the starting register or ledger
ggplot(data=register,aes(dates,amounts))+geom_point()#look carefully and you will see that 2010-05 has no dots in it and 2010-06 has two dots
register.by.month <- ddply(register,.(y=year(dates),month=month(dates)),summarise,month.tot=sum(amounts))#create a summary of totals by month
#but it ends up omitting a month in which nothing happened. Further badness is that it creates a new data frame where one is not needed.
#Instead I created a new variable that allocates each date into a particular "zone" such as month or ...
register$cutmonth <- as.Date(cut(register$dates, breaks = "month"))#until recently I did not know that the cut function can handle dates
table(register$cutmonth)#see how there are two payments in the month of 2010-06
#now let's look at what we paid each month. What is the total for each month?
ggplot(register, aes(cutmonth, amounts))+ stat_summary(fun = sum, geom = "bar")#that is the truth but it is a useless truth

If a payment is a few days late, it looks as if the expense was zero in one month and doubled in the next. That is spurious.
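You can reproduce the spike directly with base R alone. This sketch uses `cut()` on a Date vector plus `xtabs()`, and the variable names are only illustrative. It totals the payments per calendar month while keeping empty months, unlike the `ddply()` summary above. The late payment then shows up as a $0 month followed by a $100 month.

```r
# Same simulated ledger as in the question
dates <- seq(as.Date("2010-02-01"), length = 24, by = "1 month") - 1
dates[5] <- dates[5] + 3                 # the May 2010 payment lands on 2010-06-03
amounts <- rep(50, 24)

month <- cut(dates, breaks = "month")    # factor with one level per calendar month
monthly <- xtabs(amounts ~ month)        # months with no payment are kept as 0
monthly[c("2010-04-01", "2010-05-01", "2010-06-01", "2010-07-01")]
```

The four values printed are 50, 0, 100 and 50. The total paid is unchanged, but the monthly series suggests a gap and then a surge.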

#so let's use the cumulative expense over time
register$cumamount <- cumsum(register$amounts)
cum <- ggplot(data=register,aes(dates,cumamount))+geom_point()
cum+stat_smooth()

The cumulative amount over time smooths out the variability caused by payments shifting between bins.

#That was with the same amount every month; now let's introduce a trend where the amounts start to go up in the second year
increase <- c(rep(1,each=12),seq(from=1.01,to=1.9,length.out=12))
amounts.up <- round(amounts*increase,digits=2)#this is the monthly amount, growing each month of the second year
register <- cbind(register,amounts.up)#add the variable to the data frame
register$cumamount.up <- cumsum(register$amounts.up) #work out the cumulative sum for the new scenario
ggplot(data=register,aes(x=dates))+
   geom_point(aes(y=amounts, colour="amounts",shape="amounts"))+
   geom_point(aes(y=amounts.up, colour="amounts.up",shape="amounts.up"))# the plot of amount by date
#I am now going to plot the cumulative amount over time, but now that I have two scenarios it is easier to deal with the data frame in long format (melted) rather than wide format (cast)
#before I can melt: the reshape2 package unfortunately can't handle the Date class, so I have to turn the dates into characters and then back again.
register[,c("dates","cutmonth")] <- lapply(register[,c("dates","cutmonth")],as.character)
register.long <- melt(register,measure.vars=c("amounts","amounts.up"))
register.long[,c("dates","cutmonth")] <- lapply(register.long[,c("dates","cutmonth")],as.Date)
ggplot(register.long, aes(cutmonth,value))+ stat_summary(fun = sum, geom = "bar")+facet_grid(. ~ variable) #that is the truth but it is a useless truth.
#Furthermore it appears as if 2010-06 is similar to what is going on in 2011-12.
#That is patently absurd. All that happened was that the 2010-05 payment was delayed by 3 days.

Both scenarios, shown as the amount paid each month

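#so let's use the cumulative expense over time, computed separately for each scenario
register.long$cumvalue <- ave(register.long$value, register.long$variable, FUN = cumsum)
ggplot(data=register.long,aes(dates,cumvalue,colour=variable))+geom_point() + scale_y_continuous(name='cumulative sum of amounts ($)')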

Here we see the cumulative-sum data for both scenarios

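So for the simple plot, the variable interpolate.daily would be about $50 / 30.4 = $1.64 per day for every day of the year. The second plot is the one where the monthly amount starts to rise in the second year. There, the daily rate would be $1.64 per day for every day of the first year. In the second year it would rise gradually from $1.64 per day to about $3.12 per day.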
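The target rates quoted above come from dividing the monthly amount by the average month length. This quick check uses 365.25 / 12 ≈ 30.44 days, whereas the text rounds to 30.4. It also uses the second-year multiplier of 1.9 from the simulation, so the last month pays 50 × 1.9 = $95.

```r
# Average month length in days (the text above rounds this to 30.4)
days.per.month <- 365.25 / 12

rate.year1 <- 50 / days.per.month               # $50 per month in year 1
rate.end.year2 <- 50 * 1.9 / days.per.month     # $95 in the last month of year 2
round(c(rate.year1, rate.end.year2), 2)
```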

Thank you very much for reading all the way to the end. You must be as curious as I am!

Answer by Joh*_*lby · 1 vote

Here is a basic approach. There are, of course, more sophisticated options and parameters to tune, but this should be a good starting point.

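# rebuild the simulated data from the question
dates <- seq(as.Date("2010-02-01"), length=24, by="1 month") - 1
dates[5] <- dates[5]+3 # one payment is 3 days late
amounts <- rep(50,each=24)
increase <- c(rep(1,each=12),seq(from=1.01,to=1.9,length.out=12))
amounts.up <- round(amounts*increase,digits=2)

df = data.frame(dates=dates, cumamount.up=cumsum(amounts.up))

# interpolating cubic spline through the cumulative sum (default method "fmm")
df.spline = splinefun(df$dates, df$cumamount.up)

# evaluate the first derivative (the slope, i.e. money per day) on every day in the range
newdates = seq(min(df$dates), max(df$dates), by=1)
money.per.day = df.spline(newdates, deriv=1)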

If you plot it, you can see the spline's interesting behaviour:

plot(newdates, money.per.day, type='l')

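The interesting behaviour of that derivative presumably comes from the default interpolating cubic spline (method `"fmm"`), which is free to overshoot between the monthly points. Below are two alternatives from base R's `stats` package. They are offered as a sketch, not a tuned solution. First, `splinefun(..., method = "monoH.FC")` builds a monotone interpolant, so the slope of a never-decreasing cumulative sum cannot go negative. Second, `smooth.spline()` followed by `predict(..., deriv = 1)` fits a smoothing spline that does not have to pass through every point. The `df = 8` value is an arbitrary assumption that you would need to tune.

```r
# Same simulated data as in the answer above
dates <- seq(as.Date("2010-02-01"), length = 24, by = "1 month") - 1
dates[5] <- dates[5] + 3
amounts <- rep(50, 24)
increase <- c(rep(1, 12), seq(from = 1.01, to = 1.9, length.out = 12))
amounts.up <- round(amounts * increase, digits = 2)

x <- as.numeric(dates)                        # days since 1970-01-01
y <- cumsum(amounts.up)
newdates <- seq(min(dates), max(dates), by = 1)

# 1. Monotone Hermite interpolation (Fritsch-Carlson): the interpolant is
#    monotone, so its first derivative stays non-negative
mono <- splinefun(x, y, method = "monoH.FC")
rate.mono <- mono(as.numeric(newdates), deriv = 1)

# 2. Smoothing spline: trades exact fit for smoothness; df = 8 is an assumed value
fit <- smooth.spline(x, y, df = 8)
rate.smooth <- predict(fit, as.numeric(newdates), deriv = 1)$y

plot(newdates, rate.mono, type = "l", ylab = "money per day")
lines(newdates, rate.smooth, lty = 2)
```

The monotone spline still reproduces every cumulative total exactly, so a late payment still nudges the local slope. The smoothing spline spreads that effect out further, at the cost of not honouring each data point.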