Ari*_*ant 0 r dataframe dplyr data.table tidyr
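I have a data frame with a value column and a corresponding year column. I want to create an additional column that holds the ratio of each value to the value from 5 years earlier. For example, if the year is 2000, the 'newval' column should hold the ratio of the 2000 value to the 1995 value. My data frame looks like the one below. Some years may be missing, and both the 'val' and 'Year' columns can contain NA.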
df2 = data.frame(code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT", "AUT"),
                 val = c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563, 56),
                 Year = c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000, 2001, 2002))
The final dataset should look like this:
df2 = data.frame(code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT", "AUT"),
                 val = c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563, 56),
                 Year = c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000, 2001, 2002),
                 newval = c(NA, NA, NA, NA, NA, 0.032520325, 0.547619048, 1.086956522, NA, NA, NA, 241.8695652, 2.24))
In base R, we can use match:
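# match(Year - 5, Year) gives, for each row, the index of the first row
# whose Year is five years earlier (NA when no such row exists)
df2$new_val <- with(df2, val / val[match(Year - 5, Year)])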
df2
#    code  val Year  new_val
#1    AFG  123 1990       NA
#2    AGO   42 1991       NA
#3    ALB   23 1992       NA
#4    AND    5 1993       NA
#5    ARB   42 1991       NA
#6    ARE    4 1995   0.0325
#7    ARG   23 1996   0.5476
#8    ARM   25 1997   1.0870
#9    ASM   42 1991       NA
#10   ATG   23 1992       NA
#11   AUS   NA 2000       NA
#12   AUT 5563 2001 241.8696
#13   AUT   56 2002   2.2400
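To see why this works, it may help to look at the intermediate index on its own. The sketch below rebuilds the two columns from the question as plain vectors (the names idx and ratio are only illustrative, not part of the original answer) and shows what match returns. One thing to be aware of: match only reports the first matching position, so when a year occurs more than once (1991 appears three times here) the first occurrence is used as the denominator. In this data all three 1991 rows have val 42, so the result is the same either way.

```r
# Columns copied from the question's data frame
Year <- c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000, 2001, 2002)
val  <- c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563, 56)

# Position of the first row whose Year equals (Year - 5); NA when there is none
idx <- match(Year - 5, Year)
# idx is NA NA NA NA NA 1 2 3 NA NA 6 7 8

# Indexing with NA yields NA, so rows without a year five years back stay NA
ratio <- val / val[idx]

# Duplicate years: 1991 sits at several positions, match uses only the first
dup_pos <- which(Year == 1991)
# dup_pos is 2 5 9
```

If the ratio should instead be computed only within the same code, the lookup would have to be done per group; the question's expected output does not require that, so it is left out here.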