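I would like to solve the following problem with dplyr, preferably with one of the window functions. I have a data frame of houses and purchase prices. Here is an example: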
houseID year price
1 1995 NA
1 1996 100
1 1997 NA
1 1998 120
1 1999 NA
2 1995 NA
2 1996 NA
2 1997 NA
2 1998 30
2 1999 NA
3 1995 NA
3 1996 44
3 1997 NA
3 1998 NA
3 1999 NA
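I would like to build a data frame like this, with each missing price replaced by the last known price for the same house: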
houseID year price
1 1995 NA
1 1996 100
1 1997 100
1 1998 120
1 1999 120
2 1995 NA
2 1996 NA
2 1997 NA
2 1998 30
2 1999 30
3 1995 NA
3 1996 44
3 1997 44
3 1998 44
3 1999 44
Here is some data in the right format:
# Number of houses
N = 15
# Data frame
df = data.frame(houseID = rep(1:N,each=10), year=1995:2004, price =ifelse(runif(10*N)>0.15, NA,exp(rnorm(10*N))))
Is there a dplyr way to do this?
ali*_*ire 63
tidyr::fill now makes this trivially simple:
library(dplyr)
library(tidyr)
# or library(tidyverse)
df %>% group_by(houseID) %>% fill(price)
# Source: local data frame [15 x 3]
# Groups: houseID [3]
#
# houseID year price
# (int) (int) (int)
# 1 1 1995 NA
# 2 1 1996 100
# 3 1 1997 100
# 4 1 1998 120
# 5 1 1999 120
# 6 2 1995 NA
# 7 2 1996 NA
# 8 2 1997 NA
# 9 2 1998 30
# 10 2 1999 30
# 11 3 1995 NA
# 12 3 1996 44
# 13 3 1997 44
# 14 3 1998 44
# 15 3 1999 44
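As a minimal sketch, the 15-row example from the question can be rebuilt by hand and filled per house. The names `houses` and `filled` are chosen here for illustration only. `.direction = "down"` is the default of fill() and is spelled out only to make the carry-forward direction explicit.

```r
library(dplyr)
library(tidyr)

# Hand-built copy of the question's 15-row example (the name `houses` is illustrative)
houses <- data.frame(
  houseID = rep(1:3, each = 5),
  year    = rep(1995:1999, times = 3),
  price   = c(NA, 100, NA, 120, NA,
              NA, NA, NA, 30, NA,
              NA, 44, NA, NA, NA)
)

# Carry the last observed price forward within each house;
# leading NAs stay NA because there is nothing earlier to copy
filled <- houses %>%
  group_by(houseID) %>%
  fill(price, .direction = "down") %>%
  ungroup()

filled$price
```

Grouping first matters: without group_by(houseID), a price from one house could be carried into the leading NA rows of the next house.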
G. *_*eck 44
These all use na.locf from the zoo package:
dplyr
library(dplyr)
library(zoo)
na.locf2 <- function(x) na.locf(x, na.rm = FALSE)
df %>% group_by(houseID) %>% do(na.locf2(.)) %>% ungroup
giving:
Source: local data frame [15 x 3]
Groups: houseID
houseID year price
1 1 1995 NA
2 1 1996 100
3 1 1997 100
4 1 1998 120
5 1 1999 120
6 2 1995 NA
7 2 1996 NA
8 2 1997 NA
9 2 1998 30
10 2 1999 30
11 3 1995 NA
12 3 1996 44
13 3 1997 44
14 3 1998 44
15 3 1999 44
The other solutions below give very similar output, so we won't repeat it except where the format differs substantially.
Another possibility is to combine na.locf0 (also used in the ave solution further below) with dplyr:
df %>% group_by(houseID) %>% mutate(price = na.locf0(price)) %>% ungroup
by
library(zoo)
do.call(rbind, by(df, df$houseID, na.locf2))
The by solution can also be piped with dplyr:
df %>% by(df$houseID, na.locf2) %>% bind_rows
ave
library(zoo)
transform(df, price = ave(price, houseID, FUN = na.locf0))
data.table
library(data.table)
library(zoo)
data.table(df)[, na.locf2(.SD), by = houseID]
zoo
This solution uses zoo alone. It returns a wide rather than long result:
library(zoo)
z <- read.zoo(df, index = 2, split = 1, FUN = identity)
na.locf2(z)
giving:
1 2 3
1995 NA NA NA
1996 100 NA 44
1997 100 NA 44
1998 120 30 44
1999 120 30 44
This solution could be combined with dplyr like this:
library(dplyr)
library(zoo)
df %>% read.zoo(index = 2, split = 1, FUN = identity) %>% na.locf2
Input
The examples above use the 15-row example data frame from the question as input.
Revisions: rearranged and added more solutions. Revised the dplyr/zoo solution to conform to the latest changes in dplyr.
ili*_*lir 13
You can do a rolling self-join, which data.table supports:
require(data.table)
setDT(df) ## change it to data.table in place
setkey(df, houseID, year) ## needed for fast join
df.woNA <- df[!is.na(price)] ## version without the NA rows
# rolling self-join will return what you want
df.woNA[df, roll=TRUE] ## will match previous year if year not found
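The rolling join can be tried on the question's small example as a sketch. The names `houses`, `known` and `rolled` are illustrative, and the join columns are passed through `on =` instead of relying on a key, which is a variation on the code above and not the answer's exact form.

```r
library(data.table)

# Hand-built copy of the question's example (the name `houses` is illustrative)
houses <- data.table(
  houseID = rep(1:3, each = 5),
  year    = rep(1995:1999, times = 3),
  price   = c(NA, 100, NA, 120, NA,
              NA, NA, NA, 30, NA,
              NA, 44, NA, NA, NA)
)

# Rows that actually carry a price
known <- houses[!is.na(price)]

# For every row of `houses`, look up the latest known price at or before
# that year within the same house; roll applies to the last join column (year)
rolled <- known[houses, on = .(houseID, year), roll = TRUE]

# `price` is the rolled value, `i.price` is the original column from `houses`
rolled$price
```

Rows that have no earlier price within the same house find no match and stay NA, which is the behaviour asked for in the question.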
A pure dplyr solution (no zoo).
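df %>%
  group_by(houseID) %>%
  # running count of observed prices: each price and the NAs after it share an id
  mutate(price_change = cumsum(0 + !is.na(price))) %>%
  # `.add = TRUE` replaces the deprecated `add = TRUE` as of dplyr 1.0.0
  group_by(price_change, .add = TRUE) %>%
  # the first value of each run is the observed price (NA for a leading run)
  mutate(price_filled = nth(price, 1)) %>%
  ungroup() %>%
  select(-price_change) -> df2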
The interesting part of the solution for the example data is at the end of df2.
> tail(df2, 20)
Source: local data frame [20 x 4]
houseID year price price_filled
1 14 1995 NA NA
2 14 1996 NA NA
3 14 1997 NA NA
4 14 1998 NA NA
5 14 1999 0.8374778 0.8374778
6 14 2000 NA 0.8374778
7 14 2001 NA 0.8374778
8 14 2002 NA 0.8374778
9 14 2003 2.1918880 2.1918880
10 14 2004 NA 2.1918880
11 15 1995 NA NA
12 15 1996 0.3982450 0.3982450
13 15 1997 NA 0.3982450
14 15 1998 1.7727000 1.7727000
15 15 1999 NA 1.7727000
16 15 2000 NA 1.7727000
17 15 2001 NA 1.7727000
18 15 2002 7.8636329 7.8636329
19 15 2003 NA 7.8636329
20 15 2004 NA 7.8636329
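The grouping trick behind the pure dplyr solution can be seen on a single vector. The following is a minimal base R sketch, assuming the prices of house 1 from the question. The names `run_id` and `price_filled` are illustrative, and `ave` stands in here for the grouped mutate.

```r
# Prices for one house, in year order (values of house 1 in the question)
price <- c(NA, 100, NA, 120, NA)

# The counter increases by one at every observed price, so each observed price
# and the NAs that follow it share the same run id; leading NAs get id 0
run_id <- cumsum(!is.na(price))
run_id

# Within each run the first element is the observed price (or NA for run 0),
# so repeating that first element over the run fills the gaps
price_filled <- ave(price, run_id, FUN = function(x) x[1])
price_filled
```

Because the counter only changes where a price is observed, taking the first value of each run is equivalent to carrying the last observation forward.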