I am trying to clean data with ddply, but it runs very slowly on 1.3M rows.
Sample code:
# Create sample data frame
num_rows <- 10000
df <- data.frame(id = sample(1:20, num_rows, replace = TRUE),
                 Consumption = sample(-20:20, num_rows, replace = TRUE),
                 StartDate = as.Date(sample(15000:15020, num_rows, replace = TRUE), origin = "1970-01-01"))
df$EndDate <- df$StartDate + 90
# df <- df[order(df$id, df$StartDate, df$Consumption), ]
# Are values negative?
# Needed in ddply for subsetting rows with the same positive and negative values
df$Neg <- ifelse(df$Consumption < 0, -1, 1)
df$Consumption <- abs(df$Consumption)
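I wrote a function to remove rows where the consumption value in one row is equal to, but the negative of, the consumption value in another row (for the same id).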
# Remove rows from a data frame where there is an equal but opposite consumption value
# Should ensure only one negative …
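Since the function above is cut off, here is a minimal sketch of one way the described clean-up could be done without a per-id ddply call. It is not the original function. The name `drop_offsetting_rows` is hypothetical, and the sketch assumes that rows are matched only on `id` and absolute `Consumption` (dates are ignored) and that each negative row cancels exactly one positive row.

```r
# Hypothetical helper (not from the original post).
# Expects the columns built above: id, Consumption (absolute value), Neg (-1 or 1).
drop_offsetting_rows <- function(df) {
  # Occurrence number of each row within its (id, Consumption, Neg) group
  k <- ave(seq_len(nrow(df)), df$id, df$Consumption, df$Neg, FUN = seq_along)
  # The k-th positive and the k-th negative row of the same id/value share a key
  key <- paste(df$id, df$Consumption, k, sep = "_")
  pos_keys <- key[df$Neg == 1]
  neg_keys <- key[df$Neg == -1]
  # A row is cancelled when the opposite sign has a row with the same key
  cancelled <- (df$Neg == 1 & key %in% neg_keys) |
               (df$Neg == -1 & key %in% pos_keys)
  df[!cancelled, , drop = FALSE]
}

# Toy check: id 1 has +5, -5, +5 and id 2 has -3, +3
toy <- data.frame(id = c(1, 1, 1, 2, 2),
                  Consumption = c(5, 5, 5, 3, 3),
                  Neg = c(1, -1, 1, -1, 1))
drop_offsetting_rows(toy)  # only the third row (the unmatched +5 for id 1) remains
```

The pairing is done with vectorized operations over the whole data frame, so no function is called once per id. Whether this matches the intended rule (for example, if StartDate should also agree) would need to be checked against the full original function.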