Posted by Osc*_*ith

How can I use data.table to speed up data-cleaning code that currently uses ddply?

I am trying to clean up data with ddply, but it runs very slowly on 1.3M rows.

Sample code:

# Create sample data frame
num_rows <- 10000
df <- data.frame(id = sample(1:20, num_rows, replace = TRUE),
                 Consumption = sample(-20:20, num_rows, replace = TRUE),
                 StartDate = as.Date(sample(15000:15020, num_rows, replace = TRUE), origin = "1970-01-01"))
df$EndDate <- df$StartDate + 90
#df <- df[order(df$id, df$StartDate, df$Consumption),]
# Flag negative values (-1) versus non-negative values (1).
# Needed in ddply to subset rows that have matching positive and negative values
df$Neg <- ifelse(df$Consumption < 0, -1, 1)
# Store the magnitude only; the sign is kept in Neg
df$Consumption <- abs(df$Consumption)

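I wrote a function that removes rows whose consumption value is equal in magnitude but opposite in sign to the consumption value in another row with the same id.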

#Remove rows from a data frame where there is an equal but opposite consumption value
#Should ensure only one negative …
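For comparison, here is a minimal data.table sketch of the rule described above. The original function is truncated, so the exact matching rule is an assumption: within each id, one positive row cancels against exactly one negative row of the same magnitude, dates are ignored, and the earliest rows in each group are the ones dropped. The function name remove_offsetting_rows and the helper columns rank_in_sign and n_pairs are hypothetical, not part of the original code.

```r
library(data.table)

# Sample data built the same way as in the question, as a data.table
set.seed(42)
num_rows <- 10000
dt <- data.table(
  id          = sample(1:20, num_rows, replace = TRUE),
  Consumption = sample(-20:20, num_rows, replace = TRUE),
  StartDate   = as.Date(sample(15000:15020, num_rows, replace = TRUE),
                        origin = "1970-01-01")
)
dt[, EndDate := StartDate + 90]
dt[, Neg := ifelse(Consumption < 0, -1, 1)]
dt[, Consumption := abs(Consumption)]

# Hypothetical helper: drop offsetting positive/negative rows per id.
# Assumes columns id, Consumption (magnitude) and Neg (-1 or 1).
remove_offsetting_rows <- function(dt) {
  out <- copy(dt)  # leave the caller's table untouched
  # Position of each row among rows with the same id, magnitude and sign
  out[, rank_in_sign := seq_len(.N), by = .(id, Consumption, Neg)]
  # Number of positive/negative pairs that can cancel per id and magnitude
  out[, n_pairs := min(sum(Neg == 1), sum(Neg == -1)), by = .(id, Consumption)]
  # Keep only rows beyond the cancelled pairs
  out <- out[rank_in_sign > n_pairs]
  out[, c("rank_in_sign", "n_pairs") := NULL]
  out[]
}

cleaned <- remove_offsetting_rows(dt)
```

Both grouped steps are single vectorised passes, so this avoids calling a user-written function once per group as ddply does. If the pairing also has to respect StartDate or EndDate, those columns would need to be added to the by clause or used to order the rows before ranking.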

performance r plyr data.table

3 votes, 1 answer, 340 views
