Why are lubridate functions so slow when compared with as.POSIXct?

RJ-*_*RJ- 22 r lubridate

Just as the title says. Why are lubridate functions so much slower?

library(lubridate)
library(microbenchmark)

Dates <- sample(c(dates = format(seq(ISOdate(2010,1,1), by='day', length=365), format='%d-%m-%Y')), 50000, replace = TRUE)

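# Note: Dates is formatted as '%d-%m-%Y', so the format string below does not match it and as.POSIXct() returns NA for every element.
microbenchmark(as.POSIXct(Dates, format = "%d-%b-%Y %H:%M:%S", tz = "GMT"), times = 100)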
microbenchmark(dmy(Dates, tz ="GMT"), times = 100)

Unit: milliseconds
expr                                                            min         lq          median      uq          max
1 as.POSIXct(Dates, format = "%d-%b-%Y %H:%M:%S", tz = "GMT")   103.1902    104.3247    108.675     109.2632    149.871
2 dmy(Dates, tz = "GMT")                                        184.4871    194.1504    197.8422    214.3771    268.4911

Tyl*_*ker 43

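For the same reason cars are slow compared with riding on top of a rocket. The added ease of use and safety make a car much slower than a rocket, but you're less likely to get blown up, and a car is easier to start, steer, and brake. In the right situation, however (say, I need to get to the moon), the rocket is the right tool for the job. Now if someone invented a car with a rocket strapped to the roof, we'd have something.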

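Start by looking at what `dmy` is doing and you'll see where the speed difference comes from (by the way, judging from your benchmarks I wouldn't say lubridate is that much slower, since these are in milliseconds):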

dmy  # type this into the command line and you get:

>dmy
function (..., quiet = FALSE, tz = "UTC") 
{
    dates <- unlist(list(...))
    parse_date(num_to_date(dates), make_format("dmy"), quiet = quiet, 
        tz = tz)
}
<environment: namespace:lubridate>

Right away I see `parse_date`, `num_to_date`, and `make_format`. Makes one wonder what all these guys are. Let's see:

parse_date

> parse_date
function (x, formats, quiet = FALSE, seps = find_separator(x), 
    tz = "UTC") 
{
    fmt <- guess_format(head(x, 100), formats, seps, quiet)
    parsed <- as.POSIXct(strptime(x, fmt, tz = tz))
    if (length(x) > 2 & !quiet) 
        message("Using date format ", fmt, ".")
    failed <- sum(is.na(parsed)) - sum(is.na(x))
    if (failed > 0) {
        message(failed, " failed to parse.")
    }
    parsed
}
<environment: namespace:lubridate>

num_to_date

> getAnywhere(num_to_date)
A single object matching ‘num_to_date’ was found
It was found in the following places
  namespace:lubridate
with value

function (x) 
{
    if (is.numeric(x)) {
        x <- as.character(x)
        x <- paste(ifelse(nchar(x)%%2 == 1, "0", ""), x, sep = "")
    }
    x
}
<environment: namespace:lubridate>

make_format

> getAnywhere(make_format)
A single object matching ‘make_format’ was found
It was found in the following places
  namespace:lubridate
with value

function (order) 
{
    order <- strsplit(order, "")[[1]]
    formats <- list(d = "%d", m = c("%m", "%b"), y = c("%y", 
        "%Y"))[order]
    grid <- expand.grid(formats, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
    lapply(1:nrow(grid), function(i) unname(unlist(grid[i, ])))
}
<environment: namespace:lubridate>

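Wow, so we've got `strsplit`-ting, `expand.grid`-ing, `paste`-ing, `ifelse`-ing, `unname`-ing, etc., plus a Whole Lotta error checking going on (a play on the Zep song). So what we have here is some nice syntactic sugar. Mmm, tasty, but it comes with a price: speed.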
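To make that cost concrete, here is a small base-R sketch of the "build candidates, then guess" step. `make_format` is copied from the listing above (with `seq_len` in place of `1:nrow`); `guess_format_sketch` is a hypothetical stand-in, not lubridate's actual `guess_format`, and it assumes a single known separator. It keeps the first candidate that parses the sample and formats back to the same strings, because `strptime` ignores trailing characters and would otherwise accept `%y` for a four-digit year.

```r
# Copied from the make_format() listing above: every candidate
# combination of format codes for the element order "dmy".
make_format <- function(order) {
  order <- strsplit(order, "")[[1]]
  formats <- list(d = "%d", m = c("%m", "%b"), y = c("%y", "%Y"))[order]
  grid <- expand.grid(formats, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
  lapply(seq_len(nrow(grid)), function(i) unname(unlist(grid[i, ])))
}

# Hypothetical stand-in for lubridate's guess_format(): join each candidate
# with a separator and keep the first one that round-trips the sample.
guess_format_sketch <- function(x, candidates, sep = "-") {
  for (parts in candidates) {
    fmt <- paste(parts, collapse = sep)
    parsed <- strptime(x, fmt, tz = "UTC")
    # NA comparisons make all() return NA, which isTRUE() treats as a miss.
    if (isTRUE(all(format(parsed, fmt) == x))) return(fmt)
  }
  NA_character_
}

candidates <- make_format("dmy")
n_candidates <- length(candidates)
guessed <- guess_format_sketch(c("01-01-2010", "25-12-2010"), candidates)
```

Each candidate costs an extra pass over the sample before the real parse even starts, which is work a direct `as.POSIXct` call with a known format never does.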

Compare that to `as.POSIXct`:

getAnywhere(as.POSIXct)  #tells us to use methods to see the business
methods('as.POSIXct')    #tells us all the business
as.POSIXct.date          #what I believe your code is using (I don't use dates though)

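There is more internal coding and less error checking going on in `as.POSIXct`. So you have to ask: do I want ease and safety, or speed and power? It depends on the job.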
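A minimal base-R sketch of what "less error checking" looks like in practice. The sample strings and variable names are made up for illustration; the last two lines repeat by hand the bookkeeping that `parse_date` performs in the listing above.

```r
x <- c("01-01-2010", "25-12-2010")

# Format matches the strings: both values parse.
ok <- as.POSIXct(x, format = "%d-%m-%Y", tz = "GMT")

# Format does not match (%b expects a month name, and there is no time part):
# as.POSIXct() returns NA for each element without any message.
bad <- as.POSIXct(x, format = "%d-%b-%Y %H:%M:%S", tz = "GMT")

# The failure count that parse_date() computes and reports, done by hand.
failed <- sum(is.na(bad)) - sum(is.na(x))
if (failed > 0) message(failed, " failed to parse.")
```

With `as.POSIXct` the caller has to go looking for the `NA`s; the lubridate wrapper pays for that check on every call.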

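  • +1 Great answer. Also, did you notice that `parse_date()` itself calls `as.POSIXct()`? So in the end, the `dmy()` car has an `as.POSIXct()` engine under the hood. (7 upvotes)
  • I think it's actually using `as.POSIXct.default` to handle a character argument (`Dates` is a character vector). (2 upvotes)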

c.g*_*rez 12

@Tyler's answer is correct. Here is some more information, including a tip on making lubridate faster, from the help file:

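"Lubridate has an inbuilt very fast POSIX parser, ported from the fasttime package by Simon Urbanek. This functionality is optional and can be activated with options(lubridate.fasttime = TRUE). Lubridate will automatically detect POSIX strings and use the fast parser instead of the default strptime utility."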
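Turning the option on might look like the sketch below. Whether the installed lubridate version still honours `lubridate.fasttime` is an assumption here, and the example timestamp is made up; the `requireNamespace` guard only keeps the snippet runnable when lubridate is not installed.

```r
# Opt in to the fast parser described in the help text quoted above.
# Assumption: the installed lubridate version still reads this option.
old <- options(lubridate.fasttime = TRUE)

if (requireNamespace("lubridate", quietly = TRUE)) {
  # POSIX-style strings (year-month-day hour:minute:second) are the ones
  # the help text says are detected automatically.
  parsed <- lubridate::ymd_hms("2010-01-01 12:30:00", tz = "UTC")
}

options(old)  # restore the previous setting
```

Note that the help text only promises the fast path for POSIX-style strings, so day-first input such as the `dmy` data in the question would presumably still go through the regular parser.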