R switches from PDT to PST at 1:36:14 instead of at 2:00:00 - lubridate assigns the time zone before the switch

Col*_*ens 13 datetime r lubridate

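When looking at datetime values that overlap the time zone change from PDT to PST, R appears to switch time zones at 1:36:14 rather than at the expected 2:00:00. Specifically, R assigns the PST time zone to all datetimes after 2021-11-07 01:36:14 (shown below):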

x <- c(
    "2021-11-07 1:00:00",
    "2021-11-07 1:00:01",
    "2021-11-07 1:35:00",
    "2021-11-07 1:36:00",
    "2021-11-07 1:36:10",
    "2021-11-07 1:36:14",
    "2021-11-07 1:36:15",
    "2021-11-07 1:36:30",
    "2021-11-07 1:36:59",
    "2021-11-07 1:45:00",
    "2021-11-07 1:59:59",
    "2021-11-07 2:00:00",
    "2021-11-07 2:30:00"
    )
x_pst <- as.POSIXct(x, tz = "PST8PDT")
> x_pst
# ...
[5] "2021-11-07 01:36:10 PDT" "2021-11-07 01:36:14 PDT"
[7] "2021-11-07 01:36:15 PST" "2021-11-07 01:36:30 PST"
# ...


In addition, lubridate appears to set all datetimes to PST, including those before the switch (using the same data):

x_pst <- lubridate::as_datetime(x, tz = "PST8PDT")
> x_pst
[1] "2021-11-07 01:00:00 PST" "2021-11-07 01:00:01 PST"
[3] "2021-11-07 01:35:00 PST" "2021-11-07 01:36:00 PST"
[5] "2021-11-07 01:36:10 PST" "2021-11-07 01:36:14 PST"
[7] "2021-11-07 01:36:15 PST" "2021-11-07 01:36:30 PST"
[9] "2021-11-07 01:36:59 PST" "2021-11-07 01:45:00 PST"
[11] "2021-11-07 01:59:59 PST" "2021-11-07 02:00:00 PST"
[13] "2021-11-07 02:30:00 PST"

x_pst <- lubridate::ymd_hms(x, tz = "PST8PDT")
> x_pst
# same output as above

So why does the time zone switch at such a specific time, and what is lubridate doing by assigning PST to all datetimes before the change?

Session info:

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.0

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: US/Pacific
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.3.1   generics_0.1.3   tools_4.3.1     
[4] lubridate_1.9.3  timechange_0.2.0

Jos*_*ood 7

This is not a complete answer, but I hope someone with more expertise can build on it.

as.POSIXct

Before getting into the code, I first want to provide some background. We start with as.POSIXct, an S3 generic function with several methods defined:

as.POSIXct
#> function (x, tz = "", ...)
#> UseMethod("as.POSIXct")

methods(as.POSIXct)
#> [1] as.POSIXct.Date    as.POSIXct.default as.POSIXct.numeric as.POSIXct.POSIXlt
#> see '?methods' for accessing help and source code

For the example given by the OP, since we are dealing with character data, the default method is used:

as.POSIXct.default
#> function (x, tz = "", ...)
#> {
#>     if (inherits(x, "POSIXct"))
#>         return(if (missing(tz)) x else .POSIXct(x, tz))
#>     if (is.null(x))
#>         return(.POSIXct(numeric(), tz))
#>     if (is.character(x) || is.factor(x))
#>         return(as.POSIXct(as.POSIXlt(x, tz, ...), tz, ...))
#>     if (is.logical(x) && all(is.na(x)))
#>         return(.POSIXct(as.numeric(x), tz))
#>     stop(gettextf("do not know how to convert '%s' to class %s",
#>         deparse1(substitute(x)), dQuote("POSIXct")), domain = NA)
#> }

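This leads us to call as.POSIXlt (the third condition above), a generic function that happens to have an S3 method for character input: as.POSIXlt.character. I won't paste the source, but the heart of that function is strptime: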

strptime
#> function (x, format, tz = "")
#> .Internal(strptime(if (is.character(x)) x else if (is.object(x)) `names<-`(as.character(x),
#>     names(x)) else `storage.mode<-`(x, "character"), format, tz))

You can view the C code here. I initially tried to follow the code logically, but that proved very difficult.

RApiDatetime

Fortunately, there is a package, RApiDatetime (thanks, Dirk!), that provides the following function: RApiDatetime::rapistrptime. Calling it on the values provided by the OP:

RApiDatetime::rapistrptime(x, fmt = "%Y-%m-%d %H:%M:%OS", "PST8PDT")
#> $sec
#>  [1]  0  1  0  0 10 14 15 30 59  0 59  0  0
#>
#> $min
#>  [1]  0  0 35 36 36 36 36 36 36 45 59  0 30
#>
#> $hour
#>  [1] 1 1 1 1 1 1 1 1 1 1 1 2 2
#>
#> $mday
#>  [1] 7 7 7 7 7 7 7 7 7 7 7 7 7
#>
#> $mon
#>  [1] 10 10 10 10 10 10 10 10 10 10 10 10 10
#>
#> $year
#>  [1] 121 121 121 121 121 121 121 121 121 121 121 121 121
#>
#> $wday
#>  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0
#>
#> $yday
#>  [1] 310 310 310 310 310 310 310 310 310 310 310 310 310
#>
#> $isdst
#>  [1] 1 1 1 1 1 1 0 0 0 0 0 0 0
#>
#> $zone
#>  [1] "PDT" "PDT" "PDT" "PDT" "PDT" "PDT" "PST" "PST" "PST" "PST" "PST" "PST" "PST"
#>
#> $gmtoff
#>  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA
#>
#> attr(,"class")
#> [1] "POSIXlt" "POSIXt"
#> attr(,"tzone")
#> [1] "PST8PDT" "PST"     "PDT"

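The isdst field looks worth investigating. After cloning the repository and some crude use of printf, I could follow the code path more easily. It turns out that the real action behind isdst happens here: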

.
.
    OK = tm->tm_year < 138 && tm->tm_year >= (have_broken_mktime() ? 70 : 02);
    if(OK) {
    res = (double) mktime(tm);
    if (res == -1.) return res;
.
.

mktime

Finally, we arrive at the claim I made in the comments: mktime.

I wrote this very simple C++ function to see what happens to our struct after calling mktime:

#include <Rcpp.h>
using namespace Rcpp;

#include <time.h>
#include <stdio.h>

// [[Rcpp::export]]
void CheckMkTime(int tm_sec) {
    struct tm info;

    info.tm_sec = tm_sec;
    info.tm_min = 36;
    info.tm_hour = 1;
    info.tm_mday = 7;
    info.tm_mon = 10;
    info.tm_year = 121;
    info.tm_wday = 0;
    info.tm_yday = 310;
    info.tm_isdst = -1;

    time_t val = mktime(&info);
    printf("mktime_res: %jd,\n tm_zone: %s,\n tm_gmtoff: %ld,\n tm_sec: %d,\n "
           "tm_min: %d,\n tm_hour: %d,\n tm_mday: %d,\n tm_mon: %d,\n "
           "tm_year: %d,\n tm_wday: %d,\n tm_yday: %d,\n tm_isdst: %d,\n",
           val,
           info.tm_zone,
           info.tm_gmtoff,
           info.tm_sec,
           info.tm_min,
           info.tm_hour,
           info.tm_mday,
           info.tm_mon,
           info.tm_year,
           info.tm_wday,
           info.tm_yday,
           info.tm_isdst);
}

Calling it with tm_sec = 14, we get:

CheckMkTime(14)
#> mktime_res: 1636274174,
#>  tm_zone: PDT,
#>  tm_gmtoff: -25200,
#>  tm_sec: 14,
#>  tm_min: 36,
#>  tm_hour: 1,
#>  tm_mday: 7,
#>  tm_mon: 10,
#>  tm_year: 121,
#>  tm_wday: 0,
#>  tm_yday: 310,
#>  tm_isdst: 1,

And with tm_sec = 15, we see:

CheckMkTime(15)
#> mktime_res: 1636277775,
#>  tm_zone: PST,
#>  tm_gmtoff: -28800,
#>  tm_sec: 15,
#>  tm_min: 36,
#>  tm_hour: 1,
#>  tm_mday: 7,
#>  tm_mon: 10,
#>  tm_year: 121,
#>  tm_wday: 0,
#>  tm_yday: 310,
#>  tm_isdst: 0,

So the question is: is mktime right?

I'm not sure...

I wrote the same thing in pure C:

#include <time.h>
#include <stdio.h>

int main(void) {
    struct tm info;

    info.tm_sec = 14;
    info.tm_min = 36;
    info.tm_hour = 1;
    info.tm_mday = 7;
    info.tm_mon = 10;
    info.tm_year = 121;
    info.tm_wday = 0;
    info.tm_yday = 310;
    info.tm_isdst = -1;

    time_t val = mktime(&info);
    printf("mktime_res: %jd,\n tm_zone: %s,\n tm_gmtoff: %ld,\n tm_sec: %d,\n "
               "tm_min: %d,\n tm_hour: %d,\n tm_mday: %d,\n tm_mon: %d,\n "
               "tm_year: %d,\n tm_wday: %d,\n tm_yday: %d,\n tm_isdst: %d\n",
               val,
               info.tm_zone,
               info.tm_gmtoff,
               info.tm_sec,
               info.tm_min,
               info.tm_hour,
               info.tm_mday,
               info.tm_mon,
               info.tm_year,
               info.tm_wday,
               info.tm_yday,
               info.tm_isdst);


    struct tm info2;
    info2.tm_sec = 15;
    info2.tm_min = 36;
    info2.tm_hour = 1;
    info2.tm_mday = 7;
    info2.tm_mon = 10;
    info2.tm_year = 121;
    info2.tm_wday = 0;
    info2.tm_yday = 310;
    info2.tm_isdst = -1;

    val = mktime(&info2);
    printf("\n\nmktime_res: %jd,\n tm_zone: %s,\n tm_gmtoff: %ld,\n tm_sec: %d,\n "
               "tm_min: %d,\n tm_hour: %d,\n tm_mday: %d,\n tm_mon: %d,\n "
               "tm_year: %d,\n tm_wday: %d,\n tm_yday: %d,\n tm_isdst: %d\n",
               val,
               info2.tm_zone,
               info2.tm_gmtoff,
               info2.tm_sec,
               info2.tm_min,
               info2.tm_hour,
               info2.tm_mday,
               info2.tm_mon,
               info2.tm_year,
               info2.tm_wday,
               info2.tm_yday,
               info2.tm_isdst);

    return 0;
}

Compile it and run it in the terminal:

% clang time_shift.c -o time_shift
% ./time_shift
#> mktime_res: 1636266974,
#>  tm_zone: EST,
#>  tm_gmtoff: -18000,
#>  tm_sec: 14,
#>  tm_min: 36,
#>  tm_hour: 1,
#>  tm_mday: 7,
#>  tm_mon: 10,
#>  tm_year: 121,
#>  tm_wday: 0,
#>  tm_yday: 310,
#>  tm_isdst: 0
#>
#>
#> mktime_res: 1636266975,
#>  tm_zone: EST,
#>  tm_gmtoff: -18000,
#>  tm_sec: 15,
#>  tm_min: 36,
#>  tm_hour: 1,
#>  tm_mday: 7,
#>  tm_mon: 10,
#>  tm_year: 121,
#>  tm_wday: 0,
#>  tm_yday: 310,
#>  tm_isdst: 0

We don't see the problem here. However, we do notice that tm_zone is EST in both cases, whereas when we ran it in R after running RApiDatetime::rapistrptime(x, fmt = "%Y-%m-%d %H:%M:%OS", "PST8PDT"), we got PDT for 14 and PST for 15.

With that in mind, I reran these examples in a fresh R session and got the same results as in the pure C implementation.

In a fresh R session, this behavior does not appear until after base R's strptime has been called.

I tried looking at the mktime source code, but it is beyond me.


  • Great! Does /sf/ask/599124361/ help? (2 upvotes)