Change a column from birth date to age in R

mon*_*nes 11 r date-of-birth

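I am using data.table for the first time.

My table has a column of about 400,000 birth dates. I need to convert them from birth date to age.

What is the best way to do this?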

Gre*_*gor 24

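I have been thinking about this problem and have not been satisfied with either of the two answers so far. I like using lubridate, as @KFB did, but I also want things wrapped up nicely in a function, as in my answer using the eeptools package. So here is a wrapper function using the lubridate interval method, with some nice options: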

library(lubridate)  # provides today(), interval() and duration()

#' Calculate age
#' 
#' By default, calculates the typical "age in years", with a
#' \code{floor} applied so that you are, e.g., 5 years old from
#' 5th birthday through the day before your 6th birthday. Set
#' \code{floor = FALSE} to return decimal ages, and change \code{units}
#' for units other than years.
#' @param dob date-of-birth, the day to start calculating age.
#' @param age.day the date on which age is to be calculated.
#' @param units unit to measure age in. Defaults to \code{"years"}. Passed to \code{\link{duration}}.
#' @param floor boolean for whether or not to floor the result. Defaults to \code{TRUE}.
#' @return Age in \code{units}. Will be an integer if \code{floor = TRUE}.
#' @examples
#' my.dob <- as.Date('1983-10-20')
#' age(my.dob)
#' age(my.dob, units = "minutes")
#' age(my.dob, floor = FALSE)
age <- function(dob, age.day = today(), units = "years", floor = TRUE) {
    calc.age = interval(dob, age.day) / duration(num = 1, units = units)
    if (floor) return(as.integer(floor(calc.age)))
    return(calc.age)
}

Usage examples:

> my.dob <- as.Date('1983-10-20')

> age(my.dob)
[1] 31

> age(my.dob, floor = FALSE)
[1] 31.15616

> age(my.dob, units = "minutes")
[1] 16375680

> age(seq(my.dob, length.out = 6, by = "years"))
[1] 31 30 29 28 27 26

  • This has a problem around birthdays. For example, `age(dob = as.Date("1970-06-01"), age.day = as.Date("2018-05-31"))` (the day before the person's 48th birthday) should return 47, but it returns 48 (48.03014 with `floor = FALSE`). There must be a neater way, but `as.numeric(as.period(interval(as.Date("1970-06-01"), as.Date("2018-05-31"))), "years")` seems better (it returns 47.9988) (3 upvotes)
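One way to address the birthday issue raised in the comment above is to count calendar years instead of dividing by a fixed-length duration. The following is only a minimal sketch, assuming lubridate's `time_length()` applied to an interval; the name `age_calendar` is hypothetical and not part of the original answer, and it only covers ages in years.

```r
library(lubridate)

# Hypothetical variant of age(): measures the interval in calendar years,
# so the whole-year age only increases on the birthday itself.
age_calendar <- function(dob, age.day = today(), floor = TRUE) {
    calc.age <- time_length(interval(dob, age.day), unit = "years")
    if (floor) return(as.integer(floor(calc.age)))
    calc.age
}

# The day before the 48th birthday, then the birthday itself
age_calendar(as.Date("1970-06-01"), as.Date("2018-05-31"))
age_calendar(as.Date("1970-06-01"), as.Date("2018-06-01"))
```

With these inputs the first call should give 47 and the second 48, which is the behaviour the comment asks for.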

Gre*_*gor 23

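In the comments of this blog post, I found the `age_calc` function in the eeptools package. It handles edge cases (leap years, etc.), checks its inputs, and looks quite robust.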

library(eeptools)
x <- as.Date(c("2011-01-01", "1996-02-29"))
age_calc(x) # age as of today for each date; default is age in months

[1] 46.73333 224.83118

age_calc(x, units = "years") # but you can set it to years

[1] 3.893151 18.731507

floor(age_calc(x, units = "years"))

[1] 3 18

For your data:

yourdata$age <- floor(age_calc(yourdata$birthdate, units = "years"))

assuming you want age in whole years.


KFB*_*KFB 7

Assuming you have a data.table, you can do the following:

library(data.table)
library(lubridate)
# toy data
X = data.table(birth=seq(from=as.Date("1970-01-01"), to=as.Date("1980-12-31"), by="year"))
Sys.Date()

Option 1: use `as.period` from the lubridate package

X[, age := as.period(Sys.Date() - birth)][]
         birth                   age
 1: 1970-01-01  44y 0m 327d 0H 0M 0S
 2: 1971-01-01  43y 0m 327d 6H 0M 0S
 3: 1972-01-01 42y 0m 327d 12H 0M 0S
 4: 1973-01-01 41y 0m 326d 18H 0M 0S
 5: 1974-01-01  40y 0m 327d 0H 0M 0S
 6: 1975-01-01  39y 0m 327d 6H 0M 0S
 7: 1976-01-01 38y 0m 327d 12H 0M 0S
 8: 1977-01-01 37y 0m 326d 18H 0M 0S
 9: 1978-01-01  36y 0m 327d 0H 0M 0S
10: 1979-01-01  35y 0m 327d 6H 0M 0S
11: 1980-01-01 34y 0m 327d 12H 0M 0S

Option 2: if you do not like the format of option 1, you can do the following:

yr = duration(num = 1, units = "years")
X[, age := interval(birth, Sys.Date())/yr][]  # interval() replaces the deprecated new_interval()
# you get
         birth      age
 1: 1970-01-01 44.92603
 2: 1971-01-01 43.92603
 3: 1972-01-01 42.92603
 4: 1973-01-01 41.92329
 5: 1974-01-01 40.92329
 6: 1975-01-01 39.92329
 7: 1976-01-01 38.92329
 8: 1977-01-01 37.92055
 9: 1978-01-01 36.92055
10: 1979-01-01 35.92055
11: 1980-01-01 34.92055

I believe option 2 should be preferable.
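If whole-year ages are what you need in the table, the two options above can be followed by a floor, or the age can be computed directly from the calendar fields. Below is a rough sketch of the second route using only base R and data.table; the helper name `whole_years` and the fixed reference date `ref.day` are assumptions made for illustration, not part of the answer above.

```r
library(data.table)

# Hypothetical helper: number of whole years completed between dob and ref.
whole_years <- function(dob, ref = Sys.Date()) {
    dob <- as.POSIXlt(dob)
    ref <- as.POSIXlt(ref)
    # Has the birthday already occurred in the reference year?
    had.birthday <- (ref$mon > dob$mon) |
        (ref$mon == dob$mon & ref$mday >= dob$mday)
    as.integer(ref$year - dob$year - !had.birthday)
}

# Same toy data as above
X <- data.table(birth = seq(from = as.Date("1970-01-01"),
                            to = as.Date("1980-12-31"), by = "year"))

# Assumed fixed reference date so the result is reproducible
ref.day <- as.Date("2014-11-24")
X[, age := whole_years(birth, ref.day)][]
```

This adds an integer `age` column by reference, so no copy of the 400,000-row table is made. Note that under this convention someone born on February 29 turns a year older on March 1 in non-leap years.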