Extracting years from strings and text data

use*_*187 4 regex r lubridate stringi

I need to extract the start year and end year from a vector like this one.

 yr<- c("June 2013 – Present (2 years 9 months)", "January 2012 – June 2013 (1 year 6 months)","2006 – Present (10 years)","2002 – 2006 (4 years)")


 yr
 June 2013 – Present (2 years 9 months)
 January 2012 – June 2013 (1 year 6 months)
 2006 – Present (10 years)
 2002 – 2006 (4 years)

I expect output like this. Does anyone have a suggestion?

 start_yr       end_yr

2013            2016
2012            2013
2006            2016
2002            2006

小智 5

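# Replace "Present" (any case) with the current year, hard-coded here as 2016
x <- gsub("present", "2016", yr, ignore.case = TRUE)
# Extract every four-digit number from each string (returns a list)
x <- regmatches(x, gregexpr("\\d{4}", x))
# The first match is the start year, the second is the end year
start_yr <- sapply(x, "[[", 1)
end_yr <- sapply(x, "[[", 2)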

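This saves the start year and end year in two separate variables. If you want them combined in a single variable, just edit the code to assign to y$start_yr and y$end_yr instead.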
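
To illustrate the combined form, here is a minimal sketch that wraps the same steps in a function and returns a data frame. The function name extract_years and the present_year argument are made up for this example and are not part of the answer above. It assumes every string contains at least two four-digit years once "Present" has been replaced.

```r
# Hypothetical helper: extract_years() and present_year are illustrative names.
extract_years <- function(yr, present_year = 2016L) {
  # Replace "Present" (any case) with the assumed current year
  x <- gsub("present", as.character(present_year), yr, ignore.case = TRUE)
  # Collect all four-digit runs per string; the result is a list of character vectors
  m <- regmatches(x, gregexpr("\\d{4}", x))
  # Assumption: each element has at least two matches, otherwise "[[" fails
  data.frame(
    start_yr = as.integer(vapply(m, "[[", character(1), 1)),
    end_yr   = as.integer(vapply(m, "[[", character(1), 2))
  )
}

yr <- c("June 2013 – Present (2 years 9 months)",
        "January 2012 – June 2013 (1 year 6 months)",
        "2006 – Present (10 years)",
        "2002 – 2006 (4 years)")

y <- extract_years(yr)
y$start_yr
y$end_yr
```

Passing the year as an argument avoids editing the code each January; for example, extract_years(yr, as.integer(format(Sys.Date(), "%Y"))) would use the current calendar year instead of 2016.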