How can I extract dates and times from free-form text?

Pau*_*aul 12 javascript datetime nlp

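I'm trying to come up with something similar to Google Calendar (or even some Gmail messages), where free-form text is parsed and converted into a specific date/time.

Some examples (for simplicity, assume that right now is 1 AM on January 1, 2013):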

"I should call Mom tomorrow to wish her a happy birthday" -> "tomorrow" => "2013-01-02"
"The super bowl is on Feb 3rd at 6:30pm" -> "Feb 3rd at 6:30pm" => "2013-02-03T18:30:00Z"
"Remind me to take out the trash on Friday" -> "Friday" => "2013-01-04"

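First, I'll ask this: are there any existing open source libraries that do this (or part of it)? If not, what kind of approach do you think I should take?

I'm considering a few different possibilities: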

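  1. Lots of regexes, as many as I can come up with for each different use case.
  2. Some type of Bayesian net that looks at n-grams and categorizes them into different scenarios such as "relative day", "relative date", "specific date", and "date and time", then runs them through a rules engine (maybe more regexes) to figure out the actual date.
  3. Sending it to a Google search and trying to extract meaningful information from the search results (this one is probably not realistic).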
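
To make option 1 concrete, here is a minimal sketch of the regex approach that covers only the three example sentences above. The names `extractDate`, `MONTHS`, and `WEEKDAYS` are hypothetical and not taken from any library. It assumes all arithmetic is done in UTC, that a month/day with no year means the reference year, and that a bare weekday name means its next occurrence. A real parser would need many more patterns.

```javascript
// Illustrative only: three hand-written patterns plus date arithmetic.
var MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"];
var WEEKDAYS = ["sunday", "monday", "tuesday", "wednesday",
                "thursday", "friday", "saturday"];

// Returns { text, date } for the first recognized expression, or null.
// `now` is the reference Date that relative expressions are resolved against.
function extractDate(input, now) {
    var y = now.getUTCFullYear(), mo = now.getUTCMonth(), d = now.getUTCDate();
    var m;

    // Specific date: "Feb 3rd", optionally followed by "at 6:30pm".
    m = /\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?\s+(\d{1,2})(?:st|nd|rd|th)?(?:\s+at\s+(\d{1,2})(?::(\d{2}))?\s*(am|pm)?)?/i.exec(input);
    if (m) {
        var hour = 0, minute = 0;
        if (m[3]) {
            hour = parseInt(m[3], 10);
            minute = m[4] ? parseInt(m[4], 10) : 0;
            if (m[5]) {
                // Convert a 12-hour clock reading to 24-hour time.
                hour = hour % 12 + (m[5].toLowerCase() === "pm" ? 12 : 0);
            }
        }
        var month = MONTHS.indexOf(m[1].toLowerCase());
        return { text: m[0], date: new Date(Date.UTC(y, month, parseInt(m[2], 10), hour, minute)) };
    }

    // Relative day: "today" or "tomorrow".
    m = /\b(today|tomorrow)\b/i.exec(input);
    if (m) {
        var offset = m[1].toLowerCase() === "tomorrow" ? 1 : 0;
        return { text: m[0], date: new Date(Date.UTC(y, mo, d + offset)) };
    }

    // Weekday name: resolve to the next occurrence after the reference day.
    m = /\b(sunday|monday|tuesday|wednesday|thursday|friday|saturday)\b/i.exec(input);
    if (m) {
        var target = WEEKDAYS.indexOf(m[1].toLowerCase());
        var delta = (target - now.getUTCDay() + 7) % 7 || 7; // same weekday means a week ahead
        return { text: m[0], date: new Date(Date.UTC(y, mo, d + delta)) };
    }

    return null;
}

// Usage with the reference time from the question (1 AM on January 1, 2013, taken as UTC).
var now = new Date(Date.UTC(2013, 0, 1, 1, 0));
var result = extractDate("The super bowl is on Feb 3rd at 6:30pm", now);
console.log(result.text + " => " + result.date.toISOString());
// Feb 3rd at 6:30pm => 2013-02-03T18:30:00.000Z
```

Even this small version shows the main cost of the approach: every new phrasing ("next Friday", "in two weeks", "3/2/13") needs its own pattern and its own resolution rule.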

Dog*_*ert 10

You can use this library: https://github.com/wanasit/chrono

Demo:

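// Sample sentences from the question.
var inputs = ["I should call Mom tomorrow to with her a happy birthday",
    "The super bowl is on Feb 3rd at 6:30pm",
    "Remind me to take out the trash on Friday"];

for (var i = 0; i < inputs.length; i++) {
    var input = inputs[i];
    // chrono.parse returns one result per date expression found in the text.
    var parsed = chrono.parse(input);
    // Log each matched substring together with the date it resolved to.
    console.log(input + " parsed as: " + JSON.stringify(parsed.map(function(p) { return [p.text, p.startDate]; })));
}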

Output:

I should call Mom tomorrow to with her a happy birthday parsed as: [["tomorrow","2012-12-31T06:30:00.000Z"]]
The super bowl is on Feb 3rd at 6:30pm parsed as: [["Feb 3rd at 6:30pm","2013-02-03T13:00:00.000Z"]]
Remind me to take out the trash on Friday parsed as: [["Friday","2013-01-04T06:30:00.000Z"]] 

http://jsfiddle.net/TXX3Z/

  • Wow, this has everything I was looking for! Thanks! (2 upvotes)