Shi*_*ith 0 python regex python-regex
I have to identify different date formats in a string using a regular expression, as shown below.
date can contain 21/12/2018
or 12/21/2018
or 2018/12/21
or 12/2018
or 21-12-2018
or 12-21-2018
or 2018-12-21
or 21-Jan-2018
or Jan 21,2018
or 21st Jan 2018
or 21-Jan-2018
or Jan 21,2018
or 21st Jan 2018
or Jan 21, 2018
or Jan 21, 2018
or 2018 Dec. 21
or 2018 Dec 21
or 21st of Jan 2018
or 21st of Jan 2018
or Jan 2018
or Jan 2018
or Jan. 2018
or Jan, 2018
or 2018
[should recognize (year only), (year and month), (year, month and day), year is mandatory in every date format to be recognized]
[months are abbreviated to three letters, first letter capital]
My regular expression is as follows:
\b(((((0?[1-9]|[12][0-9]|3[01])(\s*(st|nd|rd|th)?\s*(of)?\s*)?)|(20[012]\d)|(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))[\/\-\.\,\s]*){1,3})\b
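It does not work as expected and also matches other patterns. I need to recognize three patterns: (year only), (year and month), and (year, month and day). The year is mandatory in every date pattern to be recognized.
What corrections are needed to make it work? Please help.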
IIUC, dateutil.parser would be a better choice than re here:
import dateutil.parser as dparser
l = ["21/12/2018","12/21/2018","2018/12/21","12/2018",
"21-12-2018","12-21-2018","2018-12-21","21-Jan-2018",
"Jan 21,2018","21st Jan 2018","21-Jan-2018","Jan 21,2018",
"21st Jan 2018","Jan 21, 2018","Jan 21, 2018","2018 Dec. 21",
"2018 Dec 21","21st of Jan 2018","21st of Jan 2018","Jan 2018",
"Jan 2018","Jan. 2018","Jan, 2018","2018"]
[str(dparser.parse(i, fuzzy=True)) for i in l]
Output:
['2018-12-21 00:00:00',
'2018-12-21 00:00:00',
'2018-12-21 00:00:00',
'2018-12-07 00:00:00',
'2018-12-21 00:00:00',
'2018-12-21 00:00:00',
'2018-12-21 00:00:00',
'2018-01-21 00:00:00',
'2019-01-21 00:00:00',
'2018-01-21 00:00:00',
'2018-01-21 00:00:00',
'2019-01-21 00:00:00',
'2018-01-21 00:00:00',
'2018-01-21 00:00:00',
'2018-01-21 00:00:00',
'2018-12-21 00:00:00',
'2018-12-21 00:00:00',
'2018-01-21 00:00:00',
'2018-01-21 00:00:00',
'2018-01-07 00:00:00',
'2018-01-07 00:00:00',
'2018-01-07 00:00:00',
'2018-01-07 00:00:00',
'2018-08-07 00:00:00']
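The `07` days and the `08` month in that output do not come from the input strings. When a field is missing, `dateutil` fills it in from a default datetime, and if none is passed it uses the current date, so those values depend on the day the code was run. A minimal sketch of pinning the fill-in values with the `default` argument follows; the helper name `parse_with_fixed_default` and the choice of 1900-01-01 are illustrative assumptions, not part of the original answer.

```python
from datetime import datetime

import dateutil.parser as dparser

# Hypothetical fixed default: missing month/day fall back to 1 instead of "today".
FIXED_DEFAULT = datetime(1900, 1, 1)


def parse_with_fixed_default(text):
    """Parse text with dateutil, taking any missing fields from FIXED_DEFAULT."""
    return dparser.parse(text, fuzzy=True, default=FIXED_DEFAULT)


for s in ["12/2018", "Jan 2018", "2018"]:
    print(s, "->", parse_with_fixed_default(s))
```

With a fixed default the result no longer changes from day to day, although it still does not say which fields were actually present in the input.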
dateutil.parser can also handle sentences that contain something date-like (though not always):
s = 'The new millennium has finally come and it is now 1st of Jan 2000.'
str(dparser.parse(s, fuzzy=True))
# '2000-01-01 00:00:00'
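If a regular expression is still required, one option is to spell out each accepted layout as its own alternative. The pattern in the question repeats a generic "day, year or month" group one to three times, so nothing forces a year to be present and fragments such as a bare day or month name can match. The following is only a sketch under stated assumptions: the names `DATE_RE` and `find_dates` are hypothetical, the year range `20[012]\d` is copied from the question, and numeric days and months are not range-checked.

```python
import re

MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
DAY = r"(?:0?[1-9]|[12]\d|3[01])"
ORD = r"(?:st|nd|rd|th)"
YEAR = r"20[012]\d"   # same year range as the pattern in the question
NUM = r"\d{1,2}"      # numeric day or month, not range-checked
SEP = r"[/-]"

# Every alternative contains YEAR; longer layouts are listed before shorter ones.
DATE_RE = re.compile(rf"""
\b(?:
    {NUM}{SEP}{NUM}{SEP}{YEAR}                   # 21/12/2018, 12-21-2018
  | {YEAR}{SEP}{NUM}{SEP}{NUM}                   # 2018/12/21, 2018-12-21
  | {DAY}-{MONTH}-{YEAR}                         # 21-Jan-2018
  | {DAY}{ORD}?\s+(?:of\s+)?{MONTH}\s+{YEAR}     # 21st Jan 2018, 21st of Jan 2018
  | {MONTH}\.?\s+{DAY},\s*{YEAR}                 # Jan 21,2018 and Jan 21, 2018
  | {YEAR}\s+{MONTH}\.?\s+{DAY}                  # 2018 Dec. 21, 2018 Dec 21
  | {NUM}{SEP}{YEAR}                             # 12/2018
  | {MONTH}[.,]?\s+{YEAR}                        # Jan 2018, Jan. 2018, Jan, 2018
  | {YEAR}                                       # 2018
)\b
""", re.VERBOSE)


def find_dates(text):
    """Return every substring of text that matches one of the layouts above."""
    return [m.group(0) for m in DATE_RE.finditer(text)]


print(find_dates("Released on 21st of Jan 2018, patched 2018/12/21."))
print(find_dates("Jan 21"))  # no year, so nothing is matched
```

Listing the layouts explicitly makes the pattern longer, but each branch can be read and tested separately, and a new format is added as one more alternative.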