Mei*_*eir 3 python rss timezone datetime feedparser
我需要获取 RSS 提要的已发布字段,并且我需要知道时区是什么。我以 UTC 格式存储日期,并且我想要另一个字段来存储时区,以便我以后可以操纵日期时间。
我目前的代码如下:
for entry in feed['entries']:
if hasattr(entry, 'published'):
if isinstance(entry.published_parsed, struct_time):
dt = datetime(*entry.published_parsed[:-3])
Run Code Online (Sandbox Code Playgroud)
dt 的最终值是 UTC 中的正确日期时间,但我还需要获取原始时区。任何人都可以帮忙吗?
编辑:
为了将来参考,即使它不是我最初问题的一部分,如果您需要操作非标准时区(如 est),您需要根据您的规范制作一个转换表。感谢这个答案:Parsing date/time string with timezone abbreviated name in Python?
您可以使用打包的parser.parse方法dateutil。
例如对于 statckoverflow:
import feedparser
from dateutil import parser, tz
url = 'http://stackoverflow.com/feeds/tag/python'
feed = feedparser.parse(url)
published = feed.entries[0].published
dt = parser.parse(published)
print(published)
print(dt) # that is timezone aware
print(dt.utcoffset()) # time zone of time
print(dt.astimezone(tz.tzutc())) # that is timezone aware as UTC
2012-11-28T19:07:32Z
2012-11-28 19:07:32+00:00
0:00:00
2012-11-28 19:07:32+00:00
Run Code Online (Sandbox Code Playgroud)
您可以看到以published结尾Z,这意味着时区是 UTC:
着眼于日期格式的历史它在feedparser:
Atom 1.0 states that all date elements “MUST conform to the date-time
production in RFC 3339.
In addition, an uppercase T character MUST be used to separate date and time,
and an uppercase Z character MUST be present in the absence of
a numeric time zone offset.”
Run Code Online (Sandbox Code Playgroud)
再举一个例子:
import feedparser
from dateutil import parser, tz
url = 'http://omidraha.com/rss/'
feed = feedparser.parse(url)
published = feed.entries[0].published
dt = parser.parse(published)
print(published)
print(dt) # that is timezone aware
print(dt.utcoffset()) # time zone of time
print(dt.astimezone(tz.tzutc())) # that is timezone aware as UTC
Thu, 26 Dec 2013 14:24:04 +0330
2013-12-26 14:24:04+03:30
3:30:00
2013-12-26 10:54:04+00:00
Run Code Online (Sandbox Code Playgroud)
但这也取决于接收到的时间数据的格式和提要的类型。