I'm trying to parse RSS 2.0 and Atom feeds with the SyndicationFeed object, but I get XmlExceptions when it parses DateTime fields such as pubDate, for example:
2012-01-17 08:01:06
public static List<SyndicationItem> getRssData(string url)
{
    // Dispose the reader once the feed has been loaded.
    using (XmlReader reader = XmlReader.Create(url))
    {
        // The XmlException is thrown here when pubDate cannot be parsed.
        SyndicationFeed feed = SyndicationFeed.Load(reader);
        return feed.Items.ToList();
    }
}
Feed URL: http://news.163.com/special/00011K6L/rss_newstop.xml
<item id="2">
    <title>...</title>
    <link>...</link>
    <description>......</description>
    <pubDate>2012-01-17 12:09:29</pubDate> <!-- exception thrown here -->
</item>
Is there a better way to achieve this? Please help. Thanks.
Mic*_*aga 15
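There is a workaround. Rss20FeedFormatter throws an exception when it tries to read certain DateTime formats.
To work around the issue, create a custom XML reader that recognizes different date formats. Here is how the custom XML reader is used: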
// MyXmlReader (defined below) normalizes date strings before the formatter sees them.
using (XmlReader r = new MyXmlReader(url))
{
    SyndicationFeed feed = SyndicationFeed.Load(r);
    Rss20FeedFormatter rssFormatter = feed.GetRss20Formatter();
    // Write the parsed feed back out as indented RSS 2.0.
    using (XmlTextWriter rssWriter = new XmlTextWriter("rss.xml", Encoding.UTF8))
    {
        rssWriter.Formatting = Formatting.Indented;
        rssFormatter.WriteTo(rssWriter);
    }
}
...and the class used in the code above:
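using System;
using System.Globalization;
using System.IO;
using System.Xml;

// XmlTextReader that rewrites pubDate/lastBuildDate text into RFC 1123 form.
class MyXmlReader : XmlTextReader
{
    private bool readingDate = false;

    // Matches e.g. "Wed Oct 07 08:00:07 GMT 2009". GMT is quoted as a literal
    // because "Z" is not a .NET custom date format specifier.
    const string CustomUtcDateTimeFormat = "ddd MMM dd HH:mm:ss 'GMT' yyyy";

    public MyXmlReader(Stream s) : base(s) { }
    public MyXmlReader(string inputUri) : base(inputUri) { }

    public override void ReadStartElement()
    {
        // Flag un-namespaced date elements so ReadString can rewrite their text.
        if (string.Equals(base.NamespaceURI, string.Empty, StringComparison.InvariantCultureIgnoreCase) &&
            (string.Equals(base.LocalName, "lastBuildDate", StringComparison.InvariantCultureIgnoreCase) ||
             string.Equals(base.LocalName, "pubDate", StringComparison.InvariantCultureIgnoreCase)))
        {
            readingDate = true;
        }
        base.ReadStartElement();
    }

    public override void ReadEndElement()
    {
        readingDate = false;
        base.ReadEndElement();
    }

    public override string ReadString()
    {
        if (!readingDate)
        {
            return base.ReadString();
        }

        string dateString = base.ReadString();
        DateTime dt;
        // Try the general parser first, then fall back to the custom format.
        if (!DateTime.TryParse(dateString, out dt))
        {
            dt = DateTime.ParseExact(dateString, CustomUtcDateTimeFormat, CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
        }
        // A value without a time zone is treated as local time by ToUniversalTime.
        // "R" produces RFC 1123 text, which the RSS formatter accepts.
        return dt.ToUniversalTime().ToString("R", CultureInfo.InvariantCulture);
    }
}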
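Basically, that RSS feed is invalid. If you look at the RSS 2.0 specification, it states:
All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).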
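The string "2012-01-17 12:09:29" does not conform to the "Date and Time" section of RFC 822, which requires a day, an abbreviated month name, a year, a time, and a time zone. It should look something like "17 Jan 2012 12:09:29 GMT".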
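If you control the code that consumes the feed, the conversion from the feed's format to an RFC 822-style date could be sketched roughly as follows. PubDateFixer and ToRfc1123 are hypothetical names used only for illustration, and the sketch assumes the feed's timestamps are in UTC, which the feed itself does not state.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; not part of System.ServiceModel.Syndication.
public static class PubDateFixer
{
    // Converts "yyyy-MM-dd HH:mm:ss" text into RFC 1123 text.
    // Assumption: the input is UTC. Adjust the styles if the feed uses local time.
    public static string ToRfc1123(string feedDate)
    {
        DateTime parsed = DateTime.ParseExact(
            feedDate,
            "yyyy-MM-dd HH:mm:ss",
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);

        // The "R" standard format string emits "ddd, dd MMM yyyy HH:mm:ss GMT".
        return parsed.ToString("R", CultureInfo.InvariantCulture);
    }
}
```

Under that UTC assumption, PubDateFixer.ToRfc1123("2012-01-17 12:09:29") returns "Tue, 17 Jan 2012 12:09:29 GMT", which is a form the RSS date rules allow.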