Bos*_*one 4 rss android xmlpullparser
I am trying to parse an RSS feed from Monster on Android v.17, using the following URL:
http://rss.jobsearch.monster.com/rssquery.ashx?q=java
To fetch the content I am using HttpURLConnection in the following way:
this.conn = (HttpURLConnection) url.openConnection();
this.conn.setConnectTimeout(5000);
this.conn.setReadTimeout(10000);
this.conn.setUseCaches(true);
conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
is = new InputStreamReader(url.openStream());
What comes back is, as far as I can tell (and I have verified it), a valid RSS feed. The response headers are:
Cache-Control:private
Connection:Keep-Alive
Content-Encoding:gzip
Content-Length:5958
Content-Type:text/xml
Date:Wed, 06 Mar 2013 17:15:20 GMT
P3P:CP=CAO DSP COR CURa ADMa DEVa IVAo IVDo CONo HISa TELo PSAo PSDo DELa PUBi BUS LEG PHY ONL UNI PUR COM NAV INT DEM CNT STA HEA PRE GOV OTC
Server:Microsoft-IIS/7.5
Vary:Accept-Encoding
X-AspNet-Version:2.0.50727
X-Powered-By:ASP.NET
It starts like this (click the URL above to see the full XML):
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Monster Job Search Results java</title>
<description>RSS Feed for Monster Job Search</description>
<link>http://rss.jobsearch.monster.com/rssquery.ashx?q=java</link>
But when I try to parse it:
final XmlPullParser xpp = getPullParser();
xpp.setInput(is);
for (int type = xpp.getEventType(); type != XmlPullParser.END_DOCUMENT; type = xpp.next()) { /* parsing goes here */ }
The code immediately chokes at type = xpp.next() with the following exception:
03-06 09:27:27.796: E/AbsXmlResultParser(13363): org.xmlpull.v1.XmlPullParserException:
Unexpected token (position:TEXT ?@1:2 in java.io.InputStreamReader@414b4538)
This essentially means it cannot handle the second character on line 1, which is <?xml version="1.0" encoding="utf-8"?>
Here are the offending lines in KXmlParser.java (425-426). The check type == TEXT evaluates to true:
if (depth == 0 && (type == ENTITY_REF || type == TEXT || type == CDSECT)) {
throw new XmlPullParserException("Unexpected token", this, null);
}
Any help? I did try setting XmlPullParser.FEATURE_PROCESS_DOCDECL to false on the parser, but that did not help.
I researched this on the web and here, but could not find anything useful.
Vla*_*nov 34
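The reason you are getting the error is that the XML file does not actually start with <?xml version="1.0" encoding="utf-8"?>. It starts with the three special bytes EF BB BF, which are the byte order mark (BOM).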
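
If you want to confirm this for your own feed, one option is to dump the first few bytes of the raw stream before any Reader touches them. The following is only a sketch: the class name BomCheck and the helper hexPrefix are made up for illustration, and the sample bytes stand in for the real response (on the device you would pass conn.getInputStream() instead).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BomCheck {

    /** Reads up to max bytes from the stream and formats them as space-separated hex. */
    public static String hexPrefix(InputStream in, int max) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < max; i++) {
            int b = in.read();          // 0..255, or -1 at end of stream
            if (b < 0) break;
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP response body: a UTF-8 BOM followed by "<?xml".
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?', 'x', 'm', 'l'};
        System.out.println(hexPrefix(new ByteArrayInputStream(sample), 4)); // prints "EF BB BF 3C"
    }
}
```

If the output begins with EF BB BF, the stream carries a UTF-8 BOM ahead of the XML declaration.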

InputStreamReader does not handle these bytes automatically, so you have to handle them manually. The simplest way is to use BOMInputStream from the Commons IO library:
this.conn = (HttpURLConnection) url.openConnection();
this.conn.setConnectTimeout(5000);
this.conn.setReadTimeout(10000);
this.conn.setUseCaches(true);
conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
is = new InputStreamReader(new BOMInputStream(conn.getInputStream(), false, ByteOrderMark.UTF_8));
I checked the code above and it works for me.
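
If adding Commons IO is not an option, the same idea can be sketched with plain java.io: peek at the first three bytes and push them back unless they are EF BB BF. This is a minimal illustration, not what BOMInputStream does internally. The class name BomSkipper and the method skipUtf8Bom are hypothetical, and it only handles the UTF-8 BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;

public class BomSkipper {

    /** Wraps the stream and drops a leading UTF-8 BOM (EF BB BF) if one is present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: give the bytes back to the caller
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for conn.getInputStream(): a BOM followed by the start of the feed.
        byte[] body = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?', 'x', 'm', 'l'};
        Reader reader = new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(body)), "UTF-8");
        System.out.println((char) reader.read()); // prints "<"
    }
}
```

With the connection from the question, the resulting reader would be created as new InputStreamReader(skipUtf8Bom(conn.getInputStream()), "UTF-8") and passed to xpp.setInput(...).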