KXmlParser throws "Unexpected token" exception at the start of RSS parsing

Bos*_*one 4 rss android xmlpullparser

I'm trying to parse an RSS feed from Monster on Android v.17, using the following URL:

http://rss.jobsearch.monster.com/rssquery.ashx?q=java

To fetch the content I'm using HttpURLConnection in the following way:

this.conn = (HttpURLConnection) url.openConnection();
this.conn.setConnectTimeout(5000);
this.conn.setReadTimeout(10000);
this.conn.setUseCaches(true);
conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
is = new InputStreamReader(url.openStream());

What comes back is, as far as I can tell (and I have verified it), a valid RSS feed. The response headers are:

Cache-Control:private
Connection:Keep-Alive
Content-Encoding:gzip
Content-Length:5958
Content-Type:text/xml
Date:Wed, 06 Mar 2013 17:15:20 GMT
P3P:CP=CAO DSP COR CURa ADMa DEVa IVAo IVDo CONo HISa TELo PSAo PSDo DELa PUBi BUS LEG PHY ONL UNI PUR COM NAV INT DEM CNT STA HEA PRE GOV OTC
Server:Microsoft-IIS/7.5
Vary:Accept-Encoding
X-AspNet-Version:2.0.50727
X-Powered-By:ASP.NET

It starts like this (click the URL above to see the full XML):

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Monster Job Search Results java</title>
    <description>RSS Feed for Monster Job Search</description>
    <link>http://rss.jobsearch.monster.com/rssquery.ashx?q=java</link>

But when I try to parse it:

final XmlPullParser xpp = getPullParser();
xpp.setInput(is);
for (int type = xpp.getEventType(); type != XmlPullParser.END_DOCUMENT; type = xpp.next()) { /* parsing goes here */ }

The code immediately chokes on type = xpp.next() with the following exception:

03-06 09:27:27.796: E/AbsXmlResultParser(13363): org.xmlpull.v1.XmlPullParserException: 
   Unexpected token (position:TEXT ?@1:2 in java.io.InputStreamReader@414b4538) 

This effectively means it cannot handle the second character on line 1: <?xml version="1.0" encoding="utf-8"?>

Here are the offending lines in KXmlParser.java (425-426). The type == TEXT check evaluates to true:

if (depth == 0 && (type == ENTITY_REF || type == TEXT || type == CDSECT)) {
    throw new XmlPullParserException("Unexpected token", this, null);
}

Any help? I did try setting XmlPullParser.FEATURE_PROCESS_DOCDECL = false on the parser, but that didn't help.

I have researched this on the web and here, but couldn't find anything useful.

Vla*_*nov 34

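The reason you get the error is that the XML file doesn't actually start with <?xml version="1.0" encoding="utf-8"?>. It starts with three special bytes, EF BB BF, which are the byte order mark (BOM).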

[Image: hexadecimal representation]
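To confirm that a response really begins with those bytes, a check along the following lines can help. It is only a minimal sketch in plain Java: the class name BomCheck and the method startsWithUtf8Bom are made up for illustration and are not part of Android or Commons IO, and the check consumes the bytes it inspects, so it is meant for diagnosis rather than for feeding the parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of any library.
final class BomCheck {

    /**
     * Returns true if the first three bytes of the stream are EF BB BF,
     * the UTF-8 byte order mark. Consumes up to three bytes of the stream.
     */
    static boolean startsWithUtf8Bom(InputStream in) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until
        // three bytes are read or the stream ends.
        while (n < 3) {
            int r = in.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        return n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>".getBytes("UTF-8");
        byte[] withBom = new byte[xml.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, withBom, 3, xml.length);

        System.out.println(startsWithUtf8Bom(new ByteArrayInputStream(withBom))); // prints true
        System.out.println(startsWithUtf8Bom(new ByteArrayInputStream(xml)));     // prints false
    }
}
```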

InputStreamReader doesn't handle these bytes automatically, so you have to handle them manually. The simplest way is to use BOMInputStream from the Commons IO library:

this.conn = (HttpURLConnection) url.openConnection();
this.conn.setConnectTimeout(5000);
this.conn.setReadTimeout(10000);
this.conn.setUseCaches(true);
conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
is = new InputStreamReader(new BOMInputStream(conn.getInputStream(), false, ByteOrderMark.UTF_8));  

I checked the code above and it works for me.
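If adding Commons IO is not an option, the same idea can be sketched with only java.io. This is an assumption-laden alternative, not what the answer above uses: BomSkipper, skipUtf8Bom and utf8ReaderWithoutBom are hypothetical names, and the sketch only recognizes the UTF-8 BOM. It peeks at the first three bytes through a PushbackInputStream and pushes them back if they are not EF BB BF.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;

// Hypothetical helper for illustration only; handles the UTF-8 BOM and nothing else.
final class BomSkipper {

    /** Wraps the stream so that a leading EF BB BF sequence, if present, is skipped. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; read() may return fewer than requested.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            // Not a BOM: put the bytes back so the caller sees the stream unchanged.
            pin.unread(head, 0, n);
        }
        return pin;
    }

    /** Convenience wrapper: a UTF-8 Reader positioned after any BOM. */
    static Reader utf8ReaderWithoutBom(InputStream in) throws IOException {
        return new InputStreamReader(skipUtf8Bom(in), "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'r', 's', 's', '/', '>'};
        Reader reader = utf8ReaderWithoutBom(new ByteArrayInputStream(data));
        System.out.println((char) reader.read()); // prints <
    }
}
```

With a helper like this, the last line of the snippet above would become something like is = BomSkipper.utf8ReaderWithoutBom(conn.getInputStream()); and the resulting Reader could be passed to xpp.setInput(is) as before.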

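  • Or you can do data.replaceAll("^.*<","<") (worked for me) (5 upvotes)
  • This is exactly why I love Stack Overflow! You can always find someone smarter than yourself! A well-deserved bounty (although I can't award it until tomorrow)! Thanks! (2 upvotes)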
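The string-level approach from the first comment also works when the feed has already been read into a String, where a UTF-8 BOM shows up as the single character U+FEFF. One caveat: .* is greedy, so if the first line contains several < characters, the regex removes everything up to the last one on that line. A narrower sketch, with the hypothetical names LeadingBomStripper and stripLeadingBom, removes only a leading U+FEFF:

```java
// Hypothetical helper for illustration only.
final class LeadingBomStripper {

    /** Removes a single leading U+FEFF (the decoded BOM) and leaves everything else intact. */
    static String stripLeadingBom(String data) {
        if (data != null && !data.isEmpty() && data.charAt(0) == '\uFEFF') {
            return data.substring(1);
        }
        return data;
    }

    public static void main(String[] args) {
        String oneLine = "\uFEFF<a><b/></a>";
        // The narrow check keeps the whole document.
        System.out.println(stripLeadingBom(oneLine));             // prints <a><b/></a>
        // The regex from the comment is greedy up to the last '<' on the first line.
        System.out.println(oneLine.replaceAll("^.*<", "<"));      // prints </a>
    }
}
```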