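I am trying to parse an HTML document whose doctype declaration references the transitional DTD, as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
When I call Builder.build on the document, I get the following exception: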
java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1305)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at nu.xom.Builder.build(Builder.java:1127)
at nu.xom.Builder.build(Builder.java:1019)
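If I remove the doctype declaration, the document parses fine. I can download the DTD from my browser, which tells me the URL is valid. I don't want to remove the doctype declaration. Is there a way to tell the Builder not to download the DTD, or to supply an alternate DTD?
This solved the problem: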
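// Requires javax.xml.parsers.DocumentBuilderFactory and org.w3c.dom.Document.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
// Xerces-specific feature: do not fetch the external DTD when not validating.
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
Document document = factory.newDocumentBuilder().parse(is); // is: the InputStream of the document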
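The snippet above builds a DOM document, while the question uses XOM, whose Builder works on a SAX XMLReader. Below is a minimal sketch of the same feature applied to an XMLReader, assuming the parser in use is Xerces or the JDK's Xerces-derived default (other parsers may not recognize the feature URI). The class name NoExternalDtdReader and its helper methods are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NoExternalDtdReader {

    // Xerces-specific feature URI, the same one used in the answer above.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Creates a non-validating XMLReader that will not fetch the external DTD.
    static XMLReader create() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        return factory.newSAXParser().getXMLReader();
    }

    // Parses the given XML text; throws if the document cannot be parsed.
    static void parse(String xml) throws Exception {
        XMLReader reader = create();
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
    }
}
```

With XOM on the classpath, the reader returned by create() could presumably be handed to the Builder constructor that takes an XMLReader. Note that with the DTD skipped, entities it defines (such as &nbsp;) are no longer known to the parser.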
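From a quick look at the Builder javadoc, I think you can supply an EntityResolver through the constructor that takes an XMLReader. I would avoid letting the parser download files from the internet wherever possible.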
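The EntityResolver idea can be sketched with plain SAX as follows. This is an illustration under assumptions, not a tested XOM recipe: the class OfflineDtdExample and its methods are hypothetical names, and the resolver simply answers every external entity request with an empty stream. A real setup might instead return a local copy of xhtml1-transitional.dtd so that its entity definitions remain available.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class OfflineDtdExample {

    // Answers every external entity request (including the DTD) with an
    // empty stream, so the parser never opens a network connection.
    static final EntityResolver EMPTY_RESOLVER =
        (publicId, systemId) -> new InputSource(new StringReader(""));

    // Creates a namespace-aware XMLReader that uses the resolver above.
    static XMLReader newOfflineReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(EMPTY_RESOLVER);
        return reader;
    }

    // Parses the XML text and returns the local name of the root element.
    static String parseRootName(String xml) throws Exception {
        XMLReader reader = newOfflineReader();
        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = localName;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }
}
```

As the answer suggests, the reader from newOfflineReader() would then be passed to the Builder constructor that accepts an XMLReader. Whether XOM changes any settings on the reader it is given is something to check against its javadoc.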