and*_*sel 20 html groovy xmlslurper
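I'm trying to duplicate an element in an HTML coverage report so that the coverage totals appear at the top of the report as well as at the bottom.
The HTML starts like this, and I believe it is well-formed: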
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
<link rel="stylesheet" href=".resources/report.css" type="text/css" />
<link rel="shortcut icon" href=".resources/report.gif" type="image/gif" />
<title>Unified coverage</title>
<script type="text/javascript" src=".resources/sort.js"></script>
</head>
<body onload="initialSort(['breadcrumb', 'coveragetable'])">
Groovy's XmlSlurper complains as follows:
doc = new XmlSlurper( /* false, false, false */ ).parse("index.html")
[Fatal Error] index.html:1:48: DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
With DOCTYPE declarations allowed (the third constructor argument):
doc = new XmlSlurper(false, false, true).parse("index.html")
[Fatal Error] index.html:1:148: External DTD: Failed to read external DTD 'xhtml1-strict.dtd', because 'http' access is not allowed due to restriction set by the accessExternalDTD property.
External DTD: Failed to read external DTD 'xhtml1-strict.dtd', because 'http' access is not allowed due to restriction set by the accessExternalDTD property.
doc = new XmlSlurper(false, true, true).parse("index.html")
[Fatal Error] index.html:1:148: External DTD: Failed to read external DTD 'xhtml1-strict.dtd', because 'http' access is not allowed due to restriction set by the accessExternalDTD property.
External DTD: Failed to read external DTD 'xhtml1-strict.dtd', because 'http' access is not allowed due to restriction set by the accessExternalDTD property.
doc = new XmlSlurper(true, true, true).parse("index.html")
External DTD: Failed to read external DTD 'xhtml1-strict.dtd', because 'http' access is not allowed due to restriction set by the accessExternalDTD property.
doc = new XmlSlurper(true, false, true).parse("index.html")
External DTD: Failed to read external DTD 'xhtml1-strict.dtd', because 'http' access is not allowed due to restriction set by the accessExternalDTD property.
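So I think I've covered all the constructor options. There must be a way to make this work without resorting to regular expressions and risking the wrath of Tony the Pony.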
and*_*sel 40
Tsk.
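def parser = new XmlSlurper()
// Allow the DOCTYPE declaration that XmlSlurper rejects by default.
parser.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
// Do not try to fetch the external DTD referenced by the DOCTYPE.
parser.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
def doc = parser.parse("index.html")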
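For anyone who wants to try this without the report file at hand, here is a minimal self-contained sketch. The inline XHTML string is a trimmed-down, hypothetical stand-in for index.html, and the `groovy.xml.XmlSlurper` import assumes Groovy 3 or later (on Groovy 2.x the class lives in `groovy.util` and needs no import).

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; drop this import on Groovy 2.x

// Hypothetical stand-in for index.html, keeping the same DOCTYPE as the report.
def xhtml = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><title>Unified coverage</title></head>
<body><p>report</p></body>
</html>'''

def parser = new XmlSlurper()
// Accept the DOCTYPE declaration instead of failing on it.
parser.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
// Skip loading the external DTD, so no HTTP access is attempted.
parser.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)

def doc = parser.parseText(xhtml)
println doc.head.title.text()
```

Both settings are features of the underlying SAX parser rather than XmlSlurper constructor flags, which is presumably why no combination of the three booleans got past the external DTD error.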
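Even if your HTML also happens to be well-formed XML, a more general solution for parsing HTML is to use a real HTML parser. I've used the TagSoup parser in the past, and it handles real-world HTML well.
TagSoup provides a parser that implements the org.xml.sax.XMLReader interface, which can be passed to the XmlSlurper constructor. For example: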
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser
def doc = new XmlSlurper(new Parser()).parse("index.html")
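To see what TagSoup buys you, here is a small sketch that feeds it markup an XML parser would reject outright. The HTML string is made up for illustration, `@Grab` needs network access to fetch TagSoup on first run, and the `groovy.xml.XmlSlurper` import again assumes Groovy 3 or later.

```groovy
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; drop this import on Groovy 2.x

// Hypothetical tag soup: unclosed <p> and <br>, no closing </head> or </html>.
def html = '<html><head><title>Unified coverage</title><body><p>first<p>second<br></body>'

// TagSoup repairs the markup and emits SAX events that XmlSlurper consumes as usual.
def doc = new XmlSlurper(new Parser()).parseText(html)
println doc.head.title.text()
println doc.body.p.size()
```

The resulting GPath tree is navigated exactly like one from a plain XmlSlurper, so the rest of the script should not need to care which parser produced it.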