vjy*_*vjy 3 java android html-parsing jsoup
How can I use Jsoup to remove all inline styles and other attributes (class, onclick) from HTML elements?
Sample input:
<div style="padding-top:25px;" onclick="javascript:alert('hi');">
This is a sample div <span class='sampleclass'> This is a sample span </span>
</div>
Sample output:
<div>This is a sample div <span> This is a sample span </span> </div>
My code (is this the right approach, or is there a better way?):
Document doc = Jsoup.parse(html);
Elements el = doc.getAllElements();
for (Element e : el) {
    Attributes at = e.attributes();
    for (Attribute a : at) {
        e.removeAttr(a.getKey());
    }
}
ash*_*tte 10
Yes, one approach is indeed to iterate over the elements and call removeAttr() on each attribute.
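A minimal sketch of that traversal, for reference. The class name AttributeStripper and the method stripAllAttributes are hypothetical names chosen for illustration. The attribute keys are copied into a list before anything is removed, so the attribute collection is never modified while it is being iterated; whether removing during iteration is tolerated may depend on the jsoup version, so treat the copy as a defensive choice rather than a requirement.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of jsoup.
public class AttributeStripper {

    /** Parses the HTML, removes every attribute from every element, and returns the body HTML. */
    public static String stripAllAttributes(String html) {
        Document doc = Jsoup.parse(html);
        for (Element e : doc.getAllElements()) {
            // Copy the keys first so no attribute is removed while e.attributes() is being iterated.
            List<String> keys = new ArrayList<>();
            for (Attribute a : e.attributes()) {
                keys.add(a.getKey());
            }
            for (String key : keys) {
                e.removeAttr(key);
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<div style=\"padding-top:25px;\" onclick=\"javascript:alert('hi');\">"
                + "This is a sample div <span class='sampleclass'> This is a sample span </span></div>";
        System.out.println(stripAllAttributes(html));
    }
}
```

Note that this keeps every tag in the document and only drops attributes, so elements such as links or scripts stay in place without their attributes.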
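Another way to do this with jsoup is the Whitelist class (see the docs), which can be used with the Jsoup.clean() function to remove any tags or attributes from the document that are not explicitly allowed.
For example: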
String html = "<html><head></head><body><div style='padding-top:25px;' onclick='javascript.alert('hi');'>This is a sample div <span class='sampleclass'>This is a simple span</span></div></body></html>";
Whitelist wl = Whitelist.simpleText();
wl.addTags("div", "span"); // add additional tags here as necessary
String clean = Jsoup.clean(html, wl);
System.out.println(clean);
This produces the following output:
11-05 19:56:39.302: I/System.out(414): <div>
11-05 19:56:39.302: I/System.out(414): This is a sample div
11-05 19:56:39.302: I/System.out(414): <span>This is a simple span</span>
11-05 19:56:39.302: I/System.out(414): </div>
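For readers on a recent jsoup release: to my knowledge Whitelist was renamed to Safelist (deprecated in 1.14.2 and later removed), so the same idea can be sketched as below. The class name SafelistCleaner and the method keepDivAndSpan are hypothetical names for illustration; check the class name against the jsoup version you actually use.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical helper class, not part of jsoup.
public class SafelistCleaner {

    /** Keeps simple text formatting tags plus div and span, with no attributes on any of them. */
    public static String keepDivAndSpan(String html) {
        // simpleText() allows b, em, i, strong and u without attributes; div and span are added on top.
        Safelist allowed = Safelist.simpleText().addTags("div", "span");
        return Jsoup.clean(html, allowed);
    }

    public static void main(String[] args) {
        String html = "<div style='padding-top:25px;' onclick='doSomething()'>"
                + "This is a sample div <span class='sampleclass'>This is a simple span</span></div>";
        System.out.println(keepDivAndSpan(html));
    }
}
```

Unlike the removeAttr() loop, Jsoup.clean() also drops any tag that is not on the list (keeping its text content), so add every tag you want to preserve.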