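I'm using jsoup to parse HTML and want to extract the inner HTML of the body tag.
So far I've tried document.body().children().outerHtml(), but it only returns the HTML elements and skips the floating text inside the body (text that is not wrapped in any HTML tag).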
private String getBodyTag(final Document document) {
    return document.body().children().outerHtml();
}
Input:
<!DOCTYPE html>
<html lang="de">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" type="text/css" href="assets/style.css">
</head>
<body>
<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>
some sample raw/floating text
</body>
</html>
Expected:
<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>
some sample raw/floating text
Actual:
<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>
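Answer by 小智 (score 5):
Use Element.html() on the body instead. It serializes all of the body's child nodes, including bare text nodes, whereas children() returns only child elements: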
private String getBodyTag(final Document document) {
    return document.body().html();
}
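A minimal sketch that runs both calls on the sample input from the question, so the difference is visible side by side. It assumes jsoup is on the classpath; the class name BodyHtmlDemo and its two helper methods are hypothetical, added only for illustration. children() yields only the <div> and <h3> elements, so the trailing text node is lost, while html() serializes every child node of <body>.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class comparing the two approaches.
public class BodyHtmlDemo {

    // Answer's approach: html() serializes all child nodes of <body>,
    // elements and bare text nodes alike.
    static String bodyInnerHtml(String html) {
        Document document = Jsoup.parse(html);
        return document.body().html();
    }

    // Question's approach: children() returns only Element children,
    // so text sitting directly inside <body> is dropped.
    static String childrenOuterHtml(String html) {
        Document document = Jsoup.parse(html);
        return document.body().children().outerHtml();
    }

    public static void main(String[] args) {
        String html = "<!DOCTYPE html><html><head></head><body>"
                + "<div>questions to improve formatting and clarity.</div>"
                + "<h3>Guided Mode</h3>"
                + "some sample raw/floating text"
                + "</body></html>";

        System.out.println("children().outerHtml():");
        System.out.println(childrenOuterHtml(html));
        System.out.println();
        System.out.println("body().html():");
        System.out.println(bodyInnerHtml(html));
    }
}
```

jsoup pretty-prints output by default, so line breaks and indentation may differ from the original markup; calling document.outputSettings().prettyPrint(false) before serializing keeps the output closer to the source.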