What I'm trying to do seems simple to me, but I'm struggling with it far more than I should be. I have a document that contains the following:
<h2>First Heading</h2>
<table>
<div class="title">First Subheading One</div>
<div class="title">First Subheading Two</div>
<div class="title">First Subheading Three</div>
</table>
<h2>Second Heading</h2>
<table>
<div class="title">Second Subheading One</div>
<div class="title">Second Subheading Two</div>
<div class="title">Second Subheading Three</div>
</table>
<h2>Third Heading</h2>
<table>
<div class="title">Third Subheading One</div>
<div class="title">Third Subheading Two</div>
<div class="title">Third Subheading Three</div>
</table>
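As expected, doc.select("h2") gives me all the headings, and doc.select("div.title") gives me all the subheadings. What I want to do is iterate over the returned h2 elements and, for each one, iterate over the div.title elements that belong to it. I've tried a lot of things, and I'm no stranger to coding (though I am new to jsoup), but I can't seem to figure out how to do this.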
Elements headings = httpDoc.select("h3");
for (Element heading : headings) {
    // something with heading.nextSibling here
}
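Is there something I can use (such as nextSibling) that gives me the next node? From there, could I do another select("div.title") and iterate over the results to get the subheadings?
Or am I going about this completely the wrong way? Sorry if this seems dumb. I feel a bit silly, since I've been coding for longer than I care to admit but have never worked with the DOM (I've always been a Win32 person).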
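From your question, I understand that you want to get the h2 tags and then, for each <h2> heading, get the corresponding div.title elements inside the table that follows it.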
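On the nextSibling idea from the question: in jsoup, Node.nextSibling() returns the next node of any kind, which may be a whitespace text node when there is a line break between tags, while Element.nextElementSibling() skips to the next element. Here is a minimal sketch of the difference. The class and method names (SiblingDemo, nextNodeName, nextElementName) are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

class SiblingDemo {
    // Hypothetical helper: name of the node directly after the first <h2>
    // ("#text" for a text node, the tag name for an element, "none" if absent).
    static String nextNodeName(String html) {
        Document doc = Jsoup.parse(html);
        Element h2 = doc.select("h2").first();
        Node next = (h2 == null) ? null : h2.nextSibling();
        return (next == null) ? "none" : next.nodeName();
    }

    // Hypothetical helper: tag name of the element directly after the first <h2>,
    // ignoring any text nodes in between.
    static String nextElementName(String html) {
        Document doc = Jsoup.parse(html);
        Element h2 = doc.select("h2").first();
        Element next = (h2 == null) ? null : h2.nextElementSibling();
        return (next == null) ? "none" : next.tagName();
    }

    public static void main(String[] args) {
        // A line break between </h2> and <table>, as in a typical HTML file.
        String html = "<h2>First Heading</h2>\n<table><tr><td></td></tr></table>";
        System.out.println("nextSibling:        " + nextNodeName(html));
        System.out.println("nextElementSibling: " + nextElementName(html));
    }
}
```

If the markup has no whitespace between the tags, both calls land on the same element, so nextSibling can appear to work until the input is formatted differently.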
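A couple of things to note. Your code selects h3 instead of h2. Also, a <table> should contain a <tr> and a <td> (I think <td> is optional, please check the W3 page). Because of that, when you parse the HTML snippet, jsoup just prunes/removes the malformed <table>. With the markup corrected, the output is:
The header is: First Heading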
The div content is: First Subheading One
The div content is: First Subheading Two
The div content is: First Subheading Three
========== +_+ ===========
The header is: Second Heading
The div content is: Second Subheading One
The div content is: Second Subheading Two
The div content is: Second Subheading Three
========== +_+ ===========
The header is: Third Heading
The div content is: Third Subheading One
The div content is: Third Subheading Two
The div content is: Third Subheading Three
========== +_+ ===========
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JSoupTest
{
    public static void main(String[] args)
    {
        // Same document as in the question, with a <tr> and <td> added to each <table>.
        String s = "<h2>First Heading</h2>";
        s += "<table><tr><td>";
        s += "<div class='title'>First Subheading One</div>";
        s += "<div class='title'>First Subheading Two</div>";
        s += "<div class='title'>First Subheading Three</div>";
        s += "</td></tr></table>";
        s += "<h2>Second Heading</h2>";
        s += "<table><tr><td>";
        s += "<div class='title'>Second Subheading One</div>";
        s += "<div class='title'>Second Subheading Two</div>";
        s += "<div class='title'>Second Subheading Three</div>";
        s += "</td></tr></table>";
        s += "<h2>Third Heading</h2>";
        s += "<table><tr><td>";
        s += "<div class='title'>Third Subheading One</div>";
        s += "<div class='title'>Third Subheading Two</div>";
        s += "<div class='title'>Third Subheading Three</div>";
        s += "</td></tr></table>";

        Document doc = Jsoup.parse(s);
        Elements h_2 = doc.select("h2");
        for(int i=0; i<h_2.size(); i++)
        {
            Element e = h_2.get(i);
            System.out.println("The header is: " + e.ownText());
            // The element right after each <h2> is its <table>.
            Element nextSib = e.nextElementSibling();
            // nextElementSibling() returns null when the <h2> is the last element.
            if (nextSib != null)
            {
                Elements divs = nextSib.select("div.title");
                for(int j=0; j<divs.size(); j++)
                {
                    Element d = divs.get(j);
                    System.out.println("The div content is: " + d.ownText());
                }
            }
            System.out.println("========== +_+ ===========");
        }
    }
}
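If you want the result as data rather than printed lines, the same traversal can collect the subheadings per heading. This is only a sketch under the same assumption as above, namely that each <h2> is immediately followed by the element holding its div.title entries. The class HeadingGroups and its group method are hypothetical names, not part of jsoup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class HeadingGroups {
    // Hypothetical helper: maps each <h2> text to the texts of the div.title
    // elements found in the element that directly follows that <h2>.
    static Map<String, List<String>> group(String html) {
        Document doc = Jsoup.parse(html);
        // LinkedHashMap keeps the headings in document order.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Element heading : doc.select("h2")) {
            List<String> titles = new ArrayList<>();
            Element next = heading.nextElementSibling();
            if (next != null) { // null when the <h2> has no following element
                for (Element div : next.select("div.title")) {
                    titles.add(div.text());
                }
            }
            result.put(heading.text(), titles);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<h2>First Heading</h2>"
                + "<table><tr><td>"
                + "<div class='title'>First Subheading One</div>"
                + "<div class='title'>First Subheading Two</div>"
                + "</td></tr></table>"
                + "<h2>Second Heading</h2>"
                + "<table><tr><td>"
                + "<div class='title'>Second Subheading One</div>"
                + "</td></tr></table>";
        for (Map.Entry<String, List<String>> entry : group(html).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Note that two headings with identical text would share one map key here, so a list of pairs would be the safer structure if duplicate headings are possible.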