How to extract text between <p> tags

Question


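I want to extract the text inside the p and li tags of HTML page(s), so that I can tokenize each page and build an inverted index over the pages to answer search queries.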

How do I get the p tags using jsoup?

Elements e = doc.select("");


What string should go in that parameter?
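The end goal described in the question, an inverted index over the text pulled out of the p and li tags, can be sketched in plain Java. This is a minimal sketch under stated assumptions: the class and method names (InvertedIndexSketch, tokenize, buildIndex) are hypothetical, the tokenizer simply lower-cases and splits on non-alphanumeric characters, and the page texts are hard-coded stand-ins for text already extracted with jsoup. It needs Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Naive tokenizer (an assumption): lower-case, then split on runs of
    // characters that are neither letters nor digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // a leading separator yields an empty first piece
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Maps each term to the sorted set of page IDs (list positions) containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> pageTexts) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int pageId = 0; pageId < pageTexts.size(); pageId++) {
            for (String term : tokenize(pageTexts.get(pageId))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(pageId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical page texts standing in for p/li text extracted with jsoup.
        List<String> pages = List.of("some bold text", "some text another p tag");
        Map<String, Set<Integer>> index = buildIndex(pages);
        System.out.println(index.get("text")); // [0, 1]
        System.out.println(index.get("bold")); // [0]
    }
}
```

Answering a one-word query is then a single map lookup; a multi-word query could intersect the ID sets of its terms.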

Answer 1

MaV*_*SCy 20

This should do it:

Elements e=doc.select("p");


Here is a list of all the selectors you can use.
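Since the question asks for the text of both p and li tags, the selector syntax also allows a comma-separated group such as "p, li", which matches elements that satisfy either selector. A minimal sketch, assuming jsoup is on the classpath; the class name ParagraphAndListText and its extract method are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParagraphAndListText {
    // Returns the text of every element matched by the "p, li" selector group.
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element e : doc.select("p, li")) {
            texts.add(e.text()); // text of the element and all its descendants
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<p>intro</p><ul><li>first</li><li>second</li></ul>";
        System.out.println(extract(html)); // [intro, first, second]
    }
}
```

If an li contains a p (or the other way round), both elements match, so the inner text is collected twice; pick one tag or filter nested matches in that case.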

Suppose you have this HTML:

String html="<p>some <strong>bold</strong> text</p>";


To get "some bold text" as the result, you can use:

Document doc = Jsoup.parse(html);
Element p= doc.select("p").first();
String text = doc.body().text(); //some bold text


or

String text = p.text(); //some bold text

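The two variants above only agree because the sample HTML contains nothing but that one paragraph. A sketch, assuming jsoup is on the classpath and using a hypothetical extra <div> (the class and method names are also hypothetical), shows where they diverge:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class BodyVersusParagraphText {
    // Text of the first <p> only, including the text of nested tags such as <strong>.
    static String firstParagraphText(String html) {
        Element p = Jsoup.parse(html).select("p").first();
        return p == null ? "" : p.text(); // first() returns null when nothing matches
    }

    // Text of the whole <body>: everything on the page, not just the paragraph.
    static String bodyText(String html) {
        return Jsoup.parse(html).body().text();
    }

    public static void main(String[] args) {
        // Hypothetical markup: the answer's paragraph followed by an extra <div>.
        String html = "<p>some <strong>bold</strong> text</p><div>footer</div>";
        System.out.println(firstParagraphText(html)); // some bold text
        System.out.println(bodyText(html));           // also includes "footer"
    }
}
```

For indexing only paragraph text, p.text() is the safer choice.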

Now suppose you have the following, more complex HTML:

String html="<div id=someid><p>some text</p><span>some other text</span><p> another p tag</p></div>";


To get the values of both p tags, you can do the following:

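Document doc = Jsoup.parse(html);
Element content = doc.getElementById("someid");
Elements p = content.getElementsByTag("p");

// Join the text of each <p> with a space so the words do not run together
StringBuilder pConcatenated = new StringBuilder();
for (Element x : p) {
  if (pConcatenated.length() > 0) {
    pConcatenated.append(' ');
  }
  pConcatenated.append(x.text());
}

System.out.println(pConcatenated); // some text another p tag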

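jsoup can also do the selection and the joining in one step: the descendant selector "#someid p" picks the p tags inside the div, and Elements.text() returns the combined text of all matched elements separated by spaces. A minimal sketch, assuming jsoup is on the classpath; the class and method names are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JoinedParagraphText {
    static String paragraphsIn(String html) {
        Document doc = Jsoup.parse(html);
        // "#someid p": every <p> that is a descendant of the element with id="someid".
        // Elements.text(): the text of each match, joined with single spaces.
        return doc.select("#someid p").text();
    }

    public static void main(String[] args) {
        String html = "<div id=someid><p>some text</p><span>some other text</span><p> another p tag</p></div>";
        System.out.println(paragraphsIn(html)); // some text another p tag
    }
}
```

As with any selection, Elements.text() can repeat text if the matches include both a parent element and its own child.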

You can also find more information here.

Hope this helps.

Archived:	12 years, 5 months ago
Views:	26983
Last activity:	7 years, 6 months ago