How to get node content from JDOM

jep*_*rro 6 java xml jdom xml-parsing

I'm writing a Java application using import org.jdom.*;

My XML is valid, but sometimes it contains HTML tags. For example, like this:

  <program-title>Anatomy &amp; Physiology</program-title>
  <overview>
       <content>
              For more info click <a href="page.html">here</a>
              <p>Learn more about the human body.  Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&amp;#160; Online studies options are available.</p>
       </content>
  </overview>
  <key-information>
     <category>Health &amp; Human Services</category>

So my problem is with the <p> tags inside the overview.content node.

I was hoping that this code would work:

        Element overview = sds.getChild("overview");
        Element content = overview.getChild("content");

        System.out.println(content.getText());

But it returns blank.

How do I return all the text (nested tags and all) from the overview.content node?

Thanks

Pra*_*ate 16

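content.getText() returns only the element's immediate text, so it is only useful for leaf elements that contain text and no child elements.

The trick is to use org.jdom.output.XMLOutputter (with the CompactFormat text mode):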

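import java.io.StringWriter;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

// The class name is arbitrary; it only makes the snippet compilable.
public class ContentDump {
    public static void main(String[] args) throws Exception {
        // Parse the XML file into a JDOM document.
        SAXBuilder builder = new SAXBuilder();
        String xmlFileName = "a.xml";
        Document doc = builder.build(xmlFileName);

        // Navigate to overview/content.
        Element root = doc.getRootElement();
        Element overview = root.getChild("overview");
        Element content = overview.getChild("content");

        XMLOutputter outp = new XMLOutputter();

        // Compact format normalizes whitespace; the alternatives are left for comparison.
        outp.setFormat(Format.getCompactFormat());
        //outp.setFormat(Format.getRawFormat());
        //outp.setFormat(Format.getPrettyFormat());
        //outp.getFormat().setTextMode(Format.TextMode.PRESERVE);

        // Serialize the children of <content> (text and nested elements) to a string.
        StringWriter sw = new StringWriter();
        outp.output(content.getContent(), sw);
        System.out.println(sw.toString());
    }
}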

Output

For more info click<a href="page.html">here</a><p>Learn more about the human body. Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&amp;#160; Online studies options are available.</p>

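Please explore the other formatting options and modify the code above as needed.

"Class to encapsulate XMLOutputter format options. Typical users can use the standard format configurations obtained by getRawFormat() (no whitespace changes), getPrettyFormat() (whitespace beautification), and getCompactFormat() (whitespace normalization)."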


duf*_*ymo 0

The problem is that the <content> node doesn't have a text child; it has a <p> child that happens to contain the text.

Try this:

Element overview = sds.getChild("overview");
Element content = overview.getChild("content");
Element p = content.getChild("p");
System.out.println(p.getText());

If you want all the immediate child nodes, call p.getChildren(). If you want all the descendant nodes, you'll have to call it recursively.
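
The recursive walk can be sketched roughly as follows. To keep the example runnable without JDOM on the classpath, it uses the JDK's built-in org.w3c.dom API rather than JDOM; in JDOM the same idea would loop over the element's children and recurse into each one. The class name DescendantText and the helpers collectText and allText are made up for illustration, and the sample XML is a shortened version of the one in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch only: uses the JDK DOM API, not JDOM.
public class DescendantText {

    // Appends the value of every text node at or below 'node', in document order.
    static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out); // recurse into each child
        }
    }

    // Hypothetical helper: parses 'xml' and returns all text below the root
    // element, with runs of whitespace collapsed to single spaces.
    static String allText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder sb = new StringBuilder();
        collectText(doc.getDocumentElement(), sb);
        return sb.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<content>For more info click <a href=\"page.html\">here</a>"
                + "<p>Learn more about the human body.</p></content>";
        System.out.println(allText(xml));
    }
}
```

Note that this collects only the text and drops the tags; if the markup itself is needed, serializing the child content as in the XMLOutputter answer is the better fit.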