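I've been trying to learn some new techniques for parsing files with Java, and for the most part it has gone well. However, I'm lost on how to parse an XML file whose structure is unknown when it is received. There are plenty of examples of how to do this when you know the structure (getElementsByTagName seems to be the way to go), but none for the dynamic case, at least none that I have found.
So, the tl;dr version of this question: how do I parse an XML file when I can't rely on knowing its structure?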
Jas*_*n C 13
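The parsing part is easy. As helderdarocha noted in the comments, the parser only requires well-formed XML; it does not care about the structure. You can use Java's standard DocumentBuilder to obtain a Document: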
InputStream in = new FileInputStream(...);
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
(If you are parsing multiple documents, you can keep reusing the same DocumentBuilder.)
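As a minimal sketch of that reuse, the snippet below parses several XML strings with a single DocumentBuilder and collects each root tag name. The class name `ReuseBuilder` and the method `rootTagNames` are hypothetical names chosen for illustration. It assumes single-threaded use, since a DocumentBuilder should not be shared across threads without synchronization.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ReuseBuilder {
    // Parses every XML string with the same DocumentBuilder
    // and returns the tag name of each document's root element.
    static List<String> rootTagNames(List<String> xmlStrings) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<String> names = new ArrayList<>();
        for (String xml : xmlStrings) {
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            names.add(doc.getDocumentElement().getTagName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Two documents with unrelated structures, parsed by one builder.
        System.out.println(rootTagNames(
                Arrays.asList("<objects/>", "<shapes><circle/></shapes>")));
        // prints [objects, shapes]
    }
}
```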
Then you can start from the root document element and use the familiar DOM methods from there:
Element root = doc.getDocumentElement(); // perform DOM operations starting here.
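As for processing it, that really depends on what you want to do with it, but you can use Node methods such as getFirstChild() and getNextSibling() to iterate over the children and process them according to structure, tag names, and attributes.
Consider the following example: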
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class XML {
    public static void main (String[] args) throws Exception {
        String xml = "<objects><circle color='red'/><circle color='green'/><rectangle>hello</rectangle><glumble/></objects>";
        // parse
        InputStream in = new ByteArrayInputStream(xml.getBytes("utf-8"));
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        // process
        Node objects = doc.getDocumentElement();
        for (Node object = objects.getFirstChild(); object != null; object = object.getNextSibling()) {
            if (object instanceof Element) {
                Element e = (Element)object;
                if (e.getTagName().equalsIgnoreCase("circle")) {
                    String color = e.getAttribute("color");
                    System.out.println("It's a " + color + " circle!");
                } else if (e.getTagName().equalsIgnoreCase("rectangle")) {
                    String text = e.getTextContent();
                    System.out.println("It's a rectangle that says \"" + text + "\".");
                } else {
                    System.out.println("I don't know what a " + e.getTagName() + " is for.");
                }
            }
        }
    }
}
The input XML document (hard-coded in the example) is:
<objects>
  <circle color='red'/>
  <circle color='green'/>
  <rectangle>hello</rectangle>
  <glumble/>
</objects>
The output is:
It's a red circle!
It's a green circle!
It's a rectangle that says "hello".
I don't know what a glumble is for.
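The loop above only visits the direct children of the root. When the nesting depth is also unknown, the same getFirstChild()/getNextSibling() iteration can be applied recursively. The following is a rough sketch of that idea, not part of the original answer: the class `XmlWalker` and its methods `walk` and `outline` are hypothetical names, and it simply records each element's tag name and attributes, indented by depth.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class XmlWalker {
    // Appends one line per element (tag name plus attributes), indented by depth,
    // then recurses into child elements. Text and comment nodes are skipped.
    static void walk(Element e, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            out.append(' ').append(a.getNodeName()).append('=').append(a.getNodeValue());
        }
        out.append('\n');
        for (Node child = e.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                walk((Element) child, depth + 1, out);
            }
        }
    }

    // Parses the XML string and returns an indented outline of its elements.
    static String outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline(
                "<objects><circle color='red'/><group><rectangle>hello</rectangle></group></objects>"));
    }
}
```

The order in which a NamedNodeMap returns attributes is not guaranteed, so code that depends on attribute order should sort the names first.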