How to merge more than 1000 XML files into one using Java

And*_*dra 8 java xml performance merge out-of-memory

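I am trying to merge many XML files into one. I have done this successfully with DOM, but that solution only works for a handful of files. When I run it on more than 1000 files, I get a java.lang.OutOfMemoryError.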

What I want to achieve is the following. Given these files:

File 1:

<root>
....
</root>

File 2:

<root>
......
</root>

File n:

<root>
....
</root>

Output:

<rootSet>
<root>
....
</root>
<root>
....
</root>
<root>
....
</root>
</rootSet>

This is my current implementation:

    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
    Document doc = docBuilder.newDocument();
    Element rootSetElement = doc.createElement("rootSet");
    Node rootSetNode = doc.appendChild(rootSetElement);
    Element creationElement = doc.createElement("creationDate");
    rootSetNode.appendChild(creationElement);
    creationElement.setTextContent(dateString); 
    File dir = new File("/tmp/rootFiles");
    String[] files = dir.list();
    if (files == null) {
        System.out.println("No roots to merge!");
    } else {
        Document rootDocument;
        for (int i = 0; i < files.length; i++) {
            File filename = new File(dir, files[i]);
            // Each input file is parsed into a full DOM tree before its root is imported
            rootDocument = docBuilder.parse(filename);
            Node tempDoc = doc.importNode(rootDocument.getElementsByTagName("root").item(0), true);
            rootSetNode.appendChild(tempDoc);
        }
    }   

I have experimented a lot with XSLT and SAX, but I always seem to be missing something. Any help would be greatly appreciated.

csd*_*csd 10

You might also consider using StAX. Here is code that does what you want:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public class XMLConcat {
    public static void main(String[] args) throws Throwable {
        File dir = new File("/tmp/rootFiles");
        File[] rootFiles = dir.listFiles();
        if (rootFiles == null) {
            // listFiles() returns null if the directory is missing or unreadable
            System.out.println("No roots to merge!");
            return;
        }

        // Write bytes with an explicit encoding so the output does not depend
        // on the platform default charset
        OutputStream outputStream = new FileOutputStream("/tmp/mergedFile.xml");
        XMLOutputFactory xmlOutFactory = XMLOutputFactory.newFactory();
        XMLEventWriter xmlEventWriter = xmlOutFactory.createXMLEventWriter(outputStream, "UTF-8");
        XMLEventFactory xmlEventFactory = XMLEventFactory.newFactory();

        xmlEventWriter.add(xmlEventFactory.createStartDocument());
        xmlEventWriter.add(xmlEventFactory.createStartElement("", "", "rootSet"));

        XMLInputFactory xmlInFactory = XMLInputFactory.newFactory();
        for (File rootFile : rootFiles) {
            // XMLEventReader.close() does not close the underlying input,
            // so open and close the file stream explicitly
            try (InputStream inputStream = new FileInputStream(rootFile)) {
                XMLEventReader xmlEventReader = xmlInFactory.createXMLEventReader(inputStream);
                XMLEvent event = xmlEventReader.nextEvent();
                // Skip ahead in the input to the opening document element
                while (event.getEventType() != XMLEvent.START_ELEMENT) {
                    event = xmlEventReader.nextEvent();
                }

                // Copy events one at a time, from the document element
                // up to (but not including) the end of the document
                do {
                    xmlEventWriter.add(event);
                    event = xmlEventReader.nextEvent();
                } while (event.getEventType() != XMLEvent.END_DOCUMENT);
                xmlEventReader.close();
            }
        }

        xmlEventWriter.add(xmlEventFactory.createEndElement("", "", "rootSet"));
        xmlEventWriter.add(xmlEventFactory.createEndDocument());

        xmlEventWriter.close();
        outputStream.close();
    }
}

One small caveat: this API seems to mangle empty tags, turning <foo/> into <foo></foo>.
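As a rough illustration of that caveat, here is a minimal in-memory sketch (it is not part of the original answer). It merges XML strings instead of files so it can be run without any setup, and it also emits the `creationDate` element from the question, which the file-based code leaves out. The class name `RootSetSketch`, the `merge` helper, and the sample date are made up for this example. The expanded empty tag is equivalent XML, so a conforming parser should treat both forms the same; other StAX implementations may serialize it differently.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class RootSetSketch {

    // Hypothetical helper: copies the content of each input document
    // into a single <rootSet> wrapper, preceded by a <creationDate> element.
    static String merge(List<String> documents, String creationDate) throws XMLStreamException {
        XMLEventFactory events = XMLEventFactory.newFactory();
        XMLInputFactory inFactory = XMLInputFactory.newFactory();
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newFactory().createXMLEventWriter(out);

        writer.add(events.createStartElement("", "", "rootSet"));
        writer.add(events.createStartElement("", "", "creationDate"));
        writer.add(events.createCharacters(creationDate));
        writer.add(events.createEndElement("", "", "creationDate"));

        for (String document : documents) {
            XMLEventReader reader = inFactory.createXMLEventReader(new StringReader(document));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    // Drop the per-input document start/end; copy everything in between
                    if (!event.isStartDocument() && !event.isEndDocument()) {
                        writer.add(event);
                    }
                }
            } finally {
                reader.close();
            }
        }

        writer.add(events.createEndElement("", "", "rootSet"));
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // The first input contains an empty tag, <foo/>, to make the caveat visible
        List<String> inputs = Arrays.asList("<root><foo/></root>", "<root>text</root>");
        System.out.println(merge(inputs, "2024-01-01"));
    }
}
```

For the real job, the `StringReader`/`StringWriter` pair would be swapped for file streams, as in the code above.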