DOCTYPE is erased when parsing XML with DOM

Kit*_*Kat 14 java xml doctype dom

Why does Java erase the DOCTYPE when I edit an XML file?

I have this XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE map[ <!ELEMENT map (station*) >
                <!ATTLIST station  id   ID    #REQUIRED> ]>
<favoris>
<station id="5">test1</station>
<station id="6">test1</station>
<station id="8">test1</station>
</favoris> 

My function is very basic:

public static void EditStationName(int id, InputStream is, String path, String name) throws ParserConfigurationException, SAXException, IOException, TransformerFactoryConfigurationError, TransformerException{
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    DocumentBuilder builder = factory.newDocumentBuilder();
    Document dom = builder.parse(is);

    Element e = dom.getElementById(String.valueOf(id));
    e.setTextContent(name);
    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    FileOutputStream fos = new FileOutputStream(path);
    Result result = new StreamResult(fos);  
    Source source = new DOMSource(dom);


    xformer.setOutputProperty(OutputKeys.STANDALONE, "yes");

    xformer.transform(source, result);
}
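
It works, but the DOCTYPE is erased! I get the whole document back without the DOCTYPE part, which matters to me because it is what lets me look elements up by ID. How can I keep the DOCTYPE, and why is it erased? I have tried many things, such as OutputKeys or domImpl.createDocumentType, but none of them worked...

Thanks!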

Grz*_*ski 11

Your input XML is not valid. It should be:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris [
    <!ELEMENT favoris (station)+>
    <!ELEMENT station (#PCDATA)>
    <!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">test1</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

As @DevNull wrote, to be fully valid you cannot write <station id="5">test1</station>, because the value of an ID attribute must be an XML name and so cannot start with a digit (Java accepts it anyway, though).
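To illustrate why the ID declaration matters for the lookup in the question, here is a minimal sketch that parses the same content with and without the internal DTD. The class `IdLookupDemo`, its constants, and the `lookup` helper are hypothetical names made up for this illustration; only the standard JAXP and DOM calls come from the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original question or answer.
final class IdLookupDemo {

    // Document whose internal DTD declares station/@id as type ID.
    static final String WITH_DTD =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE favoris ["
        + "<!ELEMENT favoris (station)+>"
        + "<!ELEMENT station (#PCDATA)>"
        + "<!ATTLIST station id ID #REQUIRED>"
        + "]>"
        + "<favoris><station id=\"i5\">test1</station></favoris>";

    // Same content, but nothing tells the parser that "id" is an ID attribute.
    static final String WITHOUT_DTD =
        "<?xml version=\"1.0\"?>"
        + "<favoris><station id=\"i5\">test1</station></favoris>";

    /** Parses the XML text and returns whatever getElementById finds (possibly null). */
    static Element lookup(String xml, String id) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return dom.getElementById(id);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with DTD:    " + lookup(WITH_DTD, "i5"));
        System.out.println("without DTD: " + lookup(WITHOUT_DTD, "i5"));
    }
}
```

With the DTD the lookup should find the element; without it `getElementById` has no ID-typed attribute to match against and returns null, which is why a rewritten file that has lost its DOCTYPE breaks the next edit.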


The DOCTYPE is removed in the output XML document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

I haven't found a way to avoid losing the DTD, but as a workaround you can reference an external DTD:

xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "favoris.dtd");

Resulting (example) document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris SYSTEM "favoris.dtd">
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>
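Put together with the edit step from the question, the workaround could look roughly like the sketch below. It works on strings rather than files so it can be run as is; `ExternalDoctypeDemo`, `SAMPLE`, and `editWithExternalDoctype` are hypothetical names, and `favoris.dtd` is just the example system identifier used above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; all names here are illustrative only.
final class ExternalDoctypeDemo {

    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE favoris ["
        + "<!ELEMENT favoris (station)+>"
        + "<!ELEMENT station (#PCDATA)>"
        + "<!ATTLIST station id ID #REQUIRED>"
        + "]>"
        + "<favoris><station id=\"i5\">test1</station></favoris>";

    /** Renames one station, then serializes with a DOCTYPE that points at an external DTD. */
    static String editWithExternalDoctype(String xml, String id, String newName, String dtdSystemId) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            Element station = dom.getElementById(id);
            if (station == null) {
                throw new IllegalArgumentException("No element with ID " + id);
            }
            station.setTextContent(newName);

            // Identity transformation; the DOCTYPE is generated from the output property.
            Transformer xformer = TransformerFactory.newInstance().newTransformer();
            xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, dtdSystemId);

            StringWriter out = new StringWriter();
            xformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("XML processing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(editWithExternalDoctype(SAMPLE, "i5", "new value", "favoris.dtd"));
    }
}
```

Keep in mind that the output now depends on the external file: for `getElementById` to keep working when the document is read back, the DTD named by the system identifier has to be resolvable by the parser and has to declare the `id` attribute as an ID.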

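EDIT:

I don't think it is possible to preserve an inline DTD with the Transformer class (see here). If you cannot use an external DTD reference, you can use the DOM Level 3 LSSerializer class instead: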

DOMImplementationLS domImplementationLS =
    (DOMImplementationLS) dom.getImplementation().getFeature("LS","3.0");
LSOutput lsOutput = domImplementationLS.createLSOutput();
FileOutputStream outputStream = new FileOutputStream("output.xml");
lsOutput.setByteStream((OutputStream) outputStream);
LSSerializer lsSerializer = domImplementationLS.createLSSerializer();
lsSerializer.write(dom, lsOutput);
outputStream.close();

Output with the DTD preserved (I don't see any option for adding standalone="yes" with LSSerializer...):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE favoris [<!ELEMENT favoris (station)+>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris> 

Another approach is to use the Apache Xerces2-J XMLSerializer class:

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
...

XMLSerializer serializer = new XMLSerializer();
serializer.setOutputCharStream(new java.io.FileWriter("output.xml"));
OutputFormat format = new OutputFormat();
format.setStandalone(true);
serializer.setOutputFormat(format);
serializer.serialize(dom);

Result:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris [<!ELEMENT favoris (station)+>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>


jas*_*sso 8

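(This response is only a supplement to @Grzegorz Szpetkowski's answer, explaining why it works.)

You lose the doctype definition because you use the Transformer class, which performs an XSL transformation (an identity transformation when no stylesheet is supplied). The XSLT tree model has no object or node for a DOCTYPE declaration or doctype definition. When the parser hands the document over to the XSLT processor, the doctype information is lost and therefore cannot be retained or copied. XSLT offers some control over how the output tree is serialized, including adding a <!DOCTYPE ... > declaration with a public or system identifier. The values of these identifiers need to be known beforehand; they cannot be read from the input tree. Creating or retaining an embedded DTD or entity declarations is not supported either (although one workaround for this obstacle is to output it as text with disable-output-escaping="yes").

To preserve the DTD, you need to output the document with an XML serializer instead of an XSL transformation, as Grzegorz already suggested.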
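As a rough illustration of the difference, the DOM (unlike the XSLT tree model) keeps a DocumentType node, and that node is what a DOM serializer can write back. The sketch below inspects it and serializes to a string with LSSerializer; `InternalSubsetDemo`, `SAMPLE`, `parse`, and `serialize` are hypothetical names, and the exact whitespace of the output may differ between JDK versions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical demo class; all names here are illustrative only.
final class InternalSubsetDemo {

    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE favoris ["
        + "<!ELEMENT favoris (station)+>"
        + "<!ELEMENT station (#PCDATA)>"
        + "<!ATTLIST station id ID #REQUIRED>"
        + "]>"
        + "<favoris><station id=\"i5\">test1</station></favoris>";

    /** Parses XML text into a DOM document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Serializes the whole document, DocumentType node included, with DOM Level 3 Load and Save. */
    static String serialize(Document dom) {
        DOMImplementationLS ls =
            (DOMImplementationLS) dom.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        return serializer.writeToString(dom);
    }

    public static void main(String[] args) {
        Document dom = parse(SAMPLE);
        DocumentType doctype = dom.getDoctype();
        // The internal subset is still available on the DOM side.
        System.out.println("doctype name:    " + doctype.getName());
        System.out.println("internal subset: " + doctype.getInternalSubset());
        System.out.println(serialize(dom));
    }
}
```

Note that `writeToString` produces a Java string, so the XML declaration it emits names UTF-16; when writing to a file, an LSOutput with a byte stream, as in Grzegorz's answer, is the better fit.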