The range of XML languages definable by DTD vs. XSD

ale*_*sch 4 xml xsd dtd formal-languages

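Does the following proposition hold: for every DTD there is an XSD that defines exactly the same language, and for every XSD there is a DTD that defines exactly the same language? Or, put differently: is the set of languages definable by some DTD the same as the set of languages definable by some XSD?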

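To expand on the question a bit: an XML document is basically one big string. A language is a set of strings. For example, the (infinite) set of all MathML documents is a language, as is the set of all RSS documents. MathML (RSS, ...) is also a proper subset of the (infinite) set of all XML documents. You can use a DTD or an XSD to define such a subset of XML.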

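Now, every DTD defines exactly one language. But if you consider all possible DTDs, you get a set of languages. My question is: is this set exactly the same as the one you get from all possible XSDs? If so, DTD and XSD are equivalent in the sense that the range of XML languages definable by each is the same.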

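Why does this question matter? If DTD and XSD are equivalent, it should be possible to write a program that takes a DTD as input and gives you an equivalent XSD, and another program that does the reverse. I know that plenty of programs claim to do this, but I doubt that it is really possible.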

C. *_*een 5

An interesting question; well put!

The answer is "no" in both directions.

Here is a DTD for which there is no equivalent XSD:

<!ELEMENT e (#PCDATA | e)* >
<!ENTITY egbdf "Every good boy deserves favor.">

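The set of character sequences accepted by this DTD includes <e/> and <e>&egbdf;</e>, but not <e>&beadgcf;</e>.

Since XSD validation operates on an infoset in which entities have already been expanded, no XSD schema can distinguish the third case from the second.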
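The entity-expansion point can be illustrated with a minimal Python sketch that uses only the standard library's xml.etree.ElementTree. It assumes the DTD above is supplied as an internal subset, and the helper name parsed_text is made up for the example. ElementTree does not validate against a DTD or an XSD; the sketch only shows what a downstream consumer of the parsed tree gets to see.

```python
import xml.etree.ElementTree as ET

# The DTD from the example, placed in an internal subset (an assumption
# made so that the sketch is self-contained).
DOCTYPE = (
    "<!DOCTYPE e [\n"
    "  <!ELEMENT e (#PCDATA | e)* >\n"
    '  <!ENTITY egbdf "Every good boy deserves favor.">\n'
    "]>\n"
)


def parsed_text(body: str) -> str:
    """Parse DOCTYPE + body and return the text content of the root element.

    parsed_text is a hypothetical helper, not part of any library.
    """
    return ET.fromstring(DOCTYPE + body).text


# One document uses the entity reference, the other spells the text out.
with_reference = parsed_text("<e>&egbdf;</e>")
with_literal = parsed_text("<e>Every good boy deserves favor.</e>")

# The parser has already replaced the reference with its replacement text,
# so the two parsed results cannot be told apart.
same_after_expansion = with_reference == with_literal
print(same_after_expansion)
```

Anything that works on the parsed result, as XSD validation does, is in the same position: the name of the entity that was referenced is no longer available to it.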

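A second area in which DTDs can express constraints not expressible in XSD involves NOTATION types. I won't give an example; the details are too complicated for me to remember them correctly, and not interesting enough to make me want to try.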

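A third area: DTDs treat namespace attributes (also known as namespace declarations) and ordinary attributes in the same way; a DTD can therefore constrain the appearance of namespace declarations in documents. An XSD schema cannot. The same applies to attributes in the xsi namespace.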

If we ignore all of those issues, and formulate the question with respect only to character sequences containing no references to named entities other than the pre-defined entities lt, gt, etc., then the answer changes: for every DTD not involving NOTATION declarations, there is an XSD schema that accepts precisely the same set of documents after entity expansion and with 'same' defined in a way that ignores namespace attributes and attributes in the xsi namespace.

In the other direction, the areas of difference include these:

  • XSD is namespace aware: the following XSD schema accepts any instance of element e in the specified target namespace, regardless of what prefix is bound to that namespace in the document instance.

    <xs:schema xmlns:xs="..." targetNamespace="http://example.com/nss/24397">
      <xs:element name="e" type="xs:string"/>
    </xs:schema>
    

    No DTD can succeed in accepting all and only the e elements in the given namespace.

  • XSD has a richer set of datatypes and can use datatypes to constrain elements and attributes. The following XSD schema has no equivalent DTD:

    <xs:schema xmlns:xs="...">
      <xs:element name="e" type="xs:integer"/>
    </xs:schema>
    

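    This schema accepts the document <e>42</e> but not the document <e>42d Street</e>. No DTD can make this distinction, because DTDs have no mechanism for constraining #PCDATA content. The closest a DTD can come is <!ELEMENT e (#PCDATA)>, which accepts both sample documents.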

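  • The xsi:type attribute of XSD allows in-document modification of content models. There is no equivalent DTD for the XSD schema described by the following schema document: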

    <xs:schema xmlns:xs="...">
      <xs:complexType name="e">
        <xs:sequence>
          <xs:element ref="e" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
      <xs:complexType name="e2">
        <xs:sequence>
          <xs:element ref="e" minOccurs="2" maxOccurs="2"/>
        </xs:sequence>
      </xs:complexType>
    
      <xs:element name="e" type="e"/>
    </xs:schema>
    

    This schema accepts the document <e xmlns:xsi="..." xsi:type="e2"><e/><e/></e> and rejects the document <e xmlns:xsi="..." xsi:type="e2"><e/><e/><e/></e>. DTDs have no mechanism for making content models depend on an attribute value given in the document instance.

  • XSD wildcards allow the inclusion of arbitrary well-formed XML among the children of specified elements; the closest one can come to that with a DTD is to use an element declaration of the form <!ELEMENT e ANY>, which is not the same because it requires declarations for all the elements which in fact appear.

  • XSD 1.1 provides assertions and conditional type assignment, which have no analogues in DTDs.
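Two of the distinctions in the list above, namespace awareness and datatype checking, can be sketched in Python with the standard library only. This is an illustration under stated assumptions, not a validator: the names expanded_root_name and looks_like_xs_integer are made up for the example, and the regular expression is a simplified approximation of the xs:integer lexical space (optional sign, then decimal digits, with surrounding whitespace collapsed).

```python
import re
import xml.etree.ElementTree as ET

# Target namespace taken from the first schema in the list above.
NS = "http://example.com/nss/24397"


def expanded_root_name(document: str) -> str:
    """Return the root element's name in ElementTree's {namespace}local form."""
    return ET.fromstring(document).tag


# Three serializations that bind different prefixes (or none) to NS.
documents = [
    f'<e xmlns="{NS}">text</e>',
    f'<p:e xmlns:p="{NS}">text</p:e>',
    f'<q:e xmlns:q="{NS}">text</q:e>',
]

# A namespace-aware view sees one and the same element name in all three,
# whereas a DTD element declaration names one literal string: e, p:e or q:e.
names = {expanded_root_name(d) for d in documents}

# Simplified stand-in for the xs:integer lexical space (an assumption).
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def looks_like_xs_integer(text: str) -> bool:
    """Approximate xs:integer check on element content (hypothetical helper)."""
    return INTEGER_LEXICAL.fullmatch(text.strip(" \t\r\n")) is not None


for sample in ("<e>42</e>", "<e>42d Street</e>"):
    content = ET.fromstring(sample).text
    print(sample, looks_like_xs_integer(content))
```

A DTD declaration such as <!ELEMENT e (#PCDATA)> has no counterpart to either check: it matches element names as literal strings and accepts any character data as content.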

There are probably other ways in which the expressive power of XSD exceeds that of DTDs, but I think the point has been illustrated adequately.

I think a fair summary would be: XSD can express everything DTDs can express, with the exception of entity declarations and special cases like namespace declarations and xsi:* attributes, because XSD was designed to be able to do so. So the loss of information when translating a DTD to an XSD schema document is relatively modest, well understood, and mostly involves things most vocabulary designers regard as DTD artefacts not of substantive interest.

XSD can express more than DTDs can, again because XSD was designed to do so. In the general case, translation from XSD to DTD necessarily involves loss of information (the set of documents accepted may need to be larger, or smaller, or to be an overlapping set). Different choices can be made about how to manage the loss of information, which gives the question "How does one best translate an XSD into DTD form?" a certain theoretical interest. (Very few people, however, seem to find it an interesting question in practice.)

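All of this focuses, as your question does, on documents as character sequences, on languages as sets of documents, and on schema languages as generators of languages in that sense. Questions of maintainability, and of information present in a schema that makes no difference to the extension of the document set (for example, the treatment of class hierarchies in the document model), are left out of account.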