Parsing: string to XML

Noo*_*tor 1 python xml elementtree xml-parsing

My API is supposed to take a string and convert it to XML.

But I keep getting this error:

ParseError: mismatched tag: line 1, column 764

XML

<?xml version="1.0" encoding="utf-8" ?>
<MasterDetails IssuerId="5" Version="12.2">
    <XMLRequest />
    <BookingDetails  Amount="768"  Comment="Hotel Travel Purchase"  CurrencyCode="INR"  PurchaseType="Hotel"  SupplierName="SomeHotel"  CardAlias="C_ALIAS"  ValidFor="-1D"  CurrencyType="B" />
    <CDFs>
        <CDF FieldName="Order Date" FieldValue="2015-01-01" />
    </CDFs>
    <SomeTag>
        <Rule Action="A" Alias="MyAlias">
            <Controls>
                <OPMCCControl Negate="False"/>
                <OPMIDControl />
                <SomeControlsTags       CumulativeLimit="768"       MaxTrans="None"                 Period="C" />
                <ValidityPeriod           ValidFrom="2015-01-01 00:00:00.0 +0000"          ValidTo="2015-01-11 00:00:00.0 +0000" />
            </Controls>
        </Rule>
    </SomeTag>
</BookingDetails>
<Email  EmailAddress="T@J.COM"/>
<MasterDetails />

My implementation:

import xml.etree.ElementTree as ET
tree = ET.ElementTree(ET.fromstring(kk.strip()))

I am sure my XML string has all of its tags matched and is properly formatted, but I may still be missing something right in front of my eyes!

ale*_*cxe 5

The BookingDetails tag is self-closed on this line:

<BookingDetails  Amount="768"  Comment="Hotel Travel Purchase"  CurrencyCode="INR"  PurchaseType="Hotel"  SupplierName="SomeHotel"  CardAlias="C_ALIAS"  ValidFor="-1D"  CurrencyType="B" />

But further down there is also a separate closing BookingDetails tag:

</BookingDetails>

Also, the <MasterDetails /> on the last line does not close the root element. It should be </MasterDetails> instead of <MasterDetails />.
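To see both problems in isolation, here is a minimal sketch using only the standard library. The XML is a shortened stand-in for the document in the question, and `describe_error` is a hypothetical helper, not part of ElementTree. The fixed variant assumes the child elements were meant to live inside BookingDetails; if they were not, the alternative fix is to delete the stray </BookingDetails> instead.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the XML in the question, with the same two mistakes.
broken = (
    '<MasterDetails IssuerId="5" Version="12.2">'
    '<BookingDetails Amount="768" CurrencyCode="INR" />'  # self-closed here...
    '<CDFs><CDF FieldName="Order Date" FieldValue="2015-01-01" /></CDFs>'
    '</BookingDetails>'                                    # ...yet closed again here
    '<Email EmailAddress="T@J.COM"/>'
    '<MasterDetails />'                                    # should be </MasterDetails>
)

# Same content with both mistakes fixed (assumption: CDFs belongs inside
# BookingDetails, so the "/" is dropped from its opening tag).
fixed = (
    '<MasterDetails IssuerId="5" Version="12.2">'
    '<BookingDetails Amount="768" CurrencyCode="INR">'
    '<CDFs><CDF FieldName="Order Date" FieldValue="2015-01-01" /></CDFs>'
    '</BookingDetails>'
    '<Email EmailAddress="T@J.COM"/>'
    '</MasterDetails>'
)


def describe_error(xml_text):
    """Return None if xml_text parses, else (message, text near the error)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # exc.position is a (line, column) tuple; the line is 1-based.
        line, column = exc.position
        bad_line = xml_text.splitlines()[line - 1]
        # Show a small window around the reported column.
        return str(exc), bad_line[max(0, column - 20):column + 20]
    return None


print(describe_error(broken))  # message plus the text around the stray closing tag
print(describe_error(fixed))   # None
```

Printing the text around the reported column is handy when the whole document sits on one line, as the "line 1, column 764" in the question suggests.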


Note that if you use lxml.etree, you can parse this XML in "recover" mode:

import lxml.etree as ET

# data holds the XML string (kk in the question)
parser = ET.XMLParser(recover=True)
tree = ET.ElementTree(ET.fromstring(data, parser=parser))

Alternatively, use BeautifulSoup with its xml feature:

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, "xml")
print(soup.prettify())