Rya*_*rio · score 35 · tags: python, xml, xpath, elementtree
My XML file looks like this:
<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice>
          <Amount>2260</Amount>
        </ListPrice>
      </ItemAttributes>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1853</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemSearchResponse>
All I want to do is extract the ListPrice.
This is the code I'm using:
>>> from elementtree import ElementTree as ET
>>> fp = open("output.xml","r")
>>> element = ET.parse(fp).getroot()
>>> e = element.findall('ItemSearchResponse/Items/Item/ItemAttributes/ListPrice/Amount')
>>> for i in e:
...     print i.text
...
>>> e
>>>
Absolutely no output. I also tried:
>>> e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
No difference.
What am I doing wrong?
Bri*_*ndy · score 59
You have two problems.
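1) element holds only the root Element (<ItemSearchResponse>), not an ElementTree, and findall paths are evaluated relative to it: a path that starts with ItemSearchResponse/ looks for a child of that name and matches nothing.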
2) If you keep the namespace in the XML, your search string has to include the namespace as well.
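A quick way to see problem #2 is to look at the tag ElementTree actually assigns to the root. The sketch below uses the standard-library xml.etree.ElementTree and parses a trimmed copy of the question's document from a string instead of output.xml, so it runs on its own; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document; the default namespace is kept.
XML = """<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items><Item>
    <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
    <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
  </Item></Items>
</ItemSearchResponse>"""

NS = "{http://webservices.amazon.com/AWSECommerceService/2008-08-19}"

root = ET.fromstring(XML)
# The tag carries the namespace in "{uri}localname" form.
print(root.tag)  # {http://webservices.amazon.com/AWSECommerceService/2008-08-19}ItemSearchResponse

# Un-prefixed names never match namespaced elements, so this list is empty.
plain = root.findall("Items/Item/ItemAttributes/ListPrice/Amount")
print(plain)  # []

# Qualifying every step with the namespace finds the ListPrice amount.
steps = ["Items", "Item", "ItemAttributes", "ListPrice", "Amount"]
qualified = root.findall("/".join(NS + step for step in steps))
print([a.text for a in qualified])  # ['2260']
```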
To fix problem #1, change:
element = ET.parse(fp).getroot()
to:
element = ET.parse(fp)
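To see where findall starts resolving a path for each kind of object, here is a small sketch on a namespace-free copy of the document, so that only path resolution is in play. It assumes Python 3's standard-library xml.etree.ElementTree; the in-memory io.BytesIO stands in for output.xml.

```python
import io
import xml.etree.ElementTree as ET

# Namespace-free copy of the relevant part of the document.
XML = b"""<ItemSearchResponse><Items><Item><ItemAttributes>
<ListPrice><Amount>2260</Amount></ListPrice>
</ItemAttributes></Item></Items></ItemSearchResponse>"""

tree = ET.parse(io.BytesIO(XML))  # ElementTree wrapping the whole document
root = tree.getroot()             # Element for <ItemSearchResponse>

rel = "Items/Item/ItemAttributes/ListPrice/Amount"
from_tree = [a.text for a in tree.findall(rel)]  # resolved from the root element
from_root = [a.text for a in root.findall(rel)]  # same starting point

# Starting the path with the root's own tag looks for a *child* named
# ItemSearchResponse, which does not exist, so nothing matches.
with_root_name = root.findall("ItemSearchResponse/" + rel)

print(from_tree, from_root, with_root_name)  # ['2260'] ['2260'] []
```

Both the tree and its root resolve the relative path from <ItemSearchResponse>, so whichever one you call findall on, the path should begin at Items rather than at the root's own name.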
To fix problem #2:
You can remove the xmlns attribute from the XML document so that it looks like this:
<?xml version="1.0"?>
<ItemSearchResponse>
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice>
          <Amount>2260</Amount>
        </ListPrice>
      </ItemAttributes>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1853</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemSearchResponse>
With this document, you can use the following search string:
e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
Full code:
from xml.etree import ElementTree as ET  # standard-library ElementTree
with open("output.xml") as fp:
    element = ET.parse(fp)
e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
for i in e:
    print(i.text)
Alternative fix for problem #2:
Otherwise, you need to qualify every element in the search string with the namespace.
Full code:
from xml.etree import ElementTree as ET  # standard-library ElementTree
with open("output.xml") as fp:
    element = ET.parse(fp)
namespace = "{http://webservices.amazon.com/AWSECommerceService/2008-08-19}"
# Prefix every step of the path with the namespace.
e = element.findall('{0}Items/{0}Item/{0}ItemAttributes/{0}ListPrice/{0}Amount'.format(namespace))
for i in e:
    print(i.text)
Both versions print:
2260
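Repeating the full URI in every step can be avoided. In current Python versions, findall also accepts a prefix-to-URI mapping as its second argument, and Python 3.8 added a {*} wildcard that matches a tag in any namespace. A minimal sketch, with the prefix aws chosen arbitrarily and the document inlined as a string:

```python
import sys
import xml.etree.ElementTree as ET

NS_URI = "http://webservices.amazon.com/AWSECommerceService/2008-08-19"

# Trimmed copy of the question's document, inlined so the sketch runs on its own.
XML = f"""<ItemSearchResponse xmlns="{NS_URI}">
  <Items><Item>
    <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
  </Item></Items>
</ItemSearchResponse>"""

root = ET.fromstring(XML)

# "aws" is an arbitrary prefix; it only has to match a key in the mapping.
ns = {"aws": NS_URI}
path = "aws:Items/aws:Item/aws:ItemAttributes/aws:ListPrice/aws:Amount"
prices = [a.text for a in root.findall(path, ns)]
print(prices)  # ['2260']

# Python 3.8+: "{*}" matches the tag in any namespace (or none).
if sys.version_info >= (3, 8):
    wildcard = [a.text for a in root.findall("{*}Items/{*}Item/{*}ItemAttributes/{*}ListPrice/{*}Amount")]
    print(wildcard)  # ['2260']
```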
from xml.etree import ElementTree as ET
tree = ET.parse("output.xml")
namespace = tree.getroot().tag[1:].split("}")[0]
amount = tree.find(".//{%s}Amount" % namespace).text
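Slicing tree.getroot().tag works when the root element itself is in the namespace you want. An alternative, sketched here, reads the namespace declarations directly with ET.iterparse and its "start-ns" event; the helper name declared_namespaces is made up for this example, and the document is inlined as bytes.

```python
import io
import xml.etree.ElementTree as ET

XML = b"""<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items><Item><ItemAttributes>
    <ListPrice><Amount>2260</Amount></ListPrice>
  </ItemAttributes></Item></Items>
</ItemSearchResponse>"""


def declared_namespaces(source):
    """Hypothetical helper: map each declared prefix ('' = default) to its URI.

    If a prefix is declared more than once, the last declaration wins.
    """
    return {prefix: uri for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",))}


namespaces = declared_namespaces(io.BytesIO(XML))
default_ns = namespaces[""]

tree = ET.parse(io.BytesIO(XML))
# ".//" searches all descendants; the first Amount in document order is ListPrice's.
amount = tree.find(".//{%s}Amount" % default_ns).text
print(default_ns, amount)
```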
Also, consider using lxml; it is faster.
from lxml import etree as ET
ElementTree is namespace-aware, so every element in your XML gets a name of the form {http://webservices.amazon.com/AWSECommerceService/2008-08-19}TagName.
So include the namespace in your search, for example:
search = '{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Items/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Item/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}ItemAttributes/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}ListPrice/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Amount'
element.findall( search )
This finds the Amount element whose text is 2260.
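Typing the URI five times by hand is error-prone, since a single typo silently yields no matches. A small sketch that builds the same search string from a list of tag names; qualified_path is a hypothetical helper, and the document is inlined as a string so the example is self-contained.

```python
import xml.etree.ElementTree as ET

NS = "http://webservices.amazon.com/AWSECommerceService/2008-08-19"

XML = """<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items><Item><ItemAttributes>
    <ListPrice><Amount>2260</Amount></ListPrice>
  </ItemAttributes></Item></Items>
</ItemSearchResponse>"""


def qualified_path(namespace, *tags):
    """Hypothetical helper: join tags with '/', writing each as '{namespace}tag'."""
    return "/".join("{%s}%s" % (namespace, tag) for tag in tags)


search = qualified_path(NS, "Items", "Item", "ItemAttributes", "ListPrice", "Amount")
element = ET.fromstring(XML)  # the root <ItemSearchResponse> Element
print([a.text for a in element.findall(search)])  # ['2260']
```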
Anonymous · score 6
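I ended up stripping the xmlns out of the original XML like this:
import re
def strip_ns(xml_string):
    # Delete every default-namespace declaration (xmlns="...") from the raw text.
    return re.sub('xmlns="[^"]+"', '', xml_string)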
Obviously, be very careful with this, but it worked well for me.
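A regex over the raw text can also hit an xmlns="..." that happens to appear inside text content, and it leaves prefixed declarations such as xmlns:a="..." in place. A different technique, sketched below, parses normally and then drops the {uri} part from each element's tag; strip_namespaces is a hypothetical name, and attribute names are left untouched.

```python
import xml.etree.ElementTree as ET

XML = """<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items><Item>
    <ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
    <Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
  </Item></Items>
</ItemSearchResponse>"""


def strip_namespaces(root):
    """Hypothetical helper: remove the '{uri}' prefix from every element tag in place.

    Attribute names keep their namespaces; only element tags are rewritten.
    """
    for el in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


root = strip_namespaces(ET.fromstring(XML))
print(root.tag)  # ItemSearchResponse
print([a.text for a in root.findall("Items/Item/ItemAttributes/ListPrice/Amount")])  # ['2260']
```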