FaC*_*fee 12 python xml tree elementtree xml-parsing
I'm new to XML parsing. This XML file has the following tree:
FHRSEstablishment
|--> Header
| |--> ...
|--> EstablishmentCollection
| |--> EstablishmentDetail
| | |-->...
| |--> Scores
| | |-->...
|--> EstablishmentCollection
| |--> EstablishmentDetail
| | |-->...
| |--> Scores
| | |-->...
But when I access it with ElementTree and look for the child tags and attributes,
import xml.etree.ElementTree as ET
import urllib2
# Python 2: urlopen returns a file-like object that ET.parse can read directly
tree = ET.parse(
    urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root:
    print child.tag, child.attrib
I only get:
Header {}
EstablishmentCollection {}
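I take this to mean that their attributes are empty. Why is that, and how can I access the nested children EstablishmentDetail and Scores?
EDIT
Thanks to the answer below I can get inside the tree, but retrieving values such as those under Scores fails: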
for node in root.find('.//EstablishmentDetail/Scores'):
    rating = node.attrib.get('Hygiene')
    print rating
and produces
None
None
None
Why is that?
Kee*_*ran 13
You have to iter() over your root!
That is, root.iter() does the trick!
import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root.iter():
    print child.tag, child.attrib
Output:
FHRSEstablishment {}
Header {}
ExtractDate {}
ItemCount {}
ReturnCode {}
EstablishmentCollection {}
EstablishmentDetail {}
FHRSID {}
LocalAuthorityBusinessID {}
...
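The empty {} next to every tag is expected here: in this file the data sits in child elements and their text, not in XML attributes. A minimal sketch of the difference follows. It is written for Python 3 (the thread itself uses Python 2) and runs against a tiny inline document; the values and the source attribute are invented for illustration, and only the overall shape mirrors the tree in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the values and the "source" attribute are made up.
SAMPLE = """
<FHRSEstablishment>
  <Header source="demo">
    <ItemCount>1</ItemCount>
  </Header>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# Iterating over an element yields only its direct children.
direct_children = [child.tag for child in root]

# iter() walks the element itself and every descendant, in document order.
all_tags = [elem.tag for elem in root.iter()]

header = root.find('Header')
item_count = root.find('Header/ItemCount')

print(direct_children)
print(all_tags)
print(header.attrib)      # attributes written inside the opening tag
print(item_count.attrib)  # no attributes on this element, so an empty dict
print(item_count.text)    # the value lives in the element's text
```

In other words, attrib only reflects name="value" pairs on the opening tag, while text holds whatever sits between the opening and closing tags.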
You need to find the EstablishmentDetail tag and then loop over its children! That is,
for child in root.find('.//EstablishmentDetail'):
    print child.tag, child.attrib
Output:
FHRSID {}
LocalAuthorityBusinessID {}
BusinessName {}
BusinessType {}
BusinessTypeID {}
RatingValue {}
RatingKey {}
RatingDate {}
LocalAuthorityCode {}
LocalAuthorityName {}
LocalAuthorityWebSite {}
LocalAuthorityEmailAddress {}
Scores {}
SchemeType {}
NewRatingPending {}
Geocode {}
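As for the Hygiene score you mentioned in the comments: root.find('.//Scores') returns only the first Scores tag, so when you write for each in root.find('.//Scores'): rating = child.get('Hygiene'), the loop visits that tag's children, which are the Hygiene, ConfidenceInManagement and Structural tags. get('Hygiene') looks up an attribute, and obviously none of the three children has one, so you get None three times.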
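The three None values can be reproduced without downloading anything. This is a small sketch for Python 3 against an inline sample; the score values are invented, and the child tag names are assumed from the answer's description rather than checked against the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: score values are made up for illustration.
SAMPLE = """
<EstablishmentCollection>
  <EstablishmentDetail>
    <Scores>
      <Hygiene>5</Hygiene>
      <Structural>10</Structural>
      <ConfidenceInManagement>0</ConfidenceInManagement>
    </Scores>
  </EstablishmentDetail>
</EstablishmentCollection>
"""

root = ET.fromstring(SAMPLE)

# find() returns only the first matching element (or None if nothing matches).
scores = root.find('.//EstablishmentDetail/Scores')

# Iterating over that single element visits its three children.
visited = [node.tag for node in scores]

# Each child is asked for an *attribute* named Hygiene, which none of them has.
ratings = [node.attrib.get('Hygiene') for node in scores]

print(visited)
print(ratings)
```

The score is the text of the Hygiene child element, so it has to be reached with find or findtext rather than through attrib.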
You need to first find all the Scores tags, then find Hygiene inside each one!
for each in root.findall('.//Scores'):
    rating = each.find('.//Hygiene')
    print '' if rating is None else rating.text
Output:
5
5
5
0
5
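The same two-step lookup can be tried offline. The following is a sketch for Python 3 against an invented inline sample (the business names and scores are made up). As a variation on the code above, it uses findtext, a standard ElementTree method, in place of find(...).text, so a missing Hygiene element falls back to a default instead of needing an explicit None check.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: names and scores are made up for illustration.
SAMPLE = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <Scores>
        <Hygiene>5</Hygiene>
      </Scores>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Example Diner</BusinessName>
      <Scores />
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

hygiene_by_business = {}
# findall() returns every match, not just the first one.
for detail in root.findall('.//EstablishmentDetail'):
    name = detail.findtext('BusinessName', default='')
    # Relative path from this establishment; '' when no Hygiene element exists.
    hygiene = detail.findtext('Scores/Hygiene', default='')
    hygiene_by_business[name] = hygiene

print(hygiene_by_business)
```

Looping over EstablishmentDetail rather than Scores keeps each score next to the establishment it belongs to, which is usually what is wanted when the values are stored or reported.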