使用Python Scrape XML文件

dav*_*p13 1 python xml csv parsing

我一直试图刮取XML文件来复制2个标签的内容,仅限代码和源代码.xml文件如下所示:

<Series xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RunDate>2018-06-12</RunDate>
  <Instruments>
    <Instrument>
      <Code>27BA1</Code>
      <Source>YYY</Source>
    </Instrument>
    <Instrument>
      <Code>28BA1</Code>
      <Source>XXX</Source>
    </Instrument>
      <Code>29BA1</Code>
      <Source>XXX</Source>
    </Instrument>
      <Code>30BA1</Code>
      <Source>DDD</Source>
    </Instrument>
  </Instruments>
</Series>
Run Code Online (Sandbox Code Playgroud)

我只是抓住了第一个代码.下面是代码.有人可以帮忙吗?

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("data.xml")
csv_fname = "data.csv"
root = tree.getroot()

f = open(csv_fname, 'w')
csvwriter = csv.writer(f)
count = 0
head = ['Code', 'Source']

csvwriter.writerow(head)

for time in root.findall('Instruments'):
    row = []
    job_name = time.find('Instrument').find('Code').text
    row.append(job_name)
    job_name_1 = time.find('Instrument').find('Source').text
    row.append(job_name_1)
    csvwriter.writerow(row)
f.close()
Run Code Online (Sandbox Code Playgroud)

Anu*_*ngh 5

您在帖子中提供的XML文件无效.通过在此处粘贴文件进行检查.https://www.w3schools.com/xml/xml_validator.asp

我假设的有效xml是

<Series xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RunDate>2018-06-12</RunDate>
  <Instruments>
    <Instrument>
      <Code>27BA1</Code>
      <Source>YYY</Source>
    </Instrument>
    <Instrument>
      <Code>28BA1</Code>
      <Source>XXX</Source>
    </Instrument>
    <Instrument>
      <Code>29BA1</Code>
      <Source>XXX</Source>
    </Instrument>
    <Instrument>
      <Code>30BA1</Code>
      <Source>DDD</Source>
    </Instrument>
  </Instruments>
</Series>
Run Code Online (Sandbox Code Playgroud)

在"代码"和"源"标记中打印值.

from lxml import etree
root = etree.parse('data.xml').getroot()
instruments = root.find('Instruments')
instrument = instruments.findall('Instrument')
for grandchild in instrument:
    code, source = grandchild.find('Code'), grandchild.find('Source')
    print (code.text), (source.text)
Run Code Online (Sandbox Code Playgroud)