Extracting text from XML using Python

Question

I have this sample XML file:

<page>
  <title>Chapter 1</title>
  <content>Welcome to Chapter 1</content>
</page>
<page>
 <title>Chapter 2</title>
 <content>Welcome to Chapter 2</content>
</page>

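I want to extract the contents of the title tags and the content tags.

Which approach is better for extracting the data: pattern matching or an XML module? Or is there a better way to extract it?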

Answer 1

San*_*nta 20

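There is already a built-in XML library, notably ElementTree. Note that the sample is wrapped in a single <root> element below, since a well-formed XML document needs exactly one root. For example: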

>>> # cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically
>>> from xml.etree import ElementTree as ET
>>> xmlstr = """
... <root>
... <page>
...   <title>Chapter 1</title>
...   <content>Welcome to Chapter 1</content>
... </page>
... <page>
...  <title>Chapter 2</title>
...  <content>Welcome to Chapter 2</content>
... </page>
... </root>
... """
>>> root = ET.fromstring(xmlstr)
>>> for page in list(root):
...     title = page.find('title').text
...     content = page.find('content').text
...     print('title: %s; content: %s' % (title, content))
...
title: Chapter 1; content: Welcome to Chapter 1
title: Chapter 2; content: Welcome to Chapter 2
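
If the input really has no root element, as in the question, one option is to wrap it before parsing. The following is only a minimal sketch under that assumption: `extract_pages` is a hypothetical helper name, not part of ElementTree, and the wrapping trick will not work if the text starts with an XML declaration. It uses `findtext` with a default so a missing tag yields an empty string instead of raising an error.

```python
from xml.etree import ElementTree as ET


def extract_pages(fragment):
    """Return (title, content) pairs for every <page> in an XML fragment.

    Hypothetical helper: assumes `fragment` has no XML declaration and may
    lack a single root element, so it is wrapped in a synthetic <root>.
    """
    root = ET.fromstring("<root>%s</root>" % fragment)
    pages = []
    # iter() walks the whole tree, so pages are found at any depth
    for page in root.iter("page"):
        # findtext returns the default when the child tag is missing
        title = page.findtext("title", default="")
        content = page.findtext("content", default="")
        pages.append((title, content))
    return pages


sample = """
<page>
  <title>Chapter 1</title>
  <content>Welcome to Chapter 1</content>
</page>
<page>
 <title>Chapter 2</title>
 <content>Welcome to Chapter 2</content>
</page>
"""

for title, content in extract_pages(sample):
    print("title: %s; content: %s" % (title, content))
```

For a file on disk that already has a root element, `ET.parse(path).getroot()` can replace the `ET.fromstring` call. Pattern matching with regular expressions tends to break on nesting, attributes, and escaped characters, which is why a parser is usually the safer choice here.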
