phl*_*ton 6 python loops readline
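I was introduced to Python (and to programming in general) for the first time 2 days ago. Today I'm stuck. I've spent hours trying to find an answer, and I suspect the problem is so trivial that nobody else has ever been stuck here :)
My boss wants me to manually clean up huge .xml files into something more human-readable. I'm trying to write a script that does it for me. Below is a sample of the .xml file, followed by the output I want.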
<IssueTracking>
<Issue>
<SequenceNum>123</SequenceNum>
<Subject>Subject of Ticket 123</Subject>
<Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
</Issue>
<Issue>
<SequenceNum>124</SequenceNum>
<Subject>Subject of Ticket 124</Subject>
<Description>Line 1 in Description field of Ticket 124.
Line 2 in Description field of Ticket 124.
Line 3 in Description field of Ticket 124.</Description>
</Issue>
</IssueTracking>
123 Subject of Ticket 123
Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.
124 Subject of Ticket 124
Line 1 in Description field of Ticket 124.
Line 2 in Description field of Ticket 124.
Line 3 in Description field of Ticket 124.
Here is what I have so far.
with open('File.xml', 'r') as SourceFile:            # Opens the file
    while 1:                                         # Keep going through the file to the end
        SourceFileLine = SourceFile.readline()       # Reads the next line of the source file
        if not SourceFileLine:                       # An empty string means end of file
            break
        SourceFileLine = SourceFileLine.strip()      # Strips the whitespace
        if "<SequenceNum>" in SourceFileLine:
            SequenceNum = SourceFileLine[13:-14]     # Trims the tags, saves the field.
            continue
        if "<Subject>" in SourceFileLine:
            Subject = SourceFileLine[9:-10]
            continue
        #if "<Description>" in SourceFileLine:
        #    last_pos = SourceFile.tell()
        #    while "</Description>" not in SourceFileLine:
        #        SourceFile.seek(last_pos)
        #        ?????
        #
        #    Description = Description[22:]
        #    continue
        if "</Issue>" in SourceFileLine:
            print(SequenceNum, end = "\t")
            print(Subject)
            # print(Description)
            print("\n")
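I'm stuck on recognizing and keeping the three lines between the <Description> tags so that I can print them as a single string and then carry on through the rest of the source file. Having scanned dozens of other examples of line-reading loops, I suspect what I need is to mark the point where I reach the target field and nest another read loop at that point in the file. But I haven't found another example that does this, so I assume I'm either missing something basic or there's a better way. Thanks in advance for your help!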
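For reference, the "remember that we are inside <Description>" idea can be sketched without tell()/seek() by keeping a flag while iterating over the lines. This is only a minimal sketch under assumptions taken from the sample above: every tag starts on its own line and the text contains no XML entities or attributes. The function name extract_issues is made up for illustration.

```python
def extract_issues(lines):
    """Yield (sequence_num, subject, description_lines) for each <Issue>.

    Assumes the layout of the sample file: one tag per line, no attributes.
    """
    sequence_num = subject = None
    description = []
    in_description = False          # True while between <Description> tags
    for raw in lines:
        line = raw.strip()
        if in_description:
            if "</Description>" in line:
                # Last line of the field: drop the closing tag and stop collecting.
                description.append(line.replace("</Description>", ""))
                in_description = False
            else:
                description.append(line)
        elif line.startswith("<SequenceNum>"):
            sequence_num = line[len("<SequenceNum>"):-len("</SequenceNum>")]
        elif line.startswith("<Subject>"):
            subject = line[len("<Subject>"):-len("</Subject>")]
        elif line.startswith("<Description>"):
            text = line[len("<Description>"):]
            if "</Description>" in text:
                # Single-line description: opening and closing tag on one line.
                description.append(text.replace("</Description>", ""))
            else:
                description.append(text)
                in_description = True
        elif line.startswith("</Issue>"):
            yield sequence_num, subject, description
            sequence_num = subject = None
            description = []
```

Because an open file object is itself iterable line by line, this could be called as extract_issues(SourceFile) inside the with block.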
Here is an example using lxml, which I strongly recommend for processing this kind of data. (N.B.: written for Py2.x, but easily adapted to Py3.x.)
from lxml import etree
xml = """<IssueTracking>
<Issue>
<SequenceNum>123</SequenceNum>
<Subject>Subject of Ticket 123</Subject>
<Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
</Issue>
<Issue>
<SequenceNum>124</SequenceNum>
<Subject>Subject of Ticket 124</Subject>
<Description>Line 1 in Description field of Ticket 124.
Line 2 in Description field of Ticket 124.
Line 3 in Description field of Ticket 124.</Description>
</Issue>
</IssueTracking>
"""
root = etree.fromstring(xml)
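for issue in root.findall('Issue'):
    # Collect the text of the three child elements of each <Issue>
    as_list = [issue.find(n).text for n in ('SequenceNum', 'Subject', 'Description')]
    # Split the multi-line description into a list of lines
    as_list[2] = as_list[2].split('\n')
    print(as_list)  # parenthesized so it runs on both Py2.x and Py3.x
Prints: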
['123', 'Subject of Ticket 123', ['Line 1 in Description field of Ticket 123.', 'Line 2 in Description field of Ticket 123.', 'Line 3 in Description field of Ticket 123.']]
['124', 'Subject of Ticket 124', ['Line 1 in Description field of Ticket 124.', 'Line 2 in Description field of Ticket 124.', 'Line 3 in Description field of Ticket 124.']]
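To get from those lists to the plain-text layout asked for in the question, the same find-by-tag approach can be combined with a little string joining. The sketch below swaps in the standard library's xml.etree.ElementTree, which offers the same fromstring/findall/findtext calls for this case, so that it runs without installing lxml. The names SAMPLE_XML and format_issues are made up for illustration, and the tab between number and subject is an assumption based on the end="\t" in the question's code.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<IssueTracking>
<Issue>
<SequenceNum>123</SequenceNum>
<Subject>Subject of Ticket 123</Subject>
<Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
</Issue>
</IssueTracking>
"""


def format_issues(xml_text):
    """Return one text block per <Issue>: 'number<TAB>subject', then the description lines."""
    root = ET.fromstring(xml_text)
    blocks = []
    for issue in root.findall("Issue"):
        seq = issue.findtext("SequenceNum", default="")
        subject = issue.findtext("Subject", default="")
        description = issue.findtext("Description", default="")
        # Strip each description line in case the real file is indented.
        lines = [line.strip() for line in description.splitlines()]
        blocks.append("\n".join(["%s\t%s" % (seq, subject)] + lines))
    return "\n".join(blocks)


print(format_issues(SAMPLE_XML))
```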
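Please don't read XML files like this; Python has a variety of libraries that help with reading XML.
Take a look at the Python library lxml. It provides a very simple way to read and then parse XML files, and it will improve your code enormously.
I would explain how to use the library itself, but their documentation is far better than anything I could squeeze into this text area: http://lxml.de/tutorial.html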
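Since the question mentions huge files, here is a minimal sketch of reading the issues straight from a file one element at a time. It uses iterparse from the standard library's xml.etree.ElementTree (lxml provides an iterparse of its own as well). The function name iter_issues is made up for illustration, and the element names are taken from the sample in the question.

```python
import xml.etree.ElementTree as ET


def iter_issues(path):
    """Yield one dict per <Issue> element while the file is being parsed."""
    # "end" events fire once an element and all of its children are complete.
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "Issue":
            yield {
                "SequenceNum": elem.findtext("SequenceNum", default=""),
                "Subject": elem.findtext("Subject", default=""),
                "Description": elem.findtext("Description", default=""),
            }
            elem.clear()  # drop the finished issue's children to save memory
```

It could be used as: for issue in iter_issues('File.xml'): print(issue['SequenceNum'], issue['Subject']), where 'File.xml' stands for the real file name.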