use*_*376 2 python parsing lxml xml.etree
I have a script that parses an HTML file. It worked fine until I changed it slightly so that it could be run from the terminal, like this:
python myscript.py filename
With the name of the file to parse hard-coded, it works:
tree = etree.parse("folder/filename.html")
places = []
def f1():
    for dfn in tree.getiterator('dfn'):
        ...
    return places
def main():
    f1()
    file_places = open('list_places.txt', 'w')
    for x in sorted(places):
        print>>file_places, x
Then, instead of hard-coding the file name, I stored it in a variable that is supposed to take its value from the command-line argument:
args=sys.argv[1:]
filename = sys.argv[0]
tree = etree.parse(filename)
places = []
def extract_places():
    for dfn in tree.getiterator('dfn'):
        ...
    return places
def main():
    if len(args) < 1:
        print 'usage: extract.py [file ...]'
        sys.exit(1)
    else:
        extract_places()
        file_places = open('list_places.txt', 'w')
        for x in sorted(places):
            print>>file_places, x
This is the error I get:
Traceback (most recent call last):
File "extract.py", line 15, in <module>
tree = etree.parse(filename)
File "lxml.etree.pyx", line 2957, in lxml.etree.parse (src/lxml/lxml.etree.c:56299)
File "parser.pxi", line 1533, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:82382)
File "parser.pxi", line 1562, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:82675)
File "parser.pxi", line 1462, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:81714)
File "parser.pxi", line 1002, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:78623)
File "parser.pxi", line 569, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:74567)
File "parser.pxi", line 650, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:75458)
File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:74791)
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
filename = sys.argv[0]
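There's your problem. sys.argv[0] is the name of the script itself (extract.py), not the first argument, so the parser is handed a Python source file and fails at line 1, column 1 because it does not start with '<'. I suspect you meant to do this: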
filename = args[0]
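
For reference, here is a minimal sketch of how the whole script might look with that fix applied. It rests on several assumptions that are not in the question: it is written for Python 3, it uses the standard library's xml.etree.ElementTree as a stand-in for lxml.etree (so the input must be well-formed), it uses iter() where the question uses getiterator(), the out_path parameter is hypothetical, and the body of the loop (elided as "..." in the question) is replaced by a placeholder that simply collects the text of each dfn element.

```python
import xml.etree.ElementTree as etree  # assumption: stand-in for lxml.etree


def extract_places(tree):
    # Placeholder for the loop body elided in the question:
    # collect the text of every <dfn> element that has any.
    return [dfn.text for dfn in tree.iter('dfn') if dfn.text]


def main(argv, out_path='list_places.txt'):
    # argv[0] is the script name; the real arguments start at argv[1].
    args = argv[1:]
    if len(args) < 1:
        print('usage: extract.py [file ...]')
        return 1

    filename = args[0]
    # Parse only after the argument check, so a missing argument
    # produces the usage message instead of a parser traceback.
    tree = etree.parse(filename)

    with open(out_path, 'w') as file_places:
        for place in sorted(extract_places(tree)):
            file_places.write(place + '\n')
    return 0
```

To run it as python extract.py filename, the script would import sys and end with sys.exit(main(sys.argv)) under the usual if __name__ == '__main__': guard. Moving the parse into main() also means that importing the module no longer touches any file.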