L M*_*ell 7 python xml csv parsing
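I'm looking for a way to automate the conversion of CSV to XML.
Here is an example of a CSV file containing a list of movies:
And this is the file in XML format: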
<collection shelf="New Arrivals">
  <movie title="Enemy Behind">
    <type>War, Thriller</type>
    <format>DVD</format>
    <year>2003</year>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Talk about a US-Japan war</description>
  </movie>
  <movie title="Transformers">
    <type>Anime, Science Fiction</type>
    <format>DVD</format>
    <year>1989</year>
    <rating>R</rating>
    <stars>8</stars>
    <description>A schientific fiction</description>
  </movie>
  <movie title="Trigun">
    <type>Anime, Action</type>
    <format>DVD</format>
    <episodes>4</episodes>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Vash the Stampede!</description>
  </movie>
  <movie title="Ishtar">
    <type>Comedy</type>
    <format>VHS</format>
    <rating>PG</rating>
    <stars>2</stars>
    <description>Viewable boredom</description>
  </movie>
</collection>
I've tried a few examples and can read both CSV and XML with Python using DOM and SAX, but I have yet to find a simple example of the conversion. So far I have:
import csv
f = open('movies2.csv')
csv_f = csv.reader(f)

def convert_row(row):
    return """<movie title="%s">
<type>%s</type>
<format>%s</format>
<year>%s</year>
<rating>%s</rating>
<stars>%s</stars>
<description>%s</description>
</movie>""" % (
    row.Title, row.Type, row.Format, row.Year, row.Rating, row.Stars, row.Description)

print ('\n'.join(csv_f.apply(convert_row, axis=1)))
But I get the error:
  File "moviesxml.py", line 16, in <module>
    print ('\n'.join(csv_f.apply(convert_row, axis=1)))
AttributeError: '_csv.reader' object has no attribute 'apply'
I'm new to Python, so any help would be greatly appreciated!
I'm using Python 3.5.2.
Thanks!
Lisa
rob*_*oia 11
One possible solution is to first load the CSV into Pandas and then convert it to XML row by row, like so:
import pandas as pd
df = pd.read_csv('untitled.txt', sep='|')
This loads the sample data (with the delimiter and so on assumed) as:
          Title                   Type Format  Year Rating  Stars  \
0  Enemy Behind           War,Thriller    DVD  2003     PG     10
1  Transformers  Anime,Science Fiction    DVD  1989      R      9

             Description
0          Talk about...
1  A Schientific fiction
Then convert each row to XML with a custom function:
def convert_row(row):
    # Fill the XML template with the named columns of one DataFrame row
    return """<movie title="%s">
<type>%s</type>
<format>%s</format>
<year>%s</year>
<rating>%s</rating>
<stars>%s</stars>
<description>%s</description>
</movie>""" % (
    row.Title, row.Type, row.Format, row.Year, row.Rating, row.Stars, row.Description)

# axis=1 applies convert_row to each row; print() is a function in Python 3
print('\n'.join(df.apply(convert_row, axis=1)))
This gives you a single string containing the XML:
<movie title="Enemy Behind">
<type>War,Thriller</type>
<format>DVD</format>
<year>2003</year>
<rating>PG</rating>
<stars>10</stars>
<description>Talk about...</description>
</movie>
<movie title="Transformers">
<type>Anime,Science Fiction</type>
<format>DVD</format>
<year>1989</year>
<rating>R</rating>
<stars>9</stars>
<description>A Schientific fiction</description>
</movie>
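One caveat with plain `%` formatting is that it does not escape characters such as `&`, `<` or `"`, so a title like "Tom & Jerry" would yield invalid XML. A minimal sketch of an escaping variant follows, using `escape` and `quoteattr` from the standard library's `xml.sax.saxutils`. The name `convert_row_escaped`, the `COLUMNS` tuple and the sample row are illustrative assumptions, not part of the code above.

```python
from xml.sax.saxutils import escape, quoteattr

# Column names are assumed to match the header shown in the DataFrame above.
COLUMNS = ("Type", "Format", "Year", "Rating", "Stars", "Description")

def convert_row_escaped(row):
    """Build one <movie> fragment, escaping XML special characters."""
    # quoteattr() escapes the value and adds the surrounding quotes itself.
    lines = ["<movie title=%s>" % quoteattr(str(row["Title"]))]
    for column in COLUMNS:
        tag = column.lower()
        lines.append("<%s>%s</%s>" % (tag, escape(str(row[column])), tag))
    lines.append("</movie>")
    return "\n".join(lines)

# Hypothetical row, used only to show the escaping.
sample_row = {"Title": "Tom & Jerry", "Type": "Anime", "Format": "DVD",
              "Year": 1940, "Rating": "G", "Stars": 9,
              "Description": "Cat < mouse"}
print(convert_row_escaped(sample_row))
```

With the DataFrame above, `df.apply(convert_row_escaped, axis=1)` should work the same way, since a pandas row also supports `row["Title"]` lookups.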
You can then dump that string to a file or do whatever else you need with it.
Inspired by this great answer.
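On dumping the result to a file: the fragments on their own have no root element, so a parser would reject them as a document. The sketch below wraps them in the `<collection>` root from the question before writing. The function name `write_collection`, the output path and the default `shelf` value are assumptions made for illustration.

```python
from xml.sax.saxutils import quoteattr

def write_collection(movie_fragments, path, shelf="New Arrivals"):
    """Wrap <movie> fragments in a <collection> root and write them to path."""
    with open(path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        # quoteattr() escapes the shelf name and adds the quotes.
        out.write("<collection shelf=%s>\n" % quoteattr(shelf))
        for fragment in movie_fragments:
            out.write(fragment + "\n")
        out.write("</collection>\n")

# Possible usage with the DataFrame from above (not executed here):
# write_collection(df.apply(convert_row, axis=1), "movies.xml")
```

Any iterable of strings will do as `movie_fragments`, so the same helper should also accept a list comprehension over plain `csv.reader` rows.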
Edit: Using the loading method you posted (or rather a version of it that actually loads the data into a variable):
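import csv
f = open('movies2.csv')
csv_f = csv.reader(f)
data = []
# Collect every row (each row is a list of strings) into a list
for row in csv_f:
    data.append(row)
f.close()
# Skip the header row; print() is a function in Python 3
print(data[1:])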
We get:
[['Enemy Behind', 'War', 'Thriller', 'DVD', '2003', 'PG', '10', 'Talk about...'], ['Transformers', 'Anime', 'Science Fiction', 'DVD', '1989', 'R', '9', 'A Schientific fiction']]
With a minor modification we can convert this to XML:
def convert_row(row):
    # Rows from csv.reader are plain lists, so fields are accessed by position
    return """<movie title="%s">
<type>%s</type>
<format>%s</format>
<year>%s</year>
<rating>%s</rating>
<stars>%s</stars>
<description>%s</description>
</movie>""" % (row[0], row[1], row[2], row[3], row[4], row[5], row[6])

# data[1:] skips the header row
print('\n'.join([convert_row(row) for row in data[1:]]))
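This produces the output below. Note that it is not identical to the Pandas result: the Type value contains an unquoted comma, so csv.reader splits it into two fields, every value after the title lands one tag too early, and the description is dropped: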
<movie title="Enemy Behind">
<type>War</type>
<format>Thriller</format>
<year>DVD</year>
<rating>2003</rating>
<stars>PG</stars>
<description>10</description>
</movie>
<movie title="Transformers">
<type>Anime</type>
<format>Science Fiction</format>
<year>DVD</year>
<rating>1989</rating>
<stars>R</stars>
<description>9</description>
</movie>
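If the CSV file can be written with the Type field quoted, one way to avoid the shifted columns is to read rows by header name with `csv.DictReader` and build the tree with `xml.etree.ElementTree`, which also handles escaping. This is only a sketch: the real movies2.csv was not shown, so the header names, the quoting and the helper name `csv_to_xml` are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; header names and quoting are assumed.
SAMPLE_CSV = '''Title,Type,Format,Year,Rating,Stars,Description
Enemy Behind,"War, Thriller",DVD,2003,PG,10,Talk about a US-Japan war
Transformers,"Anime, Science Fiction",DVD,1989,R,8,A schientific fiction
'''

COLUMNS = ("Type", "Format", "Year", "Rating", "Stars", "Description")

def csv_to_xml(csv_file, shelf="New Arrivals"):
    """Build a <collection> element from an open CSV file with a header row."""
    root = ET.Element("collection", shelf=shelf)
    for row in csv.DictReader(csv_file):
        movie = ET.SubElement(root, "movie", title=row["Title"])
        for column in COLUMNS:
            # Tag names are the lower-cased column names, as in the target XML.
            ET.SubElement(movie, column.lower()).text = row[column]
    return root

root = csv_to_xml(io.StringIO(SAMPLE_CSV))
print(ET.tostring(root, encoding="unicode"))
```

For a file on disk, passing `open('movies2.csv', newline='')` instead of the `io.StringIO` object should work, and `ET.ElementTree(root).write(...)` can save the result.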