Importing a CSV file into a list in Python

Question

I have a CSV file with about 2000 records.

Each record has a string and a category.

This is the first line, Line1
This is the second line, Line2
This is the third line, Line3

I need to read this file into a list that looks like this:

List = [('This is the first line', 'Line1'),
        ('This is the second line', 'Line2'),
        ('This is the third line', 'Line3')]

How can I import this CSV into the list I need using Python?

Answer 1

Mac*_*Gol 279

Use the csv module (Python 2.x):

import csv
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print your_list
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]

If you need tuples:

import csv
with open('test.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = map(tuple, reader)

print your_list
# [('This is the first line', ' Line1'),
#  ('This is the second line', ' Line2'),
#  ('This is the third line', ' Line3')]

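The tuple output above keeps the space that follows each comma in the sample data (' Line1'). If that space is unwanted, the reader's skipinitialspace option should drop it. Here is a minimal Python 3 sketch; the `sample` string is only an inline stand-in for the real file, so the example runs without one:

```python
import csv
import io

# Inline stand-in for the CSV file from the question (an assumption for this sketch).
sample = (
    "This is the first line, Line1\n"
    "This is the second line, Line2\n"
    "This is the third line, Line3\n"
)

with io.StringIO(sample) as f:
    # skipinitialspace=True ignores whitespace directly after each delimiter.
    reader = csv.reader(f, skipinitialspace=True)
    your_list = [tuple(row) for row in reader]

print(your_list)
```

With a real file, the same reader call would go inside `with open('file.csv', newline='') as f:`.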

Python 3.x version (from @seokhoonlee's answer below):

import csv

with open('file.csv', 'r') as f:
  reader = csv.reader(f)
  your_list = list(reader)

print(your_list)
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]

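This does not work in Python 3.x: "csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)" See the answer below that works in Python 3.x. (7 upvotes)
@DrunkenMaster, `b` causes the file to be opened in binary mode rather than text mode. On some systems, text mode means that `\n` is converted to the platform-specific newline when reading or writing. [See the docs](https://docs.python.org/2/library/functions.html#open). (5 upvotes)
Why use 'rb' instead of 'r'? (4 upvotes)
To save a few seconds of debugging time, you should add a note to the first solution, such as "Python 2.x version". (2 upvotes)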
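To make the binary-versus-text point from these comments concrete, here is a small Python 3 sketch. It uses in-memory streams instead of a real file (an assumption, so that it runs anywhere): `io.BytesIO` plays the role of a file opened with 'rb', and `io.TextIOWrapper` plays the role of one opened in text mode.

```python
import csv
import io

# Stand-in for the raw bytes of the CSV file (assumed sample data).
raw = b"This is the first line, Line1\nThis is the second line, Line2\n"

# Python 3: a binary stream yields bytes, which csv.reader rejects.
try:
    list(csv.reader(io.BytesIO(raw)))
    binary_error = None
except csv.Error as exc:
    binary_error = str(exc)

# A text layer over the same bytes behaves like open(..., 'r', newline='').
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
rows = list(csv.reader(text_stream))

print(binary_error)
print(rows)
```

The utf-8 encoding is a guess at what the file contains; a real file may need a different one.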

Answer 2

seo*_*lee 50

Update for Python 3:

import csv

with open('file.csv', 'r') as f:
  reader = csv.reader(f)
  your_list = list(reader)

print(your_list)
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]


Answer 3

Mar*_*oma 38

Pandas is very good at handling data. Here is one example of how to use it:

import pandas as pd

# Read the CSV into a pandas data frame (df)
#   With a df you can do many things
#   most important: visualize data with Seaborn
df = pd.read_csv('filename.csv', delimiter=',')

# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]

# or export it as a list of dicts (one dict per row)
dicts = df.to_dict(orient='records')

One big advantage is that pandas handles header rows automatically.

If you haven't heard of Seaborn, I recommend having a look at it.

See also: How do I read and write CSV files with Python?

Pandas #2

import pandas as pd

# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
dicts = df.to_dict('records')

The content of df is:

     country   population population_time    EUR
0    Germany   82521653.0      2016-12-01   True
1     France   66991000.0      2017-01-01   True
2  Indonesia  255461700.0      2017-01-01  False
3    Ireland    4761865.0             NaT   True
4      Spain   46549045.0      2017-06-01   True
5    Vatican          NaN             NaT   True

The content of dicts is:

[{'country': 'Germany', 'population': 82521653.0, 'population_time': Timestamp('2016-12-01 00:00:00'), 'EUR': True},
 {'country': 'France', 'population': 66991000.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': True},
 {'country': 'Indonesia', 'population': 255461700.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': False},
 {'country': 'Ireland', 'population': 4761865.0, 'population_time': NaT, 'EUR': True},
 {'country': 'Spain', 'population': 46549045.0, 'population_time': Timestamp('2017-06-01 00:00:00'), 'EUR': True},
 {'country': 'Vatican', 'population': nan, 'population_time': NaT, 'EUR': True}]

Pandas #3

import pandas as pd

# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
tuples = [[row[col] for col in df.columns] for row in df.to_dict('records')]

The content of tuples is:

[['Germany', 82521653.0, Timestamp('2016-12-01 00:00:00'), True],
 ['France', 66991000.0, Timestamp('2017-01-01 00:00:00'), True],
 ['Indonesia', 255461700.0, Timestamp('2017-01-01 00:00:00'), False],
 ['Ireland', 4761865.0, NaT, True],
 ['Spain', 46549045.0, Timestamp('2017-06-01 00:00:00'), True],
 ['Vatican', nan, NaT, True]]


Answer 4

Alg*_*bra 6

Update for Python 3:

import csv
from pprint import pprint

with open('text.csv', newline='') as file:
    reader = csv.reader(file)
    rows = list(map(tuple, reader))

pprint(rows)
# [('This is the first line', ' Line1'),
#  ('This is the second line', ' Line2'),
#  ('This is the third line', ' Line3')]

If csvfile is a file object, it should be opened with newline=''.
Source: csv module
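As a rough illustration of why newline='' matters, the sketch below writes a one-record file whose quoted field contains a Windows-style line break, then reads it back both ways. The record and the temporary file are made up for the demonstration.

```python
import csv
import os
import tempfile

# One hypothetical record: the first field is quoted and spans two lines.
raw = b'"first\r\nline",Line1\r\n'

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # newline='' hands line endings to the csv module untouched.
    with open(path, newline="") as f:
        with_newline_arg = list(csv.reader(f))
    # Default text mode translates '\r\n' to '\n' before csv sees it.
    with open(path) as f:
        without_newline_arg = list(csv.reader(f))
finally:
    os.remove(path)

print(with_newline_arg)
print(without_newline_arg)
```

Both calls return a single row, but only the first keeps the embedded line break exactly as it was stored in the file.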

Answer 5

Miq*_*uel 5

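If you are sure there are no commas in your input other than the one separating the category, you can read the file line by line, split each line on `,`, and push the result onto List.

That said, it looks like you are dealing with a CSV file, so you might consider using the module made for it.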
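To show what goes wrong when the "no other commas" assumption fails, here is a short comparison on a made-up record whose text field contains a comma and is therefore quoted:

```python
import csv

# Hypothetical record: the quoted text itself contains a comma.
line = '"Hello, world",Greeting'

# Plain splitting cuts the quoted text in two and leaves the quote characters in.
naive = line.split(",")

# csv.reader accepts any iterable of lines and honours the quoting.
parsed = next(csv.reader([line]))

print(naive)
print(parsed)
```

The split gives three pieces, while the csv module returns the two fields that were intended.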

Answer 6

小智 5

result = []
for line in text.splitlines():
    result.append(tuple(line.split(",")))

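I know Barranka's comment is over a year old, but for anyone who stumbles upon this and can't figure it out: _for line in text.splitlines():_ puts each individual line into the temporary variable "line". _line.split(",")_ creates a list of strings split on the comma. _tuple(~)_ puts that list into a tuple, and _append(~)_ adds it to result. After the loop, _result_ is a list of tuples, one tuple per line, with each tuple element being one element of the CSV file. (4 upvotes)
Could you add some explanation to this post? Code only is (sometimes) fine, but code plus explanation is (most of the time) better. (2 upvotes)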
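For completeness, here is a runnable version of the loop described in the comment above. The `text` variable is assumed to hold the whole file contents as one string. The extra `strip()` is an addition, not part of the original answer, so that the space after each comma does not end up in the category.

```python
# Assumed to be the file contents, e.g. from open('file.csv').read().
text = (
    "This is the first line, Line1\n"
    "This is the second line, Line2\n"
    "This is the third line, Line3\n"
)

result = []
for line in text.splitlines():
    parts = line.split(",")  # list of strings, split at each comma
    # strip() removes the space that follows the comma in the sample data
    result.append(tuple(part.strip() for part in parts))

print(result)
```

This only holds up while the text never contains a comma of its own.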

Answer 7

Kid*_*ddo 5

You can use the list() function to convert the csv reader object to a list:

import csv

with open('input.csv', newline='') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    rows = list(reader)
    print(rows)


Archived:	11 years, 3 months ago
Views:	421171
Last activity:	6 years, 1 month ago