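This is a follow-up to an earlier question, but the deeper I get into Python, the more confused I am about how it handles CSV files.
I have a CSV file, and it has to stay that way (for example, I cannot convert it to a text file). It is equivalent to a 5-row by 11-column array, matrix, or vector.
I have been trying to read the CSV with various methods I found here and elsewhere (e.g. python.org) so that the relationship between columns and rows is preserved. The first row and the first column are non-numeric; everything else is a float, with a mix of positive and negative values.
What I want is to import the CSV and build it up in Python so that referencing a column header returns the associated values stored in the rows. For example: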
>>> workers, constant, age
>>> workers
w0
w1
w2
w3
constant
7.334
5.235
3.225
0
age
-1.406
-4.936
-1.478
0
And so on...
I am looking for techniques for handling this kind of data structure. I am new to Python.
Kat*_*iel 120
For Python 2:
import csv
with open("<path-to-file>", "rb") as theFile:
    reader = csv.DictReader(theFile)
    for line in reader:
        # line is {'workers': 'w0', 'constant': '7.334', 'age': '-1.406', ...}
        # (DictReader returns every value as a string)
        print(line['workers'])  # prints w0 for the first row
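Python has a powerful built-in CSV handler. In fact, most things are already built into the standard library.
For Python 3:
Drop the "rb" argument and use "r", or pass no mode at all (read mode is the default).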
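import csv
# newline="" is what the csv module documentation recommends for files passed to a reader
with open("<path-to-file>", "r", newline="") as theFile:
    reader = csv.DictReader(theFile)
    for line in reader:
        # line is {'workers': 'w0', 'constant': '7.334', 'age': '-1.406', ...}
        # (values are strings), e.g. print(line['workers']) prints w0 for the first row
        print(line)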
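Since csv.DictReader hands back every field as a string, the numeric columns still need converting before they can be used as floats. The following is a minimal sketch of one way to do that, not part of the original answer. The helper name read_rows, its label_field parameter, and the in-memory SAMPLE (used in place of a real file) are assumptions made for illustration.

```python
import csv
import io

# Hypothetical in-memory stand-in for the CSV file; a real file object works the same way.
SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0
"""

def read_rows(stream, label_field="workers"):
    """Return a list of dicts, casting every field except label_field to float."""
    rows = []
    for line in csv.DictReader(stream):
        rows.append({
            key: (value if key == label_field else float(value))
            for key, value in line.items()
        })
    return rows

rows = read_rows(io.StringIO(SAMPLE))
print(rows[0]["workers"], rows[0]["constant"])
```

With a real file, the same function could be called as read_rows(theFile) inside the with block. It assumes every non-label field parses as a float and will raise ValueError otherwise.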
Joh*_*hin 89
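Python's csv module handles data row by row, which is the usual way of looking at this kind of data. You seem to want a column-wise approach. Here is one way to do it.
Assume your file is named myclone.csv and contains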
workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0
This code should give you an idea or two:
>>> import csv
>>> f = open('myclone.csv', newline='')  # on Python 2, use open('myclone.csv', 'rb')
>>> reader = csv.reader(f)
>>> headers = next(reader, None)
>>> headers
['workers', 'constant', 'age']
>>> column = {}
>>> for h in headers:
...     column[h] = []
...
>>> column
{'workers': [], 'constant': [], 'age': []}
>>> for row in reader:
...     for h, v in zip(headers, row):
...         column[h].append(v)
...
>>> column
{'workers': ['w0', 'w1', 'w2', 'w3'], 'constant': ['7.334', '5.235', '3.2225', '0'], 'age': ['-1.406', '-4.936', '-1.478', '0']}
>>> column['workers']
['w0', 'w1', 'w2', 'w3']
>>> column['constant']
['7.334', '5.235', '3.2225', '0']
>>> column['age']
['-1.406', '-4.936', '-1.478', '0']
>>>
To convert the numeric values to floats, add this
converters = [str.strip] + [float] * (len(headers) - 1)
up front, and do this
for h, v, conv in zip(headers, row, converters):
    column[h].append(conv(v))
for each row, in place of the similar two lines above.
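Put together, the column-wise steps above can be sketched as a single function. This is an illustrative assembly of the pieces in this answer, not the author's own final code. The function name read_columns and the in-memory SAMPLE are assumptions, and the sketch assumes the first column holds labels while every other column is numeric.

```python
import csv
import io

# Hypothetical in-memory copy of myclone.csv so the sketch runs without a file on disk.
SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0
"""

def read_columns(stream):
    """Return {header: [values...]}, keeping column 0 as text and the rest as floats."""
    reader = csv.reader(stream)
    headers = next(reader)
    # One converter per column: strip the label column, parse the others as float.
    converters = [str.strip] + [float] * (len(headers) - 1)
    columns = {h: [] for h in headers}
    for row in reader:
        for h, v, conv in zip(headers, row, converters):
            columns[h].append(conv(v))
    return columns

columns = read_columns(io.StringIO(SAMPLE))
print(columns["constant"])
```

For a real file, something like `with open('myclone.csv', newline='') as f: columns = read_columns(f)` should behave the same way on Python 3.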
Ank*_*kur 11
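You can use the pandas library and refer to rows and columns like this:
import pandas as pd
df = pd.read_csv("path_to_file")  # named df so the built-in input() is not shadowed
# access the i-th row (i is an integer position):
df.iloc[i]
# access the column named X
df.X
# access the value in the i-th row of the column named X
df.iloc[i].X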
I recently had to write this for some fairly large data files, and I found that a list comprehension worked well:
import csv
with open("file.csv", 'r') as f:
    reader = csv.reader(f)
    headers = next(reader)
    data = [{h: x for (h, x) in zip(headers, row)} for row in reader]
    # data is now a list of rows, each row being a dictionary of the form
    # {header: value}. If a row ends early (e.g. there are 12 columns but the row
    # has only 11 values), its dictionary will have no entry for the missing header.
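Because zip stops at the shorter of its inputs, a short row silently loses its trailing keys. If a missing value should show up explicitly instead, one option is itertools.zip_longest. The sketch below is a variation on the comprehension above, not the author's code. The name read_padded, the fill parameter, and the deliberately truncated sample are hypothetical.

```python
import csv
import io
from itertools import zip_longest

# Hypothetical sample in which the second data row is missing its last field.
SHORT_ROW_SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235
"""

def read_padded(stream, fill=None):
    """Return one dict per row; headers with no value in a row map to fill."""
    reader = csv.reader(stream)
    headers = next(reader)
    return [
        # Slice the row so extra trailing fields cannot create a spurious key.
        dict(zip_longest(headers, row[:len(headers)], fillvalue=fill))
        for row in reader
    ]

data = read_padded(io.StringIO(SHORT_ROW_SAMPLE))
print(data[1])
```

csv.DictReader offers similar padding through its restval argument, so this is mainly useful when the plain csv.reader is preferred for other reasons.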