Not understanding Python's csv.reader object

Question

Aus*_*n A 4 python memory csv pointers object

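I've come across a behavior in Python's built-in csv module that I had never noticed before. Typically, when I read in a csv, I follow the docs almost verbatim: open the file using "with", then iterate over the reader object with a "for" loop. However, I recently tried iterating over the csv.reader object twice in a row, only to find that the second "for" loop did nothing.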

import csv

with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')

    for line in readit:
        print line

    for line in readit:
        print 'foo'


Console output:

Austins-iMac:Desktop austin$ python -i amy.py 
['Amy', 'James', 'Nathan', 'Sara', 'Kayley', 'Alexis']
['James', 'Nathan', 'Tristan', 'Miles', 'Amy', 'Dave']
['Nathan', 'Amy', 'James', 'Tristan', 'Will', 'Zoey']
['Kayley', 'Amy', 'Alexis', 'Mikey', 'Sara', 'Baxter']
>>>
>>> readit
<_csv.reader object at 0x1023fa3d0>
>>>


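So the second "for" loop basically does nothing. One thought I had is that the csv.reader object is released from memory after being read once. That isn't the case, though, since it still retains its memory address. I found a post that mentions a similar problem. The reason given there was that once the object has been read, the pointer stays at the end of the memory address, ready to write data to the object. Is this correct? Could someone go into more detail about what is going on here? Is there a way to push the pointer back to the beginning of the memory address so the object can be read again? I know that would be bad coding practice, but I'm mainly just curious and want to learn more about what goes on under Python's hood.

Thanks!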

Answer 1

kal*_*rtt 5

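I'll try to answer your other questions about what the reader is doing and why reset() or seek(0) might help. In its most basic form, the csv reader might look something like this: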

def csv_reader(it):
    for line in it:
        yield line.strip().split(',')


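That is, it takes any iterator that produces strings and gives you a generator. All it does is take an item from your iterator, process it, and return the result. When `it` is consumed, csv_reader exits. The reader has no idea where the iterator came from or how to properly make a fresh one, so it doesn't even try to reset itself. That is left to the programmer.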
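
To make the exhaustion concrete, here is a small sketch that feeds the simplified csv_reader above from an in-memory iterator instead of a file. The sample rows are made up for illustration. The second pass comes back empty because the underlying iterator is spent, not because the generator object was freed.

```python
def csv_reader(it):
    # Simplified stand-in for csv.reader, as sketched above
    for line in it:
        yield line.strip().split(',')

lines = iter(['a,b,c\n', 'd,e,f\n'])    # made-up sample data
reader = csv_reader(lines)

first_pass = [row for row in reader]
second_pass = [row for row in reader]   # nothing left to consume

print(first_pass)    # [['a', 'b', 'c'], ['d', 'e', 'f']]
print(second_pass)   # []
```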

We can either modify the iterator without the reader knowing, or just create a new reader. Here are some examples to demonstrate my point.

import csv

data = open('data.csv', 'r')
reader = csv.reader(data)

print(next(reader))               # Parse the first line
[next(data) for _ in range(5)]    # Skip the next 5 lines on the underlying iterator
print(next(reader))               # This will be the 7th line in data
print(reader.line_num)            # reader thinks this is the 2nd line
data.seek(0)                      # Go back to the beginning of the file
print(next(reader))               # gives first line again
data.close()

data = ['1,2,3', '4,5,6', '7,8,9']
reader = csv.reader(data)         # works fine on lists of strings too
print(next(reader))               # ['1', '2', '3']
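
Applied to the two-loop situation from the question, the same idea might look like the sketch below. It uses io.StringIO as an in-memory stand-in for the opened file, and the contents are invented for illustration. Rewinding the underlying stream with seek(0) is what lets the existing reader produce rows again; the reader itself is never reset.

```python
import csv
import io

# In-memory stand-in for an open file; the contents are made up
csvfile = io.StringIO("Amy,James\nNathan,Sara\n")
readit = csv.reader(csvfile)

first_pass = list(readit)    # consumes the stream to the end
csvfile.seek(0)              # rewind the underlying stream, not the reader
second_pass = list(readit)   # same reader object, rows come from the start again

print(first_pass)
print(second_pass)
```

Note that reader.line_num keeps counting across the rewind, which is one reason a fresh reader tends to be easier to reason about.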


In general, if you need a second pass, it's best to close and reopen the file and use a new csv reader. It's clean and ensures good bookkeeping.
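
A minimal sketch of that approach follows. The helper name read_rows and the temporary file are hypothetical, used only to keep the example self-contained. Each call opens the file fresh and builds a new reader, so no state carries over between passes.

```python
import csv
import os
import tempfile

# Write a small throwaway file so the sketch is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('Amy,James\nNathan,Sara\n')
    path = tmp.name

def read_rows(path):
    # Hypothetical helper: reopen the file and create a new reader on every call
    with open(path, newline='') as csvfile:
        return list(csv.reader(csvfile))

first_pass = read_rows(path)
second_pass = read_rows(path)   # independent second pass over the same file
os.remove(path)

print(first_pass)
print(second_pass)
```

If the file is small, another option is to keep the result of the first list(...) call and loop over that list as many times as needed.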
