Writing comments to a CSV file with pandas

Question


Mat*_*ois 19 python export-to-csv pandas

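I'd like to write some comments in a CSV file created with pandas. I haven't found any option for this in DataFrame.to_csv (even though read_csv can skip comments), nor in the standard csv module. I can open the file, write the comments (lines starting with #), and then pass the file to to_csv. Is there a better option?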
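For the reading side mentioned in the question, here is a minimal sketch of how `read_csv(comment='#')` skips leading comment lines. It parses an in-memory string (via `io.StringIO`) instead of a real file; the comment text is just sample content. Note that `comment` tells pandas to ignore the rest of a line after the character, so a `#` inside an unquoted data field would also be treated as the start of a comment.

```python
import io

import pandas as pd

# CSV text shaped like the output of "write comments, then to_csv"
csv_text = (
    "# My awesome comment\n"
    "# Here is another one\n"
    ",a,b\n"
    "0,1,1\n"
    "1,2,2\n"
    "2,3,3\n"
)

# Lines that start with '#' are skipped entirely; index_col=0 restores
# the index column that to_csv wrote as the first (unnamed) column
df = pd.read_csv(io.StringIO(csv_text), comment="#", index_col=0)
print(df)
```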

Answer 1

Vor*_*Vor 29

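df.to_csv accepts a file object. So you can open the file in 'a' (append) mode, write your comments, and pass the file object to the DataFrame's to_csv method.

For example: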

In [36]: df = pd.DataFrame({'a':[1,2,3], 'b':[1,2,3]})

In [37]: f = open('foo', 'a')

In [38]: f.write('# My awesome comment\n')

In [39]: f.write('# Here is another one\n')

In [40]: df.to_csv(f)

In [41]: f.close()

In [42]: more foo
# My awesome comment
# Here is another one
,a,b
0,1,1
1,2,2
2,3,3
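The session above can be wrapped in a small reusable function. This is a sketch; `write_csv_with_comments` is a hypothetical helper name, not a pandas API. Two assumptions differ from the session: the file is opened in 'w' mode, so running it twice overwrites the file instead of appending a second copy, and it is opened with `newline=''`, which the pandas `to_csv` documentation recommends for text file objects (without it, Windows can end up with blank lines between rows).

```python
import os
import tempfile

import pandas as pd


def write_csv_with_comments(df, path, comments, comment_char="#"):
    """Hypothetical helper: write comment lines, then the DataFrame, to `path`."""
    # 'w' truncates any previous content; newline="" disables Python's newline
    # translation so pandas' own line terminator (os.linesep) is written as-is
    with open(path, "w", newline="") as f:
        for line in comments:
            f.write(f"{comment_char} {line}{os.linesep}")
        df.to_csv(f)


df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})
path = os.path.join(tempfile.mkdtemp(), "foo.csv")
write_csv_with_comments(df, path, ["My awesome comment", "Here is another one"])

with open(path) as f:
    print(f.read())
```

Re-reading the file with `pd.read_csv(path, comment="#", index_col=0)` should give back the original frame.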


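Perhaps because this answer is from 2015 and pandas behaves slightly differently in 2020, the current answer writes a blank line between every row. The code below produces no blank lines. You can also take advantage of pandas' own mode='a' for appending:

f = open(path, 'a')
f.write('comment\n')
f.close()
df.to_csv(path, mode='a')

(5 upvotes)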
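A minimal sketch of the comment's path-based approach, written with `with` blocks. Two assumptions of this sketch: the comment is written in 'w' mode rather than 'a', so re-running it does not stack up old content, and the file is opened with `newline=''`. When `to_csv` receives a path instead of a handle, pandas opens the file itself and takes care of newline handling, which is plausibly why this variant avoids the blank lines.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})
path = os.path.join(tempfile.mkdtemp(), "commented.csv")

# Step 1: write the comment with plain file IO; 'w' discards any old content
with open(path, "w", newline="") as f:
    f.write("# comment" + os.linesep)

# Step 2: let pandas open the same path itself and append the data
df.to_csv(path, mode="a")

with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```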

Answer 2

joe*_*lom 6

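An alternative to @Vor's solution is to first write the comment to the file, and then use mode='a' with to_csv() to add the DataFrame's contents to the same file. According to my benchmarks (below), this takes about as long as opening the file in append mode, adding the comment, and then passing the file handle to pandas (as in @Vor's answer). The similar timing makes sense considering that this is what pandas does internally, at least in the version benchmarked here: DataFrame.to_csv() calls CSVFormatter.save(), which uses _get_handles() to open the file via open().

Also, doing file IO inside a with statement is convenient, because it ensures the file is closed once you are done with it and leave the with block. See the examples in the benchmarks below.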
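Since both approaches should end up writing the same bytes, a quick way to check that claim is to write one small frame both ways and compare the files byte by byte with `filecmp.cmp(..., shallow=False)`. This is a sketch with made-up sample data; both files are opened with `newline=''` and in 'w' mode for the comment, which are assumptions of this sketch rather than part of the benchmarks.

```python
import filecmp
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.5, 5.5, 6.5]})
tmp = tempfile.mkdtemp()
path1 = os.path.join(tmp, "handle.csv")
path2 = os.path.join(tmp, "reopen.csv")

# Approach 1 (@Vor): one handle for both the comment and the data
with open(path1, "w", newline="") as f:
    f.write("# This is my comment" + os.linesep)
    df.to_csv(f)

# Approach 2: write the comment, close the file, let pandas reopen it to append
with open(path2, "w", newline="") as f:
    f.write("# This is my comment" + os.linesep)
df.to_csv(path2, mode="a")

# shallow=False compares the actual file contents, not just os.stat() metadata
print(filecmp.cmp(path1, path2, shallow=False))
```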

Read in the test data

import pandas as pd
# Read in the iris data frame from the seaborn GitHub location
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
# Create a bigger data frame by repeatedly doubling it
# (DataFrame.append was removed in pandas 2.0, so use pd.concat instead)
while iris.shape[0] < 100000:
    iris = pd.concat([iris, iris])
# `iris.shape` is now (153600, 5)
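If the GitHub URL is unreachable, a synthetic frame with the same shape and column names can stand in for the benchmark input. This is a hypothetical substitute: the values are uniform random numbers, not the real iris measurements, so absolute timings will not match the ones reported here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: random values, NOT the real iris measurements
rng = np.random.default_rng(seed=0)
n_rows = 153_600  # same row count the doubling loop produces

iris = pd.DataFrame(
    {
        "sepal_length": rng.uniform(4.0, 8.0, n_rows).round(1),
        "sepal_width": rng.uniform(2.0, 4.5, n_rows).round(1),
        "petal_length": rng.uniform(1.0, 7.0, n_rows).round(1),
        "petal_width": rng.uniform(0.1, 2.5, n_rows).round(1),
        "species": rng.choice(["setosa", "versicolor", "virginica"], n_rows),
    }
)
print(iris.shape)
```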


1. Append through the same file handle

%%timeit -n 5 -r 5

# Open a file in append mode to add the comment
# Then pass the file handle to pandas
with open('test1.csv', 'a') as f:
    f.write('# This is my comment\n')
    iris.to_csv(f)


972 ms ± 31.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)

2. Reopen the file with `to_csv(mode='a')`

%%timeit -n 5 -r 5

# Open a file in write mode to add the comment
# Then close the file and reopen it with pandas in append mode
with open('test2.csv', 'w') as f:
    f.write('# This is my comment\n')
iris.to_csv('test2.csv', mode='a')


949 ms ± 19.3 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
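The two `%%timeit` cells only run inside IPython/Jupyter. A rough plain-Python equivalent uses `timeit.repeat` with 5 runs of 5 loops each. This is a sketch under some assumptions: a small random frame replaces `iris` so it finishes quickly, and both functions open their file in 'w' mode. In the first cell above, `test1.csv` is opened with 'a' on every loop, so it keeps growing by another full copy of the data; using 'w' keeps the two cases comparable.

```python
import os
import tempfile
import timeit

import numpy as np
import pandas as pd

# Small stand-in frame so the sketch runs quickly; swap in `iris` to compare
df = pd.DataFrame(np.random.default_rng(0).random((2_000, 4)), columns=list("abcd"))
tmp = tempfile.mkdtemp()
comment = "# This is my comment" + os.linesep


def same_handle():
    # Approach 1: one handle for the comment and the data
    with open(os.path.join(tmp, "test1.csv"), "w", newline="") as f:
        f.write(comment)
        df.to_csv(f)


def reopen_append():
    # Approach 2: write the comment, then let pandas reopen the path to append
    path = os.path.join(tmp, "test2.csv")
    with open(path, "w", newline="") as f:
        f.write(comment)
    df.to_csv(path, mode="a")


results = {}
for func in (same_handle, reopen_append):
    # repeat=5 runs of number=5 loops, like %%timeit -n 5 -r 5; divide for per-loop time
    times = [t / 5 for t in timeit.repeat(func, number=5, repeat=5)]
    results[func.__name__] = (np.mean(times), np.std(times))
    print(f"{func.__name__}: {np.mean(times) * 1e3:.1f} ms ± {np.std(times) * 1e3:.1f} ms per loop")
```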

Archived: 10 years, 8 months ago
Views: 6388
Last activity: 8 years, 1 month ago
