Posts by Act*_*ary

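Splitting a CSV file by a specific column using Python

I'm a Python beginner and have written a few basic scripts. My latest challenge is to split a very large CSV file (10 GB+) into multiple smaller files based on the value of a particular variable in each row.

For example, the file might look like this: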

Category,Title,Sales
"Books","Harry Potter",1441556
"Books","Lord of the Rings",14251154
"Series", "Breaking Bad",6246234
"Books","The Alchemist",12562166
"Movie","Inception",1573437


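I want to split the file into separate files: Books.csv, Series.csv, Movie.csv

In reality there will be hundreds of categories, and they will not be sorted. In this case they are in the first column, but in the future they may not be.

I found a few solutions online, but none in Python. There is a very simple AWK command that does this in one line, but I don't have access to AWK at work.

I wrote the following code, which works, but I think it is probably very inefficient. Can anyone suggest how to speed it up?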

import csv

# Create an empty set to store the values that have already been used
filelist = set()

# Open the large CSV file in read mode
with open('//directory/largefile', 'r') as csvfile:

    # Read the first row of the large file and store the whole row as a string (headerstring)
    read_rows = csv.reader(csvfile)
    headerrow = next(read_rows)
    headerstring = ','.join(headerrow)

    for …

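The rest of the loop is cut off above, so the following is not the poster's code. It is a minimal sketch of one common way to do this kind of split in a single pass: keep one open writer per category instead of reopening files for every row, and look the column up by name so it does not have to be the first one. The function name `split_csv_by_column`, the `max_open` limit, and the assumption that every category value is a valid file name are all hypothetical.

```python
import csv
import os


def split_csv_by_column(source_path, column, out_dir, max_open=200):
    """Write each row of source_path to out_dir/<value of column>.csv.

    Assumptions (not from the original post): category values are safe
    to use as file names, and the header row contains `column`.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # category value -> (file object, csv.writer)
    seen = set()  # categories whose output file already has a header

    with open(source_path, newline="") as src:
        # skipinitialspace tolerates rows such as: "Series", "Breaking Bad",...
        reader = csv.reader(src, skipinitialspace=True)
        header = next(reader)
        key_index = header.index(column)

        try:
            for row in reader:
                key = row[key_index]
                if key not in handles:
                    if len(handles) >= max_open:
                        # Stay under the OS limit on open files by closing
                        # the handle that was opened earliest.
                        oldest = next(iter(handles))
                        handles.pop(oldest)[0].close()
                    # Reopen in append mode if this category was written before.
                    mode = "a" if key in seen else "w"
                    path = os.path.join(out_dir, key + ".csv")
                    out = open(path, mode, newline="")
                    writer = csv.writer(out)
                    if key not in seen:
                        writer.writerow(header)
                        seen.add(key)
                    handles[key] = (out, writer)
                handles[key][1].writerow(row)
        finally:
            for out, _ in handles.values():
                out.close()
    return seen
```

A call such as `split_csv_by_column('largefile.csv', 'Category', 'out')` would read the input once, row by row, so memory use stays flat regardless of file size. Note that `csv.writer` with default settings only quotes fields that need it, so the output quoting may differ from the input.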

python csv

Act*_*ary

2017 10-20

4
votes

2
answers

2076
views

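Writing a CSV with quotes around strings (Python)

I wrote the following code to take a large CSV file and split it into multiple CSV files based on a particular word in a column. The original CSV file has some fields that are strings, and they have quotes around them.

For example: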

Field1,Field2,Field3,Field4
1,2,"red",3
1,4,"red",4
3,4,"blue",4


And so on.

My code splits the file into separate CSVs based on Field4.

My output looks like this:

3.csv
Field1,Field2,Field3,Field4
1,2,red,3

4.csv
Field1,Field2,Field3,Field4
1,4,red,4
3,4,blue,4


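I want my output to keep the quotes around the strings in Field3. The files are fed into a piece of software that only works when strings have quotes around them, which is annoying.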
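As background on why the quotes disappear, here is a small illustration using only the standard `csv` module and an in-memory copy of one sample row. `csv.reader` strips the quotes while parsing, and `csv.writer` defaults to `csv.QUOTE_MINIMAL`, which only quotes fields that contain special characters. The variable names are illustrative.

```python
import csv
import io

source = 'Field1,Field2,Field3,Field4\n1,2,"red",3\n'

# The reader removes the quotes: every field comes back as a plain string.
rows = list(csv.reader(io.StringIO(source)))

# Default quoting (QUOTE_MINIMAL): "red" needs no quoting, so none is written.
minimal = io.StringIO()
csv.writer(minimal, lineterminator="\n").writerows(rows)

# QUOTE_ALL quotes every field, including the numbers and the header.
quote_all = io.StringIO()
csv.writer(quote_all, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)

print(minimal.getvalue())    # 1,2,red,3 on the data line
print(quote_all.getvalue())  # "1","2","red","3" on the data line
```

Neither setting reproduces the original mix of quoted strings and bare numbers, because the reader has already discarded the information about which fields were quoted.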

My current code looks like this:

import csv

# Create an empty set to store the values that have already been used
newfilelist = set()

# Open the large CSV file in read mode
with open('File.csv', 'r') as csvfile:
    
    #Read the first row of the large file and store the whole row as a string …

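The code above is truncated, so the following is not the poster's script. It is a sketch of one possible workaround: parse each line with `csv` only to find the split value, then copy the raw input line to the output unchanged, so whatever quoting the source had is preserved. The function name `split_preserving_quotes` is hypothetical, and the sketch assumes that no field contains an embedded newline, that split values are valid file names, and that the number of distinct values is small enough to keep every output file open.

```python
import csv
import os


def split_preserving_quotes(source_path, column, out_dir):
    """Copy raw lines of source_path into out_dir/<value of column>.csv.

    Assumes one record per physical line (no newlines inside quoted fields).
    """
    os.makedirs(out_dir, exist_ok=True)
    outputs = {}  # split value -> open output file

    # newline="" keeps the original line endings intact.
    with open(source_path, newline="") as src:
        header_line = src.readline()
        header = next(csv.reader([header_line]))
        key_index = header.index(column)

        try:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                # Parse only to locate the key; the parsed row is discarded.
                key = next(csv.reader([line]))[key_index]
                out = outputs.get(key)
                if out is None:
                    path = os.path.join(out_dir, key + ".csv")
                    out = open(path, "w", newline="")
                    out.write(header_line)
                    outputs[key] = out
                # Write the untouched input line, quotes included.
                out.write(line)
        finally:
            for out in outputs.values():
                out.close()
    return sorted(outputs)
```

For the sample data, `split_preserving_quotes('File.csv', 'Field4', 'out')` would produce `3.csv` and `4.csv` whose data lines are byte-for-byte copies of the input lines.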

python csv quotes parsing text

Act*_*ary

2020 09-21

4
votes

1
answers

3885
views