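I'm trying to merge all the CSV files in a folder into one large CSV file. I also need to add a new column to the merged CSV that shows which original file each row came from. Here is my code so far: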
import csv
import glob
read_files = glob.glob("*.csv")
source = []
with open("combined.files.csv", "wb") as outfile:
    for f in read_files:
        source.append(f)
        with open(f, "rb") as infile:
            outfile.write(infile.read())
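I know I somehow need to repeat each f once for every row in that CSV and append it as a new column in the .write call, but I don't know how to do that. Thanks, everyone!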
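If you add the filename as the last column, you don't need to parse the CSV at all. Just read each file line by line, append the filename, and write the line out. And don't open the files in binary mode! In Python 3, format() produces a str, and writing a str to a file opened with "wb" raises a TypeError.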
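import glob
import os
out_filename = "combined.files.csv"
# Delete the output of a previous run so glob() does not pick it up as an input file.
if os.path.exists(out_filename):
    os.remove(out_filename)
read_files = glob.glob("*.csv")
with open(out_filename, "w") as outfile:
    for filename in read_files:
        with open(filename) as infile:
            for line in infile:
                # Drop only the line ending (strip() would also eat spaces inside the
                # first/last field), then append the filename as a new last column.
                outfile.write('{},{}\n'.format(line.rstrip('\r\n'), filename))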
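The line-by-line approach assumes that no field contains a quoted comma or an embedded newline, and that no filename contains a comma. If that might not hold for your data, here is a minimal sketch using the standard csv module instead, which re-quotes fields on output. The function name combine_csvs and its parameters are hypothetical, chosen for illustration; files are opened with newline="" as the csv module documentation recommends.

```python
import csv


def combine_csvs(paths, out_path):
    """Copy every row of each file in `paths` into `out_path`,
    adding the source filename as an extra last column."""
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for path in paths:
            with open(path, newline="") as infile:
                for row in csv.reader(infile):
                    # csv.writer quotes fields as needed, so commas or newlines
                    # inside a field (or inside the filename) are preserved.
                    writer.writerow(row + [path])
```

To use it like the answer above, delete combined.files.csv first and then call combine_csvs(glob.glob("*.csv"), "combined.files.csv"); the glob is evaluated before the output file is created, so the output is not read back as an input.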
If your CSV files share a common header row, write one copy of it to the output file and skip the rest:
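import os
import glob
want_header = True
out_filename = "combined.files.csv"
# Delete the output of a previous run so glob() does not pick it up as an input file.
if os.path.exists(out_filename):
    os.remove(out_filename)
read_files = glob.glob("*.csv")
with open(out_filename, "w") as outfile:
    for filename in read_files:
        with open(filename) as infile:
            # Read the header line; the default None avoids StopIteration on an empty file.
            header = next(infile, None)
            if header is None:
                continue  # empty file: no header and no data rows
            if want_header:
                # Write the first file's header once, with the new column name appended.
                outfile.write('{},Filename\n'.format(header.rstrip('\r\n')))
                want_header = False
            # Later headers were already consumed by next() above, so they are skipped.
            for line in infile:
                outfile.write('{},{}\n'.format(line.rstrip('\r\n'), filename))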
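The header version above trusts that every file really has the same header. A minimal sketch of a csv-module variant that writes the header once, skips empty files, and raises an error if a later file's header differs is shown below. The function name combine_csvs_with_header and the source_column parameter are hypothetical; the "Filename" default mirrors the column name used in the answer.

```python
import csv


def combine_csvs_with_header(paths, out_path, source_column="Filename"):
    """Merge CSV files that share one header row: write that header once
    (plus `source_column`) and tag each data row with its source file."""
    expected_header = None
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for path in paths:
            with open(path, newline="") as infile:
                reader = csv.reader(infile)
                header = next(reader, None)
                if header is None:
                    continue  # empty file: nothing to copy
                if expected_header is None:
                    # First non-empty file defines the header for the output.
                    expected_header = header
                    writer.writerow(header + [source_column])
                elif header != expected_header:
                    raise ValueError(
                        f"{path}: header {header!r} does not match {expected_header!r}"
                    )
                for row in reader:
                    writer.writerow(row + [path])
```

Failing loudly on a mismatched header is a design choice: silently dropping a different header line would misalign the columns of every row that follows it.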