Problem with extra separators when using csv on a large dataset

Aph*_*ity 1 python csv large-data

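I am working with a large dataset (OMNI), and I am trying to parse the data and put each row into a list of arrays. I am new to Python, so I am learning as I go.

This is what I have: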

import Tkinter, tkFileDialog
import csv 

#Choose the file that you want to read from
root = Tkinter.Tk()
root.withdraw()


file_path = tkFileDialog.askopenfilename()
current_file = open(file_path , "r")

#OMNI_2001 = {}

reader = csv.reader(current_file, delimiter= ' ')

output_file = open('newdata.txt','w')
out = csv.writer(output_file)

for row in reader:
    out.writerow(row)
    print row
#print row[0::1]

A line of the data I read in looks like this:

2001 182  0  0 60 60   7   2  71   -695    320  0.22   -173    6.07    5.23    0.46   -2.00    0.69   -1.93    0.38    2.09   331.0  -329.5    24.5    19.8   8.66  101479.  1.90   0.64   2.25   8.0    6.67   29.65    3.55   12.73   -1.78   -0.70   288  -142   146    -3   -22    20    19   0.99

But after I write it out, the new data looks like this:

2001,182,,0,,0,60,60,,,7,,,2,,71,,,-695,,,,320,,0.22,,,-173,,,,6.07,,,,5.23,,,,0.46,,,-2.00,,,,0.69,,,-1.93,,,,0.38,,,,2.09,,,331.0,,-329.5,,,,24.5,,,,19.8,,,8.66,,101479.,,1.90,,,0.64,,,2.25,,,8.0,,,,6.67,,,29.65,,,,3.55,,,12.73,,,-1.78,,,-0.70,,,288,,-142,,,146,,,,-3,,,-22,,,,20,,,,19,,,0.99

What am I doing that causes so many extra commas? And how would I remove the unwanted entries?

unu*_*tbu 8

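Your csv file has multiple spaces between items. delimiter=' ' makes the reader treat every single space as the start of a new column, so each run of N spaces yields N-1 empty columns. That is why the rows have so many "extra" columns.

Use skipinitialspace=True to ignore whitespace that follows a delimiter. This eliminates the spurious extra columns.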
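The difference between the two settings can be checked on a short string before touching the real file. This is a minimal sketch, assuming Python 3 (the code in this thread is Python 2); the sample is just the first seven fields of the data line from the question, and the variable names are only for illustration.

```python
import csv
import io

# First few fields of the data line from the question,
# padded with a varying number of spaces.
sample = "2001 182  0  0 60 60   7\n"

# Every single space is a delimiter, so consecutive spaces
# produce empty fields between them.
naive_row = next(csv.reader(io.StringIO(sample), delimiter=' '))
print(naive_row)
# ['2001', '182', '', '0', '', '0', '60', '60', '', '', '7']

# skipinitialspace=True discards the spaces that directly follow
# a delimiter, so the padding no longer creates fields.
clean_row = next(csv.reader(io.StringIO(sample), delimiter=' ',
                            skipinitialspace=True))
print(clean_row)
# ['2001', '182', '0', '0', '60', '60', '7']
```

Written out with csv.writer, the first list gives exactly the 2001,182,,0,,0,... pattern shown in the question.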

import Tkinter, tkFileDialog
import csv 

#Choose the file that you want to read from
root = Tkinter.Tk()
root.withdraw()

file_path = tkFileDialog.askopenfilename()
with open(file_path , 'rb') as current_file:
    reader = csv.reader(current_file, delimiter= ' ', 
                        skipinitialspace=True)
    with open('newdata.txt','wb') as output_file:
        out = csv.writer(output_file)
        for row in reader:
            out.writerow(row)
            print row
            #print row[0::1]
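If the goal is a list of numeric rows rather than a rewritten file, the csv module is not strictly needed, because str.split() with no argument already treats any run of whitespace as one separator. The following is a rough sketch, assuming Python 3 and assuming every field is numeric; parse_rows is a hypothetical helper name, and the second sample line is made up for illustration.

```python
import io

def parse_rows(lines):
    """Split each line on runs of whitespace and convert the fields to floats."""
    rows = []
    for line in lines:
        fields = line.split()  # no argument: any run of whitespace is one separator
        if fields:             # skip blank lines
            rows.append([float(f) for f in fields])
    return rows

# Stand-in for an open data file: the first line is taken from the
# question, the second line is invented.
sample = io.StringIO("2001 182  0  0 60 60   7\n"
                     "2001 182  0  1 60 60   9\n")
data = parse_rows(sample)
print(data[0])
# [2001.0, 182.0, 0.0, 0.0, 60.0, 60.0, 7.0]
```

Because the function only iterates over its argument, an open file object can be passed in place of the StringIO stand-in, so the whole file is never held as raw text.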