Reading a CSV file that uses semicolons as the delimiter

Abh*_*nty 3 python arrays numpy

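I have a numpy 2D array of shape (4898, ) in which the elements of each row are separated by semicolons but are still stored in a single column rather than in multiple columns (the desired result). How can I split each row at every semicolon? I wrote the following Python script to do this, but it raises an error.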

stochastic_gradient_descent_winequality.py

import numpy
import pandas

if __name__ == '__main__' :

    with open('winequality-white.csv', 'r') as f_0 :
        with open('winequality-white-updated.csv', 'w') as f_1 :
            f_0.next()
            for line in f_0 :
                f_1.write(line)


    wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ',', header = None)
    wine_data_ = wine_data
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)

    print (numpy.shape(wine_data))

Error

Traceback (most recent call last):
  File "stochastic_gradient_descent_winequality.py", line 16, in <module>
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)
AttributeError: 'numpy.int64' object has no attribute 'split'

Ary*_*thy 5

If your CSV file uses semicolons (;) rather than commas (,) as the separator, you can adjust the line that reads the file:

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)

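The problem with your list comprehension is that [x.split(';') for x in wine_data_] iterates over the column names, not the rows. With header = None those names are integers, which is why the error says a 'numpy.int64' object has no attribute 'split'.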
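A minimal sketch of that behaviour, assuming a small in-memory sample in place of the real file (the two rows below are hypothetical values, not taken from winequality-white.csv):

```python
import io
import pandas

# Hypothetical two-row sample standing in for the file (header already removed).
sample = io.StringIO("7;0.27;0.36\n6.3;0.3;0.34\n")

# With the wrong separator, every line ends up in a single column.
frame = pandas.read_csv(sample, sep=',', header=None)

# Iterating over a DataFrame yields its column labels, not its rows.
# With header=None the labels are integers, which have no .split method.
labels = list(frame)
print(labels)

# The cells themselves are still unsplit strings.
first_cell = frame.iloc[0, 0]
print(first_cell)
```

So the comprehension never reaches the semicolon-separated strings; it only sees the integer label of the single column.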

In this case you don't need the list comprehension at all. You can simply read the data in and be done:

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)
print (numpy.shape(wine_data))
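Putting it together, here is a minimal sketch that assumes the original file has one header row followed by semicolon-separated numeric values. The in-memory sample and its column names are hypothetical stand-ins for winequality-white.csv. Passing skiprows=1 drops the header row, so the intermediate "-updated" copy is not needed in this variant.

```python
import io
import numpy
import pandas

# Hypothetical stand-in for winequality-white.csv: a header row plus three data rows.
sample = io.StringIO(
    '"fixed acidity";"volatile acidity";"quality"\n'
    '7;0.27;6\n'
    '6.3;0.3;6\n'
    '8.1;0.28;5\n'
)

# sep=';' splits each line into columns; skiprows=1 discards the header row.
frame = pandas.read_csv(sample, sep=';', header=None, skiprows=1)

# Convert to a 2D float array (the built-in float is used as the dtype).
wine_data = frame.to_numpy(dtype=float)

print(numpy.shape(wine_data))  # (3, 3)
```

The built-in float is used here because the numpy.float alias from the question was removed in NumPy 1.24. With the real file, replace the StringIO object with the file path.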