Python CSV reader, CSV format

Tags: python, csv, excel

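I have a CSV file that doesn't look broken visually. One of the columns contains full emails, which bring additional commas with them. The format is as follows: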

ID   | Info   |  Email           | Notes
--------------------------------------------------
1234 | Sample |  Full email here,| More notes here
              |  and email wraps.|
--------------------------------------------------
5678 | Sample2|  Another email,  |  More notes
--------------------------------------------------
9011 | Sample3|  More emails     |  Etc.
--------------------------------------------------

I'm using the CSV reader, which outputs every physical line as a new row, and that isn't correct. For example, I get:

Line 1: 1234 | Sample |  Full email here,| More notes here
Line 2:               |  and email wraps.|
Line 3: 5678 | Sample2|  Another email,  |  More notes
Line 4: 9011 | Sample3|  More emails     |  Etc.

I need it to recognize the cell boundaries the way Excel or LibreOffice does, so that I get:

Line 1: 1234 | Sample |  Full email here, and email wraps.| More notes here
Line 2: 5678 | Sample2|  Another email,  |  More notes
Line 3: 9011 | Sample3|  More emails     |  Etc.

Here is my code:

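 import csv
 import sys

 # Allow very large fields (one field may hold a whole email).
 csv.field_size_limit(sys.maxsize)
 file = "myfile.csv"
 # 'rU' mode was removed in Python 3.11; newline='' is what the csv docs recommend.
 with open(file, newline='') as f:
     # Split on pipes and treat quote characters as ordinary text.
     freader = csv.reader(f, delimiter='|', quoting=csv.QUOTE_NONE)
     for row in freader:
         print(','.join(row))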

I've tried delimiter=',' and delimiter='\n', with no luck. Any ideas?

Answer (score 8)

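CSV stands for comma-separated values. Although you can change the delimiter to a tab, a pipe, or anything else you like, CSV is at heart a very primitive, line-based format.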

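The problem is the record with ID 1234, which spans two lines and therefore breaks from the CSV file's point of view. The Python csv library isn't designed to accommodate an unquoted field like this, because that isn't how CSV files are structured: a line break is only kept inside a field when the field is quoted.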
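To make the line-based point concrete, here is a small sketch using hypothetical sample data (not the asker's actual file): with the default quoting, `csv.reader` keeps a newline that sits inside a quoted field, but with `quoting=csv.QUOTE_NONE`, as in the question's code, the same data falls apart at the line break.

```python
import csv
import io

# Hypothetical sample: the wrapped Email field is quoted and contains a newline.
data = (
    '1234|Sample|"Full email here,\nand email wraps."|More notes here\n'
    "5678|Sample2|Another email,|More notes\n"
)

# Default quoting: the quoted newline stays inside the field, giving 2 rows.
rows = list(csv.reader(io.StringIO(data), delimiter="|"))

# QUOTE_NONE: quotes are plain text, so every physical line becomes a row (3 rows).
rows_none = list(csv.reader(io.StringIO(data), delimiter="|", quoting=csv.QUOTE_NONE))

for row in rows:
    print(row)
print(len(rows), len(rows_none))
```

If your file opens correctly in Excel, it may well be quoted like this, in which case simply dropping `quoting=csv.QUOTE_NONE` could be enough. If it is not quoted, the custom parser described below is the way to go.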

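To do what you want, you're better off writing your own parser that splits each line on the delimiter and merges lines according to some rule. If the ID column never spans two lines, this should be relatively simple.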

As for actually writing the code, you need a procedure like this:

Initialise array X
Read each line L of file F:
    If the ID field is empty then merge each entry into the previous line L-1
    Otherwise append the line L to array X
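A minimal Python sketch of that procedure, assuming the file really is pipe-delimited as in the question's code and that a blank first field marks a continuation line. The function name `merge_wrapped_rows` is hypothetical, and so is the rule for short continuation lines (missing fields are assumed to be the leading ones, because in the sample the ID and Info columns collapse into one blank cell). Adjust both to match your real file.

```python
import csv


def merge_wrapped_rows(lines, delimiter="|"):
    """Fold continuation lines into the previous record.

    Assumptions (hypothetical, not from the question):
    - a line whose first field is blank continues the previous record;
    - a continuation line with fewer fields is missing its *leading* fields;
    - wrapped pieces are joined to the previous text with a single space.
    """
    records = []  # the "array X" from the pseudocode
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        fields = [field.strip() for field in row]
        if not any(fields):
            continue  # skip empty or all-blank lines
        if fields[0] == "" and records:
            prev = records[-1]
            # Pad on the left so the pieces line up with the previous record's
            # columns; any fields beyond the record's width are ignored.
            padded = [""] * (len(prev) - len(fields)) + fields
            for i, piece in enumerate(padded[: len(prev)]):
                if piece:
                    prev[i] = f"{prev[i]} {piece}".strip()
        else:
            records.append(fields)
    return records
```

Fed the four lines the reader printed in the question, this returns three records, and the Email field of record 1234 becomes `Full email here, and email wraps.`. For the real file, pass the open file object: `with open("myfile.csv", newline="") as f: records = merge_wrapped_rows(f)`. Since the emails contain commas, joining the output with `" | "` rather than `","` keeps the columns readable.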