I have a CSV that looks unbroken when viewed. One of its columns contains a full email along with additional commas. The format is as follows:
ID | Info | Email | Notes
--------------------------------------------------
1234 | Sample | Full email here,| More notes here
| and email wraps.|
--------------------------------------------------
5678 | Sample2| Another email, | More notes
--------------------------------------------------
9011 | Sample3| More emails | Etc.
--------------------------------------------------
I am using a CSV reader that outputs each new line as a new row, which is not correct. For example, I get:
Line 1: 1234 | Sample | Full email here,| More notes here
Line 2: | and email wraps.|
Line 3: 5678 | Sample2| Another email, | More notes
Line 4: 9011 | Sample3| More emails | Etc.
I need it to recognize the cell delimiters the way Excel or LibreOffice does, and get:
Line 1: 1234 | Sample | Full email here, and email wraps.| More notes here
Line 2: 5678 | Sample2| Another email, | More notes
Line 3: 9011 | Sample3| More emails | Etc.
I have this code:
import csv
import sys

csv.field_size_limit(sys.maxsize)
file = "myfile.csv"
# newline='' is what the csv module expects; the old 'rU' mode was removed in Python 3.11
with open(file, newline='') as f:
    freader = csv.reader(f, delimiter='|', quoting=csv.QUOTE_NONE)
    for row in freader:
        print(','.join(row))
I tried delimiter=',' and delimiter='\n', but with no luck. Any ideas?
Answer (score 8)
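CSV stands for comma-separated values. Although you can change the delimiter to a tab, a pipe, or anything else you like, CSV is in fact a very primitive line-based format: unless a field is quoted, a newline ends the record.
The problem is your wrapped record (ID 1234), which spans two lines and is therefore broken from the point of view of a CSV file. The Python CSV library is not designed to accommodate this, because it is not how a CSV file is laid out.
To do what you want, you are best off writing your own parser that splits each line on the delimiter and merges lines according to some logic. If the ID column never spans two lines, this should be relatively trivial.
As for how to actually write the code, you need a process like the following: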
Initialise array X
Read each line L of file F:
    If the ID field is empty then merge each entry into the previous line L-1
    Otherwise append the line L to array X
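The process above could be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions that are not in the question: the function name `merge_wrapped_rows` and its parameters are made up for illustration, a continuation line is assumed to keep the same number of `|` delimiters as a full row (so each fragment lines up with its column), and lines made up only of dashes are assumed to be separators that can be skipped.

```python
import csv


def merge_wrapped_rows(lines, delimiter="|", id_index=0):
    """Merge rows whose ID field is empty into the previous record.

    Hypothetical helper: assumes a continuation line has the same
    number of delimiters as a full row, so fields align by column.
    """
    records = []
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and dashed separator lines (assumed layout).
        if not "".join(row).replace("-", "").strip():
            continue
        fields = [field.strip() for field in row]
        if fields[id_index] == "" and records:
            # Continuation line: append each non-empty fragment to the
            # same column of the previous record.
            previous = records[-1]
            for i, value in enumerate(fields):
                if not value:
                    continue
                if i < len(previous):
                    previous[i] = (previous[i] + " " + value).strip()
                else:
                    previous.append(value)
        else:
            # A non-empty ID starts a new record.
            records.append(fields)
    return records


# In-memory sample standing in for the file; with a real file you would
# pass the object returned by open("myfile.csv", newline="") instead.
sample = [
    "ID|Info|Email|Notes",
    "1234|Sample|Full email here,|More notes here",
    "||and email wraps.|",
    "5678|Sample2|Another email,|More notes",
    "9011|Sample3|More emails|Etc.",
]

rows = merge_wrapped_rows(sample)
for row in rows:
    print(" | ".join(row))
```

Merging by column index only works if the wrapped fragment really sits in the same column as the text it continues. If your continuation lines drop the leading empty fields, you would need a different rule to decide which column a fragment belongs to.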