I have a line like this in my CSV:
"Samsung U600 24"","10000003409","1","10000003427"
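The quote next to 24 denotes inches, and the quote right after it closes the field. I am reading the line with fgetcsv, but the parser gets it wrong and reads the value as: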
Samsung U600 24",10000003409"
I tried putting a backslash before the inch quote, but then I just got a backslash in the name:
Samsung U600 24\"
Is there a way to escape this properly in the CSV so that the value is Samsung U600 24", or do I have to fix it with a regex in the processor?
use*_*035 251
Use two double quotes:
"Samsung U600 24"""
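As a quick check of the doubled-quote convention, here is a minimal sketch that parses the row from the question with the inch mark doubled. It uses Python's standard csv module rather than PHP's fgetcsv (an assumption made only for illustration); the variable names are hypothetical.

```python
import csv
import io

# The row from the question, with the inch mark written as "" inside
# the quoted first field.
line = '"Samsung U600 24""","10000003409","1","10000003427"\n'

# csv.reader treats "" inside a quoted field as one literal quote
# (doublequote=True is the default).
row = next(csv.reader(io.StringIO(line)))

print(row[0])
```

The first field comes back with a single trailing quote, and the remaining three fields are unaffected.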
Ang*_*dar 17
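It is not only double quotes: you also need to handle the single quote ('), double quote ("), backslash (\) and NUL (the NULL byte).
Use fputcsv() to write and fgetcsv() to read, and they will take care of all of this.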
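The same idea, letting a library do both the writing and the reading, can be sketched in Python with csv.writer and csv.reader standing in for fputcsv() and fgetcsv(). This is an analogy under that assumption, not the PHP behaviour itself; `rows`, `buffer`, `written` and `read_back` are illustrative names.

```python
import csv
import io

# One record whose first field contains a literal double quote.
rows = [['Samsung U600 24"', "10000003409", "1", "10000003427"]]

# Write with every field quoted, as in the question's file.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerows(rows)
written = buffer.getvalue()

# Read the same text back with the matching reader.
buffer.seek(0)
read_back = list(csv.reader(buffer))

print(written, end="")
print(read_back == rows)
```

Because the writer and reader share the same quoting rules, the embedded quote is doubled on output and restored on input without any manual escaping.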
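In theory, CSV is a simple format (tabular data delimited by commas), but unfortunately there is no formal specification, so there are many subtly different implementations. Care is needed when importing and exporting. I will quote RFC 4180 for the common implementation: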
2. Definition of the CSV Format
While there are various specifications and implementations for the
CSV format (for ex. [4], [5], [6] and [7]), there is no formal
specification in existence, which allows for a wide variety of
interpretations of CSV files. This section documents the format that
seems to be followed by most implementations:
1. Each record is located on a separate line, delimited by a line
break (CRLF). For example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
2. The last record in the file may or may not have an ending line
break. For example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx
3. There maybe an optional header line appearing as the first line
of the file with the same format as normal record lines. This
header will contain names corresponding to the fields in the file
and should contain the same number of fields as the records in
the rest of the file (the presence or absence of the header line
should be indicated via the optional "header" parameter of this
MIME type). For example:
field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
4. Within the header and each record, there may be one or more
fields, separated by commas. Each line should contain the same
number of fields throughout the file. Spaces are considered part
of a field and should not be ignored. The last field in the
record must not be followed by a comma. For example:
aaa,bbb,ccc
5. Each field may or may not be enclosed in double quotes (however
some programs, such as Microsoft Excel, do not use double quotes
at all). If fields are not enclosed with double quotes, then
double quotes may not appear inside the fields. For example:
"aaa","bbb","ccc" CRLF
zzz,yyy,xxx
6. Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
7. If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
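Rules 6 and 7 of the excerpt above can be sketched as a small encoder. This is a minimal illustration, not a complete RFC 4180 implementation; `quote_field` and `encode_record` are hypothetical helper names, and Python's csv.reader is used only to check that the output parses back.

```python
import csv
import io


def quote_field(value: str) -> str:
    """Quote a field only when it contains a quote, comma or line break."""
    if not any(ch in value for ch in ('"', ",", "\r", "\n")):
        return value
    # Rule 7: an embedded double quote is escaped by doubling it.
    return '"' + value.replace('"', '""') + '"'


def encode_record(fields) -> str:
    """Join quoted fields with commas and end the record with CRLF."""
    return ",".join(quote_field(f) for f in fields) + "\r\n"


record = ['Samsung U600 24"', "a,b", '"', "plain"]
encoded = encode_record(record)
decoded = next(csv.reader(io.StringIO(encoded)))

# A raw "" is an empty field; a raw """" is a field holding one quote.
special = next(csv.reader(io.StringIO('"",""""\n')))

print(encoded, end="")
```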
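So, usually, "" in a raw CSV field represents an empty string, and """" in raw CSV represents a single double-quote character, ". (Usually not a problem: CRLF (Windows-style) versus LF (Unix-style) line breaks, and whether the last line ends with a line break.)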
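However, you may encounter data that escapes quotes or other characters (the delimiter, line breaks, the escape character itself) with an escape character such as \. For example, in readr's read_csv() this is controlled by escape_double and escape_backslash. Some unusual data uses a comment character such as # (the default in R's read.table but not in read.csv).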
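For backslash-escaped data, a reader has to be told about the different convention. The sketch below uses Python's csv module, where the `doublequote` and `escapechar` parameters play a role roughly comparable to readr's escape_double and escape_backslash; that correspondence is an assumption for illustration, and the sample line is a shortened version of the row from the question.

```python
import csv
import io

# The inch mark is escaped with a backslash instead of being doubled:
#   "Samsung U600 24\"","10000003409"
line = '"Samsung U600 24\\"","10000003409"\n'

# Turn off quote doubling and declare the backslash as the escape character.
reader = csv.reader(io.StringIO(line), doublequote=False, escapechar="\\")
escaped_row = next(reader)

print(escaped_row)
```

With these settings the backslash is consumed and the quote after it is kept as part of the field; with the default settings the same line would not split into the intended two fields.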