I know this question has been asked many times, but the answers differ and I'm confused.
My line is:
1,3.2,BCD,"qwer 47"" ""dfg""",1
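Quoting is optional, and quotes inside a field are doubled, following the MS Excel convention. (The data qwer 47" "dfg" is written as "qwer 47"" ""dfg""".)
I need a regular expression.
OK, as you've already seen from the comments, a regex is really not the right tool for this. But if you insist, here it is:
This regex will work in Java (or .NET and other implementations that support possessive quantifiers and verbose regexes):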
^                 # Start of string
(?:               # Match the following:
  (?:             # Either match
    [^",\n]*+     # 0 or more characters except comma, quote or newline
  |               # or
    "             # an opening quote
    (?:           # followed by either
      [^"]*+      # 0 or more non-quote characters
    |             # or
      ""          # an escaped quote ("")
    )*            # any number of times
    "             # followed by a closing quote
  )               # End of alternation
  ,               # Match a comma (separating the CSV columns)
)*                # Do this zero or more times.
(?:               # Then match
  (?:             # using the same rules as above
    [^",\n]*+     # an unquoted CSV field
  |               # or a quoted CSV field
    "(?:[^"]*+|"")*"
  )               # End of alternation
)                 # End of non-capturing group
$                 # End of string
Java code:
boolean foundMatch = subjectString.matches(
"(?x)^ # Start of string\n" +
"(?: # Match the following:\n" +
" (?: # Either match\n" +
" [^\",\\n]*+ # 0 or more characters except comma, quote or newline\n" +
" | # or\n" +
" \" # an opening quote\n" +
" (?: # followed by either\n" +
" [^\"]*+ # 0 or more non-quote characters\n" +
" | # or\n" +
" \"\" # an escaped quote (\"\")\n" +
" )* # any number of times\n" +
" \" # followed by a closing quote\n" +
" ) # End of alternation\n" +
" , # Match a comma (separating the CSV columns)\n" +
")* # Do this zero or more times.\n" +
"(?: # Then match\n" +
" (?: # using the same rules as above\n" +
" [^\",\\n]*+ # an unquoted CSV field\n" +
" | # or a quoted CSV field\n" +
" \"(?:[^\"]*+|\"\")*\"\n" +
" ) # End of alternation\n" +
") # End of non-capturing group\n" +
"$ # End of string");
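As a rough sketch of how the pattern could be exercised, the class below builds the same expression in compact form (without the comments) and tries it on the sample line from the question plus a few other inputs. The names CsvLineCheck, FIELD, RECORD and isValidRecord are made up for this illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

class CsvLineCheck {
    // One CSV field: either unquoted (no comma, quote or newline),
    // or quoted, with "" standing for a literal quote.
    private static final String FIELD = "(?:[^\",\\n]*+|\"(?:[^\"]*+|\"\")*\")";

    // A record is zero or more "field," groups followed by a final field.
    private static final Pattern RECORD =
            Pattern.compile("^(?:" + FIELD + ",)*" + FIELD + "$");

    // Hypothetical helper: true if the whole string is one well-formed record.
    static boolean isValidRecord(String s) {
        return RECORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The sample line from the question.
        System.out.println(isValidRecord("1,3.2,BCD,\"qwer 47\"\" \"\"dfg\"\"\",1")); // true
        // A quoted field may contain a newline.
        System.out.println(isValidRecord("a,\"line1\nline2\",b")); // true
        // An unterminated quoted field is rejected.
        System.out.println(isValidRecord("1,\"abc")); // false
        // A stray quote in an unquoted field is rejected.
        System.out.println(isValidRecord("ab\"c,1")); // false
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which String.matches would do.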
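Note that you can't assume that every line in a CSV file is a complete record. A CSV record can contain newlines (as long as the column containing the newline is enclosed in quotes). This regex knows that, but it will fail if you feed it only part of a record. That's another reason why you really need a CSV parser to validate a CSV file; that is what parsers are for. If you control your input and know that there will never be a newline inside a CSV field, you might get away with it, but only then.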
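To give an idea of what a parser does differently, here is a minimal sketch of a character-by-character field splitter, assuming the Excel-style quoting described in the question. CsvRecordSplitter and split are hypothetical names, and the code is more lenient than the regex (it accepts a quote in the middle of an unquoted field), so treat it as an illustration and not as a replacement for a real CSV library.

```java
import java.util.ArrayList;
import java.util.List;

class CsvRecordSplitter {
    // Splits one complete CSV record into its fields.
    // Throws if the input ends inside a quoted field, i.e. the record is partial.
    static List<String> split(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is a literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas and newlines are literal here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unterminated quoted field");
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Prints the five fields of the sample line, one per line.
        for (String f : split("1,3.2,BCD,\"qwer 47\"\" \"\"dfg\"\"\",1")) {
            System.out.println(f);
        }
    }
}
```

Because the splitter tracks whether it is inside quotes, a caller reading a file could keep appending lines to the record until split stops throwing, which is the partial-line case the regex alone cannot handle.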