Regex to parse CSV with nested quotes

Tru*_*ain -2 regex csv

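Possible duplicates:
C#, regular expressions: how to parse comma-separated values, where some values might be quoted strings themselves containing commas
Regex to parse CSV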

I know this question has been asked many times, but the answers differ and I'm confused.

My line is:

1,3.2,BCD,"qwer 47"" ""dfg""",1

Quoting is optional, and embedded quotes are doubled, following the MS Excel convention. (The data qwer 47" "dfg" is represented as "qwer 47"" ""dfg""".)
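To make the convention concrete, here is a minimal Java sketch of the quoting rule as described above: every embedded quote is doubled and the whole field is wrapped in quotes. The class and method names (CsvQuoting, quote, unquote) are hypothetical and only serve as an illustration.

```java
final class CsvQuoting {
    // Quote a field: double every embedded quote, then wrap the result in quotes.
    static String quote(String raw) {
        return "\"" + raw.replace("\"", "\"\"") + "\"";
    }

    // Reverse the operation: strip the outer quotes and collapse doubled quotes.
    // A field that is not wrapped in quotes is returned unchanged.
    static String unquote(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1).replace("\"\"", "\"");
        }
        return field;
    }

    public static void main(String[] args) {
        String raw = "qwer 47\" \"dfg\"";
        String quoted = quote(raw);
        System.out.println(quoted);          // the quoted form used in the sample line
        System.out.println(unquote(quoted)); // back to the raw data
    }
}
```

Applied to the data qwer 47" "dfg", quote produces the fourth column of the sample line, and unquote turns it back into the original data.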

I need a regex.

Tim*_*ker 5

OK, you've already seen from the comments that a regex is really not the right tool for this. But if you insist, here goes:

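This regex will work in Java (or in other implementations that support possessive quantifiers and verbose regexes; .NET supports verbose mode but not possessive quantifiers, so there each X*+ has to be rewritten as an atomic group (?>X*)):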

^            # Start of string
(?:          # Match the following:
 (?:         #  Either match
  [^",\n]*+  #   0 or more characters except comma, quote or newline
 |           #  or
  "          #   an opening quote
  (?:        #   followed by either
   [^"]*+    #    0 or more non-quote characters
  |          #   or
   ""        #    an escaped quote ("")
  )*         #   any number of times
  "          #   followed by a closing quote
 )           #  End of alternation
 ,           #  Match a comma (separating the CSV columns)
)*           # Do this zero or more times.
(?:          # Then match
 (?:         #  using the same rules as above
  [^",\n]*+  #  an unquoted CSV field
 |           #  or a quoted CSV field
  "(?:[^"]*+|"")*"
 )           #  End of alternation
)            # End of non-capturing group
$            # End of string

Java code:

boolean foundMatch = subjectString.matches(
    "(?x)^         # Start of string\n" +
    "(?:           # Match the following:\n" +
    " (?:          #  Either match\n" +
    "  [^\",\\n]*+ #   0 or more characters except comma, quote or newline\n" +
    " |            #  or\n" +
    "  \"          #   an opening quote\n" +
    "  (?:         #   followed by either\n" +
    "   [^\"]*+    #    0 or more non-quote characters\n" +
    "  |           #   or\n" +
    "   \"\"       #    an escaped quote (\"\")\n" +
    "  )*          #   any number of times\n" +
    "  \"          #   followed by a closing quote\n" +
    " )            #  End of alternation\n" +
    " ,            #  Match a comma (separating the CSV columns)\n" +
    ")*            # Do this zero or more times.\n" +
    "(?:           # Then match\n" +
    " (?:          #  using the same rules as above\n" +
    "  [^\",\\n]*+ #  an unquoted CSV field\n" +
    " |            #  or a quoted CSV field\n" +
    "  \"(?:[^\"]*+|\"\")*\"\n" +
    " )            #  End of alternation\n" +
    ")             # End of non-capturing group\n" +
    "$             # End of string");
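For a quick check, the same pattern can be compiled once and applied to the sample line from the question. The sketch below writes the regex on a single line without comments; the class name CsvLineCheck and the method isValidRow are made up for this example.

```java
import java.util.regex.Pattern;

final class CsvLineCheck {
    // The verbose regex from above, collapsed onto one line (comments removed).
    static final Pattern CSV_ROW = Pattern.compile(
        "^(?:(?:[^\",\\n]*+|\"(?:[^\"]*+|\"\")*\"),)*(?:[^\",\\n]*+|\"(?:[^\"]*+|\"\")*\")$");

    // True if the whole string is one well-formed CSV row.
    static boolean isValidRow(String row) {
        return CSV_ROW.matcher(row).matches();
    }

    public static void main(String[] args) {
        // The line from the question: the quoted field contains doubled quotes.
        String sample = "1,3.2,BCD,\"qwer 47\"\" \"\"dfg\"\"\",1";
        System.out.println(isValidRow(sample));

        // A quoted field that is never closed should be rejected.
        System.out.println(isValidRow("1,\"unterminated,2"));
    }
}
```

Note that this only validates a row; it does not extract the individual fields.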

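Note that you can't assume that every line in a CSV file is a complete row. A CSV row may contain newlines (as long as the column containing the newline is enclosed in quotes). This regex knows that, but it will fail if you feed it only part of a row. That is another reason why you really need a CSV parser to validate a CSV file; that's what parsers are for. If you control the input and know that there will never be a newline inside a CSV field, you might get away with it, but only then.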
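To illustrate what a parser does differently, here is a minimal hand-written sketch, not a replacement for a real CSV library. It walks the record character by character, treats commas and newlines inside quotes as data, and reports an incomplete record when a quoted field is never closed. The names TinyCsvParser and parseRecord are hypothetical, and the sketch is more lenient than the regex above (for example, it tolerates a quote in the middle of an unquoted field).

```java
import java.util.ArrayList;
import java.util.List;

final class TinyCsvParser {
    // Splits one complete CSV record into its fields.
    // Throws IllegalArgumentException if a quoted field is never closed,
    // which is what happens when only part of a multi-line record is passed in.
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas and newlines are data here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }

        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quoted field: record is incomplete");
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseRecord("1,3.2,BCD,\"qwer 47\"\" \"\"dfg\"\"\",1"));
        System.out.println(parseRecord("a,\"line one\nline two\",b").size());
    }
}
```

A caller reading a file line by line could catch the exception, append the next physical line, and retry, which is roughly how a parser copes with records that span several lines.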