Parse CSV, ignoring commas inside string literals in VBA?

Tom*_*Tom 17 csv excel ms-access vba split

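I have a VBA application that runs every day. It checks a folder into which CSV files are downloaded automatically and adds their contents to a database. While parsing them, I realized that some values have commas as part of their names. These values are enclosed in string literals.

So I am trying to figure out how to parse this CSV while ignoring the commas contained in string literals. For example...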

1,2,3,"This should,be one part",5,6,7

should return

1
2
3
"This should,be one part"
5
6
7

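I have been using VBA's Split() function because I don't want to reinvent the wheel, but if I have to, I suppose I will do something else.

Any suggestions would be appreciated.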

kb_*_*sou 14

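The first way to approach this problem is to look at the structure of the lines in the CSV file (int, int, "string literal with at most one comma", and so on). A naive solution would be the following (assuming the line contains no semicolons):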

Function splitLine1(line As String) As String()

   Dim temp() As String
   'Splits the line in three. The string delimited by " will be at temp(1)
   temp = Split(line, Chr(34)) 'chr(34) = "

   'Replaces the commas in the numeric fields by semicolons
   temp(0) = Replace(temp(0), ",", ";")
   temp(2) = Replace(temp(2), ",", ";")

   'Joins the temp array with quotes and then splits the result using the semicolons
   splitLine1 = Split(Join(temp, Chr(34)), ";")

End Function

This function only solves this particular problem. Another approach is to use the regular expression object from VBScript.

Function splitLine2(line As String) As String()

    Dim regex As Object
    Set regex = CreateObject("vbscript.regexp")
    regex.IgnoreCase = True
    regex.Global = True

    'This pattern matches only commas outside quotes
    'Pattern = ",(?=([^"]*"[^"]*")*(?![^"]*"))"
    regex.Pattern = ",(?=([^" & Chr(34) & "]*" & Chr(34) & "[^" & Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"

    'regex.Replace swaps the commas outside quotes for semicolons, and then the
    'Split function splits the result on those semicolons
    splitLine2 = Split(regex.Replace(line, ";"), ";")

End Function

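This approach looks more cryptic, but it does not depend on the structure of the line.

You can read more about regular expression patterns in VBScript here.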
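Both functions above assume the data never contains a semicolon. A minimal sketch of one way to relax that assumption is to use a control character as the temporary delimiter instead. The names SplitCsvLine and DemoSplitCsvLine are hypothetical, and the sketch assumes the data never contains Chr(1) and that quoted fields contain no embedded quotes:

```vba
Option Explicit

' Hypothetical variant of splitLine2: uses Chr(1) instead of ";" as the
' temporary delimiter. Assumes Chr(1) never occurs in the data.
Function SplitCsvLine(ByVal csvLine As String) As String()
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")
    regex.Global = True

    ' Same pattern as splitLine2: a comma followed by an even number of quotes
    regex.Pattern = ",(?=([^""]*""[^""]*"")*(?![^""]*""))"

    SplitCsvLine = Split(regex.Replace(csvLine, Chr(1)), Chr(1))
End Function

' Prints each field of the sample line from the question on its own line
Sub DemoSplitCsvLine()
    Dim parts() As String
    Dim i As Long

    parts = SplitCsvLine("1,2,3,""This should,be one part"",5,6,7")
    For i = LBound(parts) To UBound(parts)
        Debug.Print parts(i)
    Next i
End Sub
```

For the sample line this should give seven parts, with the quoted field kept in one piece and its surrounding quotes still attached.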


tra*_*or1 11

@Gimp said...

The current answers do not contain enough detail.

I'm running into the same problem. Looking for more detail in this answer.

To elaborate on @MRAB's answer:

Function ParseCSV(FileName)
    Dim Regex       'As VBScript_RegExp_55.RegExp
    Dim MatchColl   'As VBScript_RegExp_55.MatchCollection
    Dim Match       'As VBScript_RegExp_55.Match
    Dim FS          'As Scripting.FileSystemObject
    Dim Txt         'As Scripting.TextStream
    Dim CSVLine
    ReDim ToInsert(0)

    Set FS = CreateObject("Scripting.FileSystemObject")
    Set Txt = FS.OpenTextFile(FileName, 1, False, -2)
    Set Regex = CreateObject("VBScript.RegExp")

    Regex.Pattern = """[^""]*""|[^,]*"    '<- MRAB's answer
    Regex.Global = True

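    Do While Not Txt.AtEndOfStream
        ReDim ToInsert(0)
        CSVLine = Txt.ReadLine
        For Each Match In Regex.Execute(CSVLine)
            'The pattern also produces empty matches at each comma; skip them
            If Match.Length > 0 Then
                ReDim Preserve ToInsert(UBound(ToInsert) + 1)
                ToInsert(UBound(ToInsert) - 1) = Match.Value
            End If
        Next
        'Drop the unused trailing slot left by the grow-then-fill steps above
        If UBound(ToInsert) > 0 Then ReDim Preserve ToInsert(UBound(ToInsert) - 1)
        InsertArrayIntoDatabase ToInsert
    Loop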
    Txt.Close
End Function

You need to customize the InsertArrayIntoDatabase Sub for your own table. Mine has several text fields named f00, f01, etc...

Sub InsertArrayIntoDatabase(a())
    Dim rs As DAO.Recordset
    Dim i, n
    Set rs = CurrentDb().TableDefs("tbl").OpenRecordset()
    rs.AddNew
    For i = LBound(a) To UBound(a)
        n = "f" & Format(i, "00") 'fields in table are f00, f01, f02, etc..
        rs.Fields(n) = a(i)
    Next
    rs.Update
End Sub

Note that instead of using CurrentDb() in InsertArrayIntoDatabase(), you should really use a global variable that gets set to the value of CurrentDb() before ParseCSV() runs, because running CurrentDb() in a loop is very slow, especially on a very large file.
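A minimal sketch of that suggestion, assuming an Access project with a DAO reference and the ParseCSV function from this answer. The module-level variable mDb and the wrapper ImportCsv are hypothetical names introduced only for illustration:

```vba
Option Explicit

' Hypothetical module-level cache for the database object
Private mDb As DAO.Database

' Hypothetical entry point: calls CurrentDb() once, then parses the file
Sub ImportCsv(ByVal FileName As String)
    Set mDb = CurrentDb()
    ParseCSV FileName          ' ParseCSV as defined earlier in this answer
    Set mDb = Nothing
End Sub

Sub InsertArrayIntoDatabase(a())
    Dim rs As DAO.Recordset
    Dim i As Long

    ' Reuse the cached database instead of calling CurrentDb() per row
    Set rs = mDb.OpenRecordset("tbl", dbOpenDynaset, dbAppendOnly)
    rs.AddNew
    For i = LBound(a) To UBound(a)
        rs.Fields("f" & Format(i, "00")) = a(i) 'fields are f00, f01, f02, etc.
    Next i
    rs.Update
    rs.Close
End Sub
```

Opening the recordset once in ImportCsv and reusing it for every row would likely save more time still, at the cost of passing it around or holding it in another module-level variable.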


MRA*_*RAB 10

Assuming there are no quotes inside the quoted fields, a simple regex for parsing a CSV line is:

"[^"]*"|[^,]*

Each match returns one field.
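One thing to watch for: because [^,]* can match an empty string, this pattern also produces zero-length matches at each comma, so a caller cannot tell those apart from genuinely empty fields. The sketch below is one possible adaptation, not part of the answer above: it extends the pattern with a second group that consumes the delimiter, so empty fields are kept. The name CsvFields is hypothetical, and the same assumption applies (no quotes or line breaks inside quoted fields):

```vba
Option Explicit

' Hypothetical helper: splits one CSV line into a Collection of fields,
' keeping empty fields and removing the quotes around quoted fields.
Function CsvFields(ByVal csvLine As String) As Collection
    Dim regex As Object
    Dim m As Object
    Dim result As New Collection
    Dim fieldText As String

    Set regex = CreateObject("VBScript.RegExp")
    regex.Global = True
    ' Group 1: a quoted or unquoted field
    ' Group 2: the comma after it, or empty at the end of the line
    regex.Pattern = "(""[^""]*""|[^,]*)(,|$)"

    For Each m In regex.Execute(csvLine)
        fieldText = m.SubMatches(0)
        ' Strip the surrounding quotes from a quoted field
        If Len(fieldText) >= 2 And Left$(fieldText, 1) = """" _
                And Right$(fieldText, 1) = """" Then
            fieldText = Mid$(fieldText, 2, Len(fieldText) - 2)
        End If
        result.Add fieldText
        ' An empty second group means this field ended the line
        If m.SubMatches(1) = "" Then Exit For
    Next m

    Set CsvFields = result
End Function
```

For example, CsvFields("1,,""a,b"",4") should yield four fields: 1, an empty string, a,b and 4.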