我有一个包含120万条文本记录的csv文件.字母数字字段用引号括起来,日期/时间或数字字段不是.
例如"Fred","Smith",01/07/1967,2,"7,The High Street","Anytown","Anycounty","LS1 7AA"
我想做的是在Excel中编写一些VBA(或多或少是我可以使用的唯一可用的工具),它通过记录读取CSV记录,执行检查(如同在最后一个字段中发生的那样,邮政编码)然后将1.2m记录的一小部分输出到新的输出文件.
我理解如何打开这两个文件,读取记录,做我需要做的数据并写出来(我只输出带有前缀表示异常类型的输入记录)
我不知道的是如何正确解析VBA中的CSV.我无法进行简单的文本扫描并搜索逗号,因为文本有时会有逗号(因此文本字段为文本分隔的原因)
是否有一个很棒的命令可以让我从记录中的第n个字段快速获取数据?
我想要的是s_work = field(s_input_record,5)其中5是我的CSV中的字段编号....
非常感谢,C
以下代码应该可以解决问题.我没有在我面前使用Excel,所以我没有测试它,但概念是合理的.
如果这最终太慢,我们可以考虑提高效率的方法.
Sub SelectSomeRecords()
Dim testLine As String
Open inputFileName For Input As #1
Open outputFileName For Output As #2
While Not EOF(1)
Line Input #1, testLine
If RecordIsInteresting(testLine) Then
Print #2, testLine
End If
Wend
Close #1
Close #2
End Sub
Function RecordIsInteresting(recordLine As String) As Boolean
Dim lineItems(1 to 8) As String
GetRecordItems(lineItems(), recordLine)
''// do your custom checking here:
RecordIsInteresting = lineItems(8) = "LS1 7AA"
End Function
Sub GetRecordItems(items() As String, recordLine as String)
Dim finishString as Boolean
Dim itemString as String
Dim itemIndex as Integer
Dim charIndex as Long
Dim inQuote as Boolean
Dim testChar as String
inQuote = False
charIndex = 1
itemIndex = 1
itemString = ""
finishString = False
While charIndex <= Len(recordLine)
testChar = Mid$(recordLine, charIndex, 1)
finishString = False
If inQuote Then
If testChar = Chr$(34) Then
inQuote = False
finishString = True
charIndex = charIndex + 1 ''// ignore the next comma
Else
itemString = itemString + testChar
End If
Else
If testChar = Chr$(34) Then
inQuote = True
ElseIf testChar = "," Then
finishString = True
Else
itemString = itemString + testChar
End If
End If
If finishString Then
items(itemIndex) = itemString
itemString = ""
itemIndex = itemIndex + 1
End If
charIndex = charIndex + 1
Wend
End Sub
Run Code Online (Sandbox Code Playgroud)
VBScript怎么样,虽然这也适用于Excel:
Set cn = CreateObject("ADODB.Connection")
'Note HDR=Yes, that is, first row contains field names '
'and FMT delimted, ie CSV '
strCon="Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\Docs\;" _
& "Extended Properties=""text;HDR=Yes;FMT=Delimited"";"
cn.open strcon
'You would not need delimiters ('') if last field is numeric: '
strSQL="SELECT FieldName1, FieldName2 INTO New.csv FROM Old.csv " _
& " WHERE LastFieldName='SomeTextValue'"
'Creates new csv file
cn.Execute strSQL
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
39607 次 |
| 最近记录: |