MeR*_*uud 7 pdf excel vba filesystemobject
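I am trying to extract tables from PDF files with VBA and export them to Excel. If everything works the way it should, the whole process should run automatically. The problem is that the tables are not standardized.
This is what I have so far.
And the code: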
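    With New Scripting.FileSystemObject
        ' 1 = ForReading, False = do not create the file, 0 = open as ASCII
        With .OpenTextFile(strFileName, 1, False, 0)
            ' Skip the first line if the file is not empty
            If Not .AtEndOfStream Then .SkipLine
            Do Until .AtEndOfStream
                ' do something
            Loop
        End With
    End With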
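That all works fine. But now I am stuck on extracting the tables from the text file. What I want is for VBA to find a string, e.g. "Year's Income", and then output the data that follows it into columns (until the table ends).
The first part (finding a certain string) is not very difficult, but how would I go about the second part? The text file looks like this Pastebin. The problem is that the text is not standardized. For example, some tables have three year columns (2010 2011 2012) and some only two (or one), some tables have more spaces between the columns, and some do not include certain rows (such as Capital Asset, net).
I was thinking about doing something like this, but I am not sure how to go about it in VBA.
I based my first version on Pdf to excel, but from what I read online people do not recommend OpenFile but rather FileSystemObject (even though it seems to be a lot slower).
Any pointers to get me started, mainly on step 2?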
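There are a number of ways to dissect a text file, and depending on how complicated the file is you may lean one way or another. I started this and it got a bit out of hand... enjoy.
Based on the sample you provided and the additional comments, I noted the following. Some of this works well for simple files but can get unwieldy with bigger, more complex files. There may also be slightly more efficient methods or tricks than the ones I have used here, but this will definitely get you to the desired result. Hopefully it makes sense in conjunction with the code provided:
- Use InStr on the current line to determine that you are in a table, by looking for the text "Table".
- Once you know you are in the "Table" portion of the file, start looking for the "Assets" section, and so on.
- The Split function and a loop will work out how many year columns the table has.
The following code extracts the Assets details from the text file; you can modify it to extract the other sections. It should handle multiple rows. Hopefully I have commented it sufficiently. Have a look, and I will edit it if you need further help.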
    Sub ReadInTextFile()
        'Requires a reference to "Microsoft Scripting Runtime" for the Scripting types
        Dim fs As Scripting.FileSystemObject, fsFile As Scripting.TextStream
        Dim sFileName As String, sLine As String, vYears As Variant
        Dim iNoColumns As Integer, ii As Integer, iCount As Integer
        Dim bIsTable As Boolean, bIsAssets As Boolean, bIsLiabilities As Boolean, bIsNetAssets As Boolean
        Dim sAssets() As String, iColumns() As Integer

        Set fs = CreateObject("Scripting.FileSystemObject")
        sFileName = "G:\Sample.txt"
        Set fsFile = fs.OpenTextFile(sFileName, 1, False)

        'Loop through the file as you've already done
        Do While fsFile.AtEndOfStream <> True
            'Determine flag positions in text file
            sLine = fsFile.ReadLine
            Debug.Print VBA.Len(sLine)

            'Always skip empty lines (including single spaces)
            If VBA.Len(sLine) > 1 Then

                'We've found a new table so we can reset the booleans
                If VBA.InStr(1, sLine, "Table") > 0 Then
                    bIsTable = True
                    bIsAssets = False
                    bIsNetAssets = False
                    bIsLiabilities = False
                    iNoColumns = 0
                End If

                'Perhaps you also want some way to designate that a table has finished. Like so
                If VBA.InStr(1, sLine, "Some text that designates the end of the table") > 0 Then
                    bIsTable = False
                End If

                'If we're in the table section then we want to read in the data
                If bIsTable Then
                    'Check for your different sections. You could make this constant if your text file allowed it.
                    If VBA.InStr(1, sLine, "Assets") > 0 And VBA.InStr(1, sLine, "Net") = 0 Then
                        bIsAssets = True: bIsLiabilities = False: bIsNetAssets = False
                    End If
                    If VBA.InStr(1, sLine, "Liabilities") > 0 Then
                        bIsAssets = False: bIsLiabilities = True: bIsNetAssets = False
                    End If
                    'Fixed the "Net Assests" typo; bIsAssets is cleared here so that
                    'net-asset rows are not appended to the assets array
                    If VBA.InStr(1, sLine, "Net Assets") > 0 Then
                        bIsAssets = False: bIsLiabilities = False: bIsNetAssets = True
                    End If

                    'If we haven't triggered any of these booleans then we're at the column headings
                    If Not bIsAssets And Not bIsLiabilities And Not bIsNetAssets And VBA.InStr(1, sLine, "Table") = 0 Then
                        'Trim the current line to remove leading and trailing spaces, then use Split to determine the number of years
                        vYears = VBA.Split(VBA.Trim$(sLine), " ")
                        For ii = LBound(vYears) To UBound(vYears)
                            If VBA.Len(vYears(ii)) > 0 Then iNoColumns = iNoColumns + 1
                        Next ii

                        'Now we can size the arrays that hold the information (you'll want to redim after you've collected the info)
                        ReDim sAssets(1 To iNoColumns + 1, 1 To 100)
                        ReDim iColumns(1 To iNoColumns)
                    Else
                        If bIsAssets Then
                            'Skip the heading line
                            If Not VBA.Trim$(sLine) = "Assets" Then
                                'Increment the counter
                                iCount = iCount + 1

                                'If iCount reaches its limit you'll have to ReDim Preserve your sAssets array (I'll leave this to you)
                                If iCount > 99 Then
                                    'You'll find other posts on Stack Overflow on how to do this
                                End If

                                'This will happen on the first row; it'll happen every time you
                                'hit a $ sign, but you could code it to only do so the first time
                                If VBA.InStr(1, sLine, "$") > 0 Then
                                    iColumns(1) = VBA.InStr(1, sLine, "$")
                                    For ii = 2 To iNoColumns
                                        'We need to start at the next character across
                                        iColumns(ii) = VBA.InStr(iColumns(ii - 1) + 1, sLine, "$")
                                    Next ii
                                End If

                                'The first part (the name) is simply up to the $ sign (trimmed of spaces)
                                sAssets(1, iCount) = VBA.Trim$(VBA.Mid$(sLine, 1, iColumns(1) - 1))
                                For ii = 2 To iNoColumns
                                    'Each value sits between the previous $ position and the current one
                                    sAssets(ii, iCount) = VBA.Trim$(VBA.Mid$(sLine, iColumns(ii - 1) + 1, iColumns(ii) - iColumns(ii - 1) - 1))
                                Next ii

                                'Now do the last column: everything to the right of the last $
                                If VBA.Len(sLine) > iColumns(iNoColumns) Then
                                    sAssets(iNoColumns + 1, iCount) = VBA.Trim$(VBA.Right$(sLine, VBA.Len(sLine) - iColumns(iNoColumns)))
                                End If
                            Else
                                'Reset the counter
                                iCount = 0
                            End If
                        End If
                    End If
                End If
            End If
        Loop

        'Clean up
        fsFile.Close
        Set fsFile = Nothing
        Set fs = Nothing
    End Sub
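
The code leaves growing the sAssets array to the reader and never writes anything to a worksheet. Here is a minimal sketch of both steps. The names GrowRows and WriteRowsToSheet are hypothetical helpers, not part of the code above. It relies on the fact that ReDim Preserve can only resize the last dimension of an array, which is why the array is laid out as (column, row) rather than (row, column).

```vba
' Hypothetical helper: doubles the row capacity of a (column, row) String array.
' ReDim Preserve may only change the upper bound of the LAST dimension,
' so the row index has to be the second dimension.
Sub GrowRows(ByRef sData() As String)
    ReDim Preserve sData(LBound(sData, 1) To UBound(sData, 1), _
                         LBound(sData, 2) To UBound(sData, 2) * 2)
End Sub

' Hypothetical helper: writes the first lRowCount rows of a (column, row)
' array to a worksheet, one array row per sheet row, starting at cell A1.
Sub WriteRowsToSheet(ByRef sData() As String, ByVal lRowCount As Long, ByVal ws As Worksheet)
    Dim r As Long, c As Long
    For r = 1 To lRowCount
        For c = LBound(sData, 1) To UBound(sData, 1)
            ws.Cells(r, c).Value = sData(c, r)
        Next c
    Next r
End Sub
```

Inside the loop, the empty "If iCount > 99 Then" block could then become something like "If iCount > UBound(sAssets, 2) Then GrowRows sAssets", and after the loop a call such as "WriteRowsToSheet sAssets, iCount, ActiveSheet" would put the collected rows on the active sheet. The values are written as text, so converting them to numbers is a separate step.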
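
Because the question says the spacing and the number of year columns vary between tables, an alternative to tracking "$" positions is to split each row on whitespace and read it from the right: trailing tokens that look like amounts are values, and everything before them is the row label. This is a different technique from the answer's code, offered only as a sketch. Tokens, IsAmount and ParseRow are hypothetical names, and it assumes amounts contain no internal spaces and that labels do not end in a bare number.

```vba
' Returns the non-empty, space-separated tokens of a line (tabs count as spaces).
Function Tokens(ByVal sLine As String) As Collection
    Dim parts() As String, v As Variant
    Dim col As New Collection
    parts = Split(Trim$(Replace(sLine, vbTab, " ")), " ")
    For Each v In parts
        If Len(v) > 0 Then col.Add CStr(v)
    Next v
    Set Tokens = col
End Function

' True if the token looks like a number once "$", thousands separators
' and accounting parentheses are stripped, e.g. "1,234" or "(56)".
Function IsAmount(ByVal s As String) As Boolean
    Dim t As String
    t = Replace(Replace(Replace(Replace(s, "$", ""), ",", ""), "(", ""), ")", "")
    If Len(t) > 0 Then IsAmount = IsNumeric(t)
End Function

' Splits a table row into its label (returned via sLabel) and a
' Collection of the trailing amount tokens, in left-to-right order.
Function ParseRow(ByVal sLine As String, ByRef sLabel As String) As Collection
    Dim toks As Collection, vals As New Collection
    Dim i As Long, iFirstVal As Long

    ' Treat "$" as a separator so "$ 1,234" and "$1,234" behave the same
    Set toks = Tokens(Replace(sLine, "$", " "))

    ' Walk backwards until the first token that is not an amount
    iFirstVal = toks.Count + 1
    For i = toks.Count To 1 Step -1
        If Not IsAmount(toks(i)) Then Exit For
        iFirstVal = i
    Next i

    sLabel = ""
    For i = 1 To iFirstVal - 1
        sLabel = Trim$(sLabel & " " & toks(i))
    Next i
    For i = iFirstVal To toks.Count
        vals.Add toks(i)
    Next i
    Set ParseRow = vals
End Function
```

With this approach a heading line such as "2010 2011 2012" comes back with an empty label and three values, so the same function can be used to count the year columns of each table. Rows that omit a year would still need handling, since the position of a missing value cannot be recovered from the token count alone.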