ima*_*dei 3 excel parsing vba rtf
我有约.excel中包含RTF的12000个单元格(包括格式化标签).我需要解析它们以获取未格式化的文本.
这是带有文本的其中一个单元格的示例:
{\rtf1\ansi\deflang1060\ftnbj\uc1
{\fonttbl{\f0 \froman \fcharset0 Times New Roman;}{\f1 \fswiss \fcharset238
Arial;}}
{\colortbl ;\red255\green255\blue255 ;\red0\green0\blue0 ;}
{\stylesheet{\fs24\cf2\cb1 Normal;}{\cs1\cf2\cb1 Default Paragraph Font;}}
\paperw11908\paperh16833\margl1800\margr1800\margt1440\margb1440\headery720\footery720
\deftab720\formshade\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot
\sectd\pgwsxn11908\pghsxn16833\marglsxn1800\margrsxn1800\margtsxn1440\margbsxn1440
\headery720\footery720\sbkpage\pgncont\pgndec
\plain\plain\f1\fs24\pard TPR 0160 000\par IPR 0160 000\par OB-R-02-28\par}
Run Code Online (Sandbox Code Playgroud)
而我真正需要的是这个:
TPR 0160 000
IPR 0160 000
OB-R-02-28
Run Code Online (Sandbox Code Playgroud)
简单地循环遍历单元格并删除不必要的格式化的问题是,并非这些12000单元格中的所有内容都像这样简单.所以我需要手动检查许多不同的版本并编写几个变体; 并且最后还会有很多手工工作要做.
但是,如果我将一个单元格的内容复制到空文本文档并将其保存为RTF,然后用MS Word打开它,它会立即解析文本,我得到了我想要的内容.不幸的是,对于12000个电池来说这是非常不方便的.
所以我在考虑VBA宏,将单元格内容移动到Word,强制解析然后将结果复制回原始单元格.不幸的是,我不确定该怎么做.
有人有什么想法吗?还是一种不同的方法?我将非常感谢解决方案或推动正确的方向.
TNX!
如果你确实想沿着使用Word来解析文本的路线,这个功能应该可以帮到你.正如评论所示,您需要引用MS Word对象库.
Function ParseRTF(strRTF As String) As String
Dim wdDoc As Word.Document 'Ref: Microsoft Word 11.0 Object Library'
Dim f As Integer 'Variable to store the file I/O number'
'File path for a temporary .rtf file'
Const strFileTemp = "C:\TempFile_ParseRTF.rtf"
'Obtain the next valid file I/O number'
f = FreeFile
'Open the temp file and save the RTF string in it'
Open strFileTemp For Output As #f
Print #f, strRTF
Close #f
'Open the .rtf file as a Word.Document'
Set wdDoc = GetObject(strFileTemp)
'Read the now parsed text from the Word.Document'
ParseRTF = wdDoc.Range.Text
'Delete the temporary .rtf file'
Kill strFileTemp
'Close the Word connection'
wdDoc.Close False
Set wdDoc = Nothing
End Function
Run Code Online (Sandbox Code Playgroud)
您可以使用类似于此的内容为您的12,000个单元格中的每一个调用它:
Sub ParseAllRange()
Dim rngCell As Range
Dim strRTF As String
For Each rngCell In Range("A1:A12000")
'Parse the cell contents'
strRTF = ParseRTF(CStr(rngCell))
'Output to the cell one column over'
rngCell.Offset(0, 1) = strRTF
Next
End Sub
Run Code Online (Sandbox Code Playgroud)
ParseRTF功能大约需要一秒钟的时间(至少在我的机器上),因此对于12,000个单元,这将在大约三个半小时后运行.
在周末考虑过这个问题后,我确信有一个更好(更快)的解决方案.
我记得剪贴板的RTF功能,并意识到可以创建一个类,将RTF数据复制到剪贴板,粘贴到word文档,并输出生成的纯文本.这个解决方案的好处是不必为每个rtf字符串打开和关闭单词doc对象; 它可以在循环之前打开并在之后关闭.
以下是实现此目的的代码.它是一个名为clsRTFParser的Class模块.
Private Declare Function GlobalAlloc Lib "kernel32" _
(ByVal wFlags&, ByVal dwBytes As Long) As Long
Private Declare Function GlobalLock Lib "kernel32" _
(ByVal hMem As Long) As Long
Private Declare Function GlobalUnlock Lib "kernel32" _
(ByVal hMem As Long) As Long
Private Declare Function lstrcpy Lib "kernel32" _
(ByVal lpString1 As Any, ByVal lpString2 As Any) As Long
Private Declare Function OpenClipboard Lib "user32" _
(ByVal Hwnd As Long) As Long
Private Declare Function EmptyClipboard Lib "user32" () As Long
Private Declare Function RegisterClipboardFormat Lib "user32" Alias _
"RegisterClipboardFormatA" (ByVal lpString As String) As Long
Private Declare Function SetClipboardData Lib "user32" _
(ByVal wFormat As Long, ByVal hMem As Long) As Long
Private Declare Function CloseClipboard Lib "user32" () As Long
'---'
Dim wdDoc As Word.Document 'Ref: Microsoft Word 11.0 Object Library'
Private Sub Class_Initialize()
Set wdDoc = New Word.Document
End Sub
Private Sub Class_Terminate()
wdDoc.Close False
Set wdDoc = Nothing
End Sub
'---'
Private Function CopyRTF(strCopyString As String) As Boolean
Dim hGlobalMemory As Long
Dim lpGlobalMemory As Long
Dim hClipMemory As Long
Dim lngFormatRTF As Long
'Allocate and copy string to memory'
hGlobalMemory = GlobalAlloc(&H42, Len(strCopyString) + 1)
lpGlobalMemory = GlobalLock(hGlobalMemory)
lpGlobalMemory = lstrcpy(lpGlobalMemory, strCopyString)
'Unlock the memory and then copy to the clipboard'
If GlobalUnlock(hGlobalMemory) = 0 Then
If OpenClipboard(0&) <> 0 Then
Call EmptyClipboard
'Save the data as Rich Text Format'
lngFormatRTF = RegisterClipboardFormat("Rich Text Format")
hClipMemory = SetClipboardData(lngFormatRTF, hGlobalMemory)
CopyRTF = CBool(CloseClipboard)
End If
End If
End Function
'---'
Private Function PasteRTF() As String
Dim strOutput As String
'Paste the clipboard data to the wdDoc and read the plain text result'
wdDoc.Range.Paste
strOutput = wdDoc.Range.Text
'Get rid of the new lines at the beginning and end of the document'
strOutput = Left(strOutput, Len(strOutput) - 2)
strOutput = Right(strOutput, Len(strOutput) - 2)
PasteRTF = strOutput
End Function
'---'
Public Function ParseRTF(strRTF As String) As String
If CopyRTF(strRTF) Then
ParseRTF = PasteRTF
Else
ParseRTF = "Error in copying to clipboard"
End If
End Function
Run Code Online (Sandbox Code Playgroud)
您可以使用类似于此的内容为您的12,000个单元格中的每一个调用它:
Sub CopyParseAllRange()
Dim rngCell As Range
Dim strRTF As String
'Create new instance of clsRTFParser'
Dim RTFParser As clsRTFParser
Set RTFParser = New clsRTFParser
For Each rngCell In Range("A1:A12000")
'Parse the cell contents'
strRTF = RTFParser.ParseRTF(CStr(rngCell))
'Output to the cell one column over'
rngCell.Offset(0, 1) = strRTF
Next
End Sub
Run Code Online (Sandbox Code Playgroud)
我在我的机器上使用示例RTF字符串模拟了这个.对于12,000个电池,花费了两分半钟,这是一个更合理的时间框架!
| 归档时间: |
|
| 查看次数: |
27878 次 |
| 最近记录: |