Excel中的格式文本格式(带有格式化标签),用于未格式化的文本

ima*_*dei 3 excel parsing vba rtf

我有约.excel中包含RTF的12000个单元格(包括格式化标签).我需要解析它们以获取未格式化的文本.

这是带有文本的其中一个单元格的示例:

{\rtf1\ansi\deflang1060\ftnbj\uc1
{\fonttbl{\f0 \froman \fcharset0 Times New Roman;}{\f1 \fswiss \fcharset238
Arial;}}
{\colortbl ;\red255\green255\blue255 ;\red0\green0\blue0 ;}
{\stylesheet{\fs24\cf2\cb1 Normal;}{\cs1\cf2\cb1 Default Paragraph Font;}}
\paperw11908\paperh16833\margl1800\margr1800\margt1440\margb1440\headery720\footery720
\deftab720\formshade\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot
\sectd\pgwsxn11908\pghsxn16833\marglsxn1800\margrsxn1800\margtsxn1440\margbsxn1440
\headery720\footery720\sbkpage\pgncont\pgndec
\plain\plain\f1\fs24\pard TPR 0160 000\par IPR 0160 000\par OB-R-02-28\par}
Run Code Online (Sandbox Code Playgroud)

而我真正需要的是这个:

TPR 0160 000
IPR 0160 000
OB-R-02-28
Run Code Online (Sandbox Code Playgroud)

简单地循环遍历单元格并删除不必要的格式化的问题是,并非这些12000单元格中的所有内容都像这样简单.所以我需要手动检查许多不同的版本并编写几个变体; 并且最后还会有很多手工工作要做.

但是,如果我将一个单元格的内容复制到空文本文档并将其保存为RTF,然后用MS Word打开它,它会立即解析文本,我得到了我想要的内容.不幸的是,对于12000个电池来说这是非常不方便的.

所以我在考虑VBA宏,将单元格内容移动到Word,强制解析然后将结果复制回原始单元格.不幸的是,我不确定该怎么做.

有人有什么想法吗?还是一种不同的方法?我将非常感谢解决方案或推动正确的方向.

TNX!

Nos*_*dge 7

如果你确实想沿着使用Word来解析文本的路线,这个功能应该可以帮到你.正如评论所示,您需要引用MS Word对象库.

Function ParseRTF(strRTF As String) As String
Dim wdDoc As Word.Document 'Ref: Microsoft Word 11.0 Object Library'
Dim f     As Integer       'Variable to store the file I/O number'

'File path for a temporary .rtf file'
Const strFileTemp = "C:\TempFile_ParseRTF.rtf"

'Obtain the next valid file I/O number'
f = FreeFile

'Open the temp file and save the RTF string in it'
Open strFileTemp For Output As #f
    Print #f, strRTF
Close #f

'Open the .rtf file as a Word.Document'
Set wdDoc = GetObject(strFileTemp)

'Read the now parsed text from the Word.Document'
ParseRTF = wdDoc.Range.Text

'Delete the temporary .rtf file'
Kill strFileTemp

'Close the Word connection'
wdDoc.Close False
Set wdDoc = Nothing
End Function
Run Code Online (Sandbox Code Playgroud)

您可以使用类似于此的内容为您的12,000个单元格中的每一个调用它:

Sub ParseAllRange()
Dim rngCell As Range
Dim strRTF  As String

For Each rngCell In Range("A1:A12000")

    'Parse the cell contents'
    strRTF = ParseRTF(CStr(rngCell))

    'Output to the cell one column over'
    rngCell.Offset(0, 1) = strRTF
Next
End Sub
Run Code Online (Sandbox Code Playgroud)

ParseRTF功能大约需要一秒钟的时间(至少在我的机器上),因此对于12,000个单元,这将在大约三个半小时后运行.


在周末考虑过这个问题后,我确信有一个更好(更快)的解决方案.

我记得剪贴板的RTF功能,并意识到可以创建一个类,将RTF数据复制到剪贴板,粘贴到word文档,并输出生成的纯文本.这个解决方案的好处是不必为每个rtf字符串打开和关闭单词doc对象; 它可以在循环之前打开并在之后关闭.

以下是实现此目的的代码.它是一个名为clsRTFParser的Class模块.

Private Declare Function GlobalAlloc Lib "kernel32" _
                (ByVal wFlags&, ByVal dwBytes As Long) As Long
Private Declare Function GlobalLock Lib "kernel32" _
                (ByVal hMem As Long) As Long
Private Declare Function GlobalUnlock Lib "kernel32" _
                (ByVal hMem As Long) As Long
Private Declare Function lstrcpy Lib "kernel32" _
                (ByVal lpString1 As Any, ByVal lpString2 As Any) As Long

Private Declare Function OpenClipboard Lib "user32" _
                (ByVal Hwnd As Long) As Long
Private Declare Function EmptyClipboard Lib "user32" () As Long
Private Declare Function RegisterClipboardFormat Lib "user32" Alias _
                "RegisterClipboardFormatA" (ByVal lpString As String) As Long
Private Declare Function SetClipboardData Lib "user32" _
                (ByVal wFormat As Long, ByVal hMem As Long) As Long
Private Declare Function CloseClipboard Lib "user32" () As Long

'---'

Dim wdDoc As Word.Document 'Ref: Microsoft Word 11.0 Object Library'

Private Sub Class_Initialize()
Set wdDoc = New Word.Document
End Sub

Private Sub Class_Terminate()
wdDoc.Close False
Set wdDoc = Nothing
End Sub

'---'

Private Function CopyRTF(strCopyString As String) As Boolean
Dim hGlobalMemory  As Long
Dim lpGlobalMemory As Long
Dim hClipMemory    As Long
Dim lngFormatRTF   As Long

'Allocate and copy string to memory'
hGlobalMemory = GlobalAlloc(&H42, Len(strCopyString) + 1)
lpGlobalMemory = GlobalLock(hGlobalMemory)
lpGlobalMemory = lstrcpy(lpGlobalMemory, strCopyString)

'Unlock the memory and then copy to the clipboard'
If GlobalUnlock(hGlobalMemory) = 0 Then
    If OpenClipboard(0&) <> 0 Then
        Call EmptyClipboard

        'Save the data as Rich Text Format'
        lngFormatRTF = RegisterClipboardFormat("Rich Text Format")
        hClipMemory = SetClipboardData(lngFormatRTF, hGlobalMemory)

        CopyRTF = CBool(CloseClipboard)
    End If
End If
End Function

'---'

Private Function PasteRTF() As String
Dim strOutput As String

'Paste the clipboard data to the wdDoc and read the plain text result'
wdDoc.Range.Paste
strOutput = wdDoc.Range.Text

'Get rid of the new lines at the beginning and end of the document'
strOutput = Left(strOutput, Len(strOutput) - 2)
strOutput = Right(strOutput, Len(strOutput) - 2)

PasteRTF = strOutput
End Function

'---'

Public Function ParseRTF(strRTF As String) As String
If CopyRTF(strRTF) Then
    ParseRTF = PasteRTF
Else
    ParseRTF = "Error in copying to clipboard"
End If
End Function
Run Code Online (Sandbox Code Playgroud)

您可以使用类似于此的内容为您的12,000个单元格中的每一个调用它:

Sub CopyParseAllRange()
Dim rngCell As Range
Dim strRTF  As String

'Create new instance of clsRTFParser'
Dim RTFParser As clsRTFParser
Set RTFParser = New clsRTFParser

For Each rngCell In Range("A1:A12000")

    'Parse the cell contents'
    strRTF = RTFParser.ParseRTF(CStr(rngCell))

    'Output to the cell one column over'
    rngCell.Offset(0, 1) = strRTF
Next
End Sub
Run Code Online (Sandbox Code Playgroud)

我在我的机器上使用示例RTF字符串模拟了这个.对于12,000个电池,花费了两分半钟,这是一个更合理的时间框架!