如何以编程方式迭代Word文档中的下标,上标和方程式

cla*_*aws 4 vba ms-word word-vba

我有一些Word文档,每个文档包含几百页的科学数据,其中包括:

  • 化学式(H2SO4含有所有适当的下标和上标)
  • 科学数字(使用上标格式化的指数)
  • 大量的数学方程.用Word中的数学方程编辑器编写.

问题是,以Word的形式存储这些数据对我们来说效率不高.所以我们希望将所有这些信息存储在数据库(MySQL)中.我们想将这些格式转换为LaTex.

有没有办法使用VBA迭代所有的子脚本和上标和方程?

如何迭代数学方程式?

Cyl*_*ian 10

根据您对迈克尔答案的评论

没有!我只想用_ {subscriptcontent}替换下标中的内容,用^ {superscriptcontent}替换上标内容.这将是Tex的等价物.现在,我只是将所有内容复制到一个文本文件中,该文件将删除格式但保留这些字符.问题解决了.但为此我需要访问文档的下标和上标对象

Sub sampler()
    Selection.HomeKey wdStory
    With Selection.find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Font.Superscript = True
        .Replacement.Text = "^^{^&}"
        .Execute Replace:=wdReplaceAll
        .Font.Subscript = True
        .Replacement.Text = "_{^&}"
        .Execute Replace:=wdReplaceAll
    End With
End Sub
Run Code Online (Sandbox Code Playgroud)

编辑

或者如果您还想转换OMathsTeX / LaTeX,请执行以下操作:

  • 迭代Omaths>将每个转换为MathML> [将MathML保存到磁盘] + [在描述MathML文件引用的文档中放置一些标记代替OMath]>将Word文件转换为文本
  • 现在准备像MathParser这样的转换器并将MathML文件转换为LateX.
  • 解析文本文件>搜索并相应地替换LaTeX代码.

有关完全不同的想法,请访问David Carlisle的博客,您可能会感兴趣.

UPDATE

The module

Option Explicit

'This module requires the following references:
'Microsoft Scripting Runtime
'MicroSoft XML, v6.0

Private fso As New Scripting.FileSystemObject
Private omml2mml$, mml2Tex$

Public Function ProcessFile(fpath$) As Boolean
    'convPath set to my system at (may vary on your system):
    omml2mml = "c:\program files\microsoft office\office14\omml2mml.xsl"
    'download: http://prdownloads.sourceforge.net/xsltml/xsltml_2.0.zip
    'unzip at «c:\xsltml_2.0»
    mml2Tex = "c:\xsltml_2.0\mmltex.xsl"

    Documents.Open fpath

    'Superscript + Subscript
    Selection.HomeKey wdStory
    With Selection.find
        .ClearFormatting
        .Replacement.ClearFormatting

        'to make sure no paragraph should contain any emphasis
        .Text = "^p"
        .Replacement.Text = "^&"
        .Replacement.Font.Italic = False
        .Replacement.Font.Bold = False
        .Replacement.Font.Superscript = False
        .Replacement.Font.Subscript = False
        .Replacement.Font.SmallCaps = False
        .Execute Replace:=wdReplaceAll


        .Font.Italic = True
        .Replacement.Text = "\textit{^&}"
        .Execute Replace:=wdReplaceAll

        .Font.Bold = True
        .Replacement.Text = "\textbf{^&}"
        .Execute Replace:=wdReplaceAll

        .Font.SmallCaps = True
        .Replacement.Text = "\textsc{^&}"
        .Execute Replace:=wdReplaceAll


        .Font.Superscript = True
        .Replacement.Text = "^^{^&}"
        .Execute Replace:=wdReplaceAll


        .Font.Subscript = True
        .Replacement.Text = "_{^&}"
        .Execute Replace:=wdReplaceAll
    End With

    Dim dict As New Scripting.Dictionary
    Dim om As OMath, t, counter&, key$
    key = Replace(LCase(Dir(fpath)), " ", "_omath_")
    counter = 0

    For Each om In ActiveDocument.OMaths
        DoEvents
        counter = counter + 1
        Dim tKey$, texCode$
        tKey = "<" & key & "_" & counter & ">"
        t = om.Range.WordOpenXML

        texCode = TransformString(TransformString(CStr(t), omml2mml), mml2Tex)
        om.Range.Select
        Selection.Delete
        Selection.Text = tKey

        dict.Add tKey, texCode

    Next om

    Dim latexDoc$, oPath$
    latexDoc = "\documentclass[10pt]{article}" & vbCrLf & _
                "\usepackage[utf8]{inputenc} % set input encoding" & vbCrLf & _
                "\usepackage{amsmath,amssymb}" & vbCrLf & _
                "\begin{document}" & vbCrLf & _
                "###" & vbCrLf & _
                "\end{document}"

    oPath = StrReverse(Mid(StrReverse(fpath), InStr(StrReverse(fpath), "."))) & "tex"
    'ActiveDocument.SaveAs FileName:=oPath, FileFormat:=wdFormatText, Encoding:=1200
    'ActiveDocument.SaveAs FileName:=oPath, FileFormat:=wdFormatText, Encoding:=65001
    ActiveDocument.Close

    Dim c$, i
    c = fso.OpenTextFile(oPath).ReadAll()

    counter = 0

    For Each i In dict
        counter = counter + 1
        Dim findText$, replaceWith$
        findText = CStr(i)
        replaceWith = dict.item(i)
        c = Replace(c, findText, replaceWith, 1, 1, vbTextCompare)
    Next i

    latexDoc = Replace(latexDoc, "###", c)

    Dim ost As TextStream
    Set ost = fso.CreateTextFile(oPath)
    ost.Write latexDoc

    ProcessFile = True


End Function

Private Function CreateDOM()
    Dim dom As New DOMDocument60
    With dom
        .async = False
        .validateOnParse = False
        .resolveExternals = False
    End With
    Set CreateDOM = dom
End Function

Private Function TransformString(xmlString$, xslPath$) As String
    Dim xml, xsl, out
    Set xml = CreateDOM
    xml.LoadXML xmlString
    Set xsl = CreateDOM
    xsl.Load xslPath
    out = xml.transformNode(xsl)
    TransformString = out
End Function
Run Code Online (Sandbox Code Playgroud)

The calling(from immediate window):

?ProcessFile("c:\test.doc")
Run Code Online (Sandbox Code Playgroud)

其结果将作为被创建test.texc:\.


该模块可能需要修复一些地方.如果是这样,请告诉我.