使用 VBA 从谷歌翻译中提取 div 的内容

Waz*_*Teo 3 html regex excel vba

我有一个翻译语言的功能:

Public Function Translate(rng As Range, Optional translateFrom As String = "nl", Optional translateTo As String = "en")
    Dim getParam As String, Trans As String, objHTTP As Object, URL As String
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    getParam = ConvertToGet(rng.Value)
    URL = "https://translate.google.com/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam
    objHTTP.Open "GET", URL, False
    objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    objHTTP.send ("")
    If InStr(objHTTP.responseText, "div dir=""ltr""") > 0 Then
        Trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>")
        Translate = CleanA(Trans)
    Else
        Translate = CVErr(xlErrValue)
    End If
End Function
Run Code Online (Sandbox Code Playgroud)

有中文翻译:

A1 = Hello

B1 = Translate(A1,"en","zh-cn")
Run Code Online (Sandbox Code Playgroud)

结果是“N?h?o”,正确的结果是“??”

链接谷歌:https : //translate.google.com/m?hl= en & sl = en & tl = zh-CN & ie =UTF-8& prev = _m & q =hello

我想要结果:

B1 = ??

C1 = N? h?o
Run Code Online (Sandbox Code Playgroud)

我想我需要修复这段代码:

Trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>")
Run Code Online (Sandbox Code Playgroud)

请帮帮我,谢谢!

Rob*_*zie 5

该页面返回两个<div>像:

<div dir="ltr" class="o1">N? h?o</div>
<div dir="ltr" class="t0">??</div>
Run Code Online (Sandbox Code Playgroud)

建议您不要尝试使用正则表达式解析 HTML,因为解析 HTML 很容易遇到困难 - 您可以在 VBA 中使用 Microsoft HTML 对象库来获得类似的结果。

要获取这两个<div>标签的内容,您可以按照以下示例使用此代码:

' o1 has Anglicised translation, t0 as tranlsation in target language
Set objDivs = objHTML.getElementsByTagName("div")
For Each objDiv In objDivs
    If objDiv.className = "o1" Then
        strTranslatedO1 = objDiv.innerText
    End If
    If objDiv.className = "t0" Then
        strTranslatedT0 = objDiv.innerText
    End If
Next objDiv
Run Code Online (Sandbox Code Playgroud)

这是通过所有的基本循环<div>在返回的HTML标签和检查的类名称o1t0,然后获得innerText产权。使用这种技术,您可以获得转换后的值并将它们写回工作表,例如:

在此处输入图片说明

完整代码:

Option Explicit

Public Sub Test()

    Dim ws As Worksheet
    
    ' testing worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    ws.Cells.Delete
    
    ' test inputs
    ws.Range("A1:E1") = Array("Input", "From", "To", "T0", "O1")
    ws.Range("A2:A4") = "hello"
    ws.Range("B2:B4") = "English"
    ws.Range("C2:C4") = Application.Transpose(Array("Chinese", "Spanish", "Russian"))
    
    ' test
    ws.Range("D2") = Translate("hello", "en", "zh-cn", True)
    ws.Range("E2") = Translate("hello", "en", "zh-cn", False)
    ws.Range("D3") = Translate("hello", "en", "es", True)
    ws.Range("E3") = Translate("hello", "en", "es", False) 'Spanish uses latin alphabet
    ws.Range("D4") = Translate("hello", "en", "ru", True)
    ws.Range("E4") = Translate("hello", "en", "ru", False)

End Sub

Public Function Translate(strInput As String, strFromLanguageCode As String, strToLanguageCode As String, blnTargetAlphabet As Boolean) As String
    
    Dim strURL As String
    Dim objHTTP As Object
    Dim objHTML As Object
    Dim objDivs As Object, objDiv
    Dim strTranslatedT0 As String
    Dim strTranslatedO1 As String
   
    ' send query to web page
    strURL = "https://translate.google.com/m?hl=" & strFromLanguageCode & _
        "&sl=" & strFromLanguageCode & _
        "&tl=" & strToLanguageCode & _
        "&ie=UTF-8&prev=_m&q=" & strInput

    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    objHTTP.Open "GET", strURL, False
    objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    objHTTP.send ""
    
    ' create a html document
    Set objHTML = CreateObject("htmlfile")
    With objHTML
        .Open
        .Write objHTTP.responseText
        .Close
    End With
    
    ' o1 has Anglicised translation, t0 as tranlsation in target language
    Set objDivs = objHTML.getElementsByTagName("div")
    For Each objDiv In objDivs
        If objDiv.className = "o1" Then
            strTranslatedO1 = objDiv.innerText
        End If
        If objDiv.className = "t0" Then
            strTranslatedT0 = objDiv.innerText
        End If
    Next objDiv
    
    ' choose which to return
    If blnTargetAlphabet Then
        Translate = strTranslatedT0
    Else
        Translate = strTranslatedO1
    End If
    
CleanUp:
    Set objHTML = Nothing
    Set objHTTP = Nothing
    
End Function
Run Code Online (Sandbox Code Playgroud)

2020 年 12 月更新

看起来这种方法将不再适用于 11 月中旬。

看着回复

  • div类名已经改变的东西比较模糊
  • 有一些深奥的c-wiz元素在做一些美妙的事情......
  • 此外,我怀疑某些客户端脚本在检索文档后要求进行实际翻译

选项:Selenium、Microsoft Translate、Google 翻译 API 的免费和付费层级;)