Ras*_*sen 29 html excel vba r excel-vba
将HTML表转换为Excel
下面的代码在https://rasmusrhl.github.io/stuff上获取HTML表,并将其转换为Excel格式.
问题是:
解
谢谢大家的贡献.各种各样的anwers帮助我理解,为了我的目的,解决方法是最好的解决方案:因为我自己生成HTML表,我可以控制每个单元格的CSS.存在CSS代码,指示Excel如何解释单元格内容:http://cosicimiento.blogspot.dk/2008/11/styling-excel-cells-with-mso-number.html,也在此问题中解释:格式化HTML表格单元格以便Excel格式化为文本?
在我的情况下,CSS应该是文本,即mso-number-format:\"\\@\"
.它集成在下面的R代码中:
library(htmlTable)
library(nycflights13)
library(dplyr)
nycflights13::planes %>%
slice(1:10) %>% mutate( seats = seats*1.0001,
s1 = c("1-5", "5-10", "1/2", "1/10", "2-3", "1", "1.0", "01", "01.00", "asfdkjlæ" ),
s2 = c("(10)", "(12)", "(234)", "(00)", "(01)", "(098)", "(01)", "(01.)", "(001.0)", "()" )) -> df
rle_man <- rle(df$manufacturer)
css_matrix <- matrix( data = "mso-number-format:\"\\@\"", nrow = nrow(df), ncol = ncol(df))
css_matrix[,1] <- "padding-left: 0.4cm;mso-number-format:\"\\@\""
css_matrix[,2:10] <- "padding-left: 1cm;mso-number-format:\"\\@\""
css_matrix[,5] <- "padding-left: 2cm;mso-number-format:\"\\@\""
htmlTable( x = df,
rgroup = rle_man$values, n.rgroup = rle_man$lengths,
rnames = FALSE, align = c("l", "r" ),
cgroup = rbind( c("", "Some text goes here. It is long and does not break", "Other text goes here", NA),
c( "", "Machine type<br>(make)", "Specification of machine", "Other variables")),
n.cgroup = rbind( c(1,8,2, NA),
c(1, 3, 5, 2)),
css.cell = css_matrix ) -> html_out
temp_file <- tempfile( pattern = "table", fileext = ".html" )
readr::write_file( x = html_out, path = temp_file)
utils::browseURL( temp_file)
Run Code Online (Sandbox Code Playgroud)
可以将HTML文件拖放到Excel中,并将所有单元格解释为文本.请注意,只将html文件拖放到excel中,才能在浏览器中打开表并将其复制粘贴到excel中.
这种方法唯一缺少的是水平线,但我可以忍受.
下面是VBA,与拖放效果相同:
Sub importhtml()
'
' importhtml Macro
'
'
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;file:///C:/Users/INSERTUSERNAME/Desktop/table18b85c0a20f3html.HTML", Destination:=Range("$a$1"))
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = False
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingAll
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = True
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
End Sub
Run Code Online (Sandbox Code Playgroud)
对于客户端解决方案
因此,在第一个代码块之后运行此代码,它会重写最后两列.
Sub Test2()
'* tools references ->
'* Microsoft HTML Object Library
Dim oHtml4 As MSHTML.IHTMLDocument4
Set oHtml4 = New MSHTML.HTMLDocument
Dim oHtml As MSHTML.HTMLDocument
Set oHtml = Nothing
'* IHTMLDocument4.createDocumentFromUrl
'* MSDN - IHTMLDocument4 createDocumentFromUrl method - https://msdn.microsoft.com/en-us/library/aa752523(v=vs.85).aspx
Set oHtml = oHtml4.createDocumentFromUrl("https://rasmusrhl.github.io/stuff/", "")
While oHtml.readyState <> "complete"
DoEvents '* do not comment this out it is required to break into the code if in infinite loop
Wend
Debug.Assert oHtml.readyState = "complete"
Dim oTRs As MSHTML.IHTMLDOMChildrenCollection
Set oTRs = oHtml.querySelectorAll("TR")
Debug.Assert oTRs.Length = 17
Dim lRowNum As Long
For lRowNum = 3 To oTRs.Length - 1
Dim oTRLoop As MSHTML.HTMLTableRow
Set oTRLoop = oTRs.Item(lRowNum)
If oTRLoop.ChildNodes.Length > 1 Then
Debug.Assert oTRLoop.ChildNodes.Length = 14
Dim oSecondToLastColumn As MSHTML.HTMLTableCell
Set oSecondToLastColumn = oTRLoop.ChildNodes.Item(12)
ActiveSheet.Cells(lRowNum + 2, 13).Value2 = "'" & oSecondToLastColumn.innerText
Dim oLastColumn As MSHTML.HTMLTableCell
Set oLastColumn = oTRLoop.ChildNodes.Item(13)
ActiveSheet.Cells(lRowNum + 2, 14).Value2 = "'" & oLastColumn.innerText
End If
'Stop
Next lRowNum
ActiveSheet.Columns("M:M").EntireColumn.AutoFit
ActiveSheet.Columns("N:N").EntireColumn.AutoFit
End Sub
Run Code Online (Sandbox Code Playgroud)
对于服务器端解决方案
现在我们知道您控制源脚本并且它在R中,然后可以更改R脚本以使用mso-number-format设置最终列的样式:'\ @'.下面是一个实现此目的的示例R脚本,一个构建与数据尺寸相同的CSS矩阵,并将CSS矩阵作为参数传递给htmlTable
.我没有篡改你的R源代替我在这里给你一个简单的插图供你解释.
A=matrix(c("(2)","(4)","(3)","(1)","(5)","(7)"),nrow=2,ncol=3,byrow=TRUE)
css_matrix <- matrix(data="",nrow=2,ncol=3)
css_matrix[,3] <- "mso-number-format:\"\\@\""
htmlTable(x=A,css.cell=css_matrix)
Run Code Online (Sandbox Code Playgroud)
罗宾麦肯齐补充道
您可能会在服务器端解决方案中提到OP只需要将css_matrix [,10:11] < - "mso-number-format:\"\ @ \""添加到他们现有的R代码中(在最后一个css_matrix之后) .)它将针对他们的具体问题实施您的解决方案
谢谢罗宾
要从该页面获取表格数据(保持格式不变),您可以尝试如下:
Sub Fetch_Data()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim posts As Object, post As Object, elem As Object
Dim row As Long, col As Long
With http
.Open "GET", "https://rasmusrhl.github.io/stuff/", False
.send
html.body.innerHTML = .responseText
End With
Set posts = html.getElementsByClassName("gmisc_table")(0)
For Each post In posts.Rows
For Each elem In post.Cells
col = col + 1: Cells(row + 1, col).NumberFormat = "@": Cells(row + 1, col) = elem.innerText
Next elem
col = 0
row = row + 1
Next post
End Sub
Run Code Online (Sandbox Code Playgroud)
参考添加到库:
1. Microsoft HTML Object Library
2. Microsoft XML, v6.0 'or whatever version you have
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
5547 次 |
最近记录: |