相关疑难解决方法(0)

使用 R 抓取网站的 Power BI 仪表板

我一直在尝试使用 R 抓取我当地政府的 Power BI 仪表板，但似乎不可能。我从 Microsoft 网站上了解到，无法对 Power BI 仪表板进行 scrable，但我正在通过几个论坛表明这是可能的，但是我正在经历一个循环

我正在尝试Zip Code从此仪表板中抓取选项卡数据：

https://app.powerbigov.us/view?r=eyJrIjoiZDFmN2ViMGEtNzQzMC00ZDU3LTkwZjUtOWU1N2RiZmJlOTYyIiwidCI6IjNiMTg1MTYzLTZjYTMtNDA2NS04NDAwLWNhNzJiM2Y3OWU2ZCJ9&pageName=ReportSectionb438b98829599a9276e2&pageName=ReportSectionb438b98829599a9276e2

我从下面给定的代码中尝试了几种“技术”

scc_webpage <- xml2::read_html("https://app.powerbigov.us/view?r=eyJrIjoiZDFmN2ViMGEtNzQzMC00ZDU3LTkwZjUtOWU1N2RiZmJlOTYyIiwidCI6IjNiMTg1MTYzLTZjYTMtNDA2NS04NDAwLWNhNzJiM2Y3OWU2ZCJ9&pageName=ReportSectionb438b98829599a9276e2&pageName=ReportSectionb438b98829599a9276e2")


# Attempt using xpath
scc_webpage %>% 
  rvest::html_nodes(xpath = '//*[@id="pvExplorationHost"]/div/div/exploration/div/explore-canvas-modern/div/div[2]/div/div[2]/div[2]/visual-container-repeat/visual-container-group/transform/div/div[2]/visual-container-modern[1]/transform/div/div[3]/div/visual-modern/div/div/div[2]/div[1]/div[4]/div/div/div[1]/div[1]') %>% 
  rvest::html_text()

# Attempt using div.<class>
scc_webpage %>% 
  rvest::html_nodes("div.pivotTableCellWrap cell-interactive tablixAlignRight ") %>% 
  rvest::html_text()

# Attempt using xpathSapply
query = '//*[@id="pvExplorationHost"]/div/div/exploration/div/explore-canvas-modern/div/div[2]/div/div[2]/div[2]/visual-container-repeat/visual-container-group/transform/div/div[2]/visual-container-modern[1]/transform/div/div[3]/div/visual-modern/div/div/div[2]/div[1]/div[4]/div/div/div[1]/div[1]'
XML::xpathSApply(xml, query, xmlValue)

scc_webpage %>% 
  html_nodes("ui-view")

Run Code Online (Sandbox Code Playgroud)

但是我总是character(0)在使用 xpath 并获取div类和 id时得到一个输出，或者甚至{xml_nodeset (0)}在尝试通过html_nodes. 奇怪的是，当我这样做时，它不会显示画面数据的整个 html：