Scrape Instagram Web Hashtag Posts

Question

Ela*_*son 3 xpath google-sheets web-scraping google-apps-script instagram

I'm trying to scrape the number of posts for a given hashtag (#castles) and fill a Google Sheets cell with it using ImportXML.

I tried copying the XPath from Chrome and pasting it into the ImportXML arguments in a cell, like this:

=ImportXML("https://www.instagram.com/explore/tags/castels/", "//*[@id="react-root"]/section/main/header/div[2]/div/div[2]/span/span")

I saw that the quotes were a problem, so I also tried:

=ImportXML("https://www.instagram.com/explore/tags/castels/", "//*[@id='react-root']/section/main/header/div[2]/div/div[2]/span/span")

However, both return an error.

What am I doing wrong?

P.S. I know the XPath for the meta tag description, "//meta[@name='description']/@content", but I want to scrape the exact number of posts, not the abbreviated number.
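
On the quoting issue: in a Google Sheets formula, a literal double quote inside a string is written as two double quotes, so in the first attempt the XPath string ends early at @id=. The single-quoted version in the second attempt parses correctly, which is why it fails for a different reason. An XPath copied from Chrome describes the DOM after the page's JavaScript has run, while IMPORTXML only sees the HTML the server returns and does not execute scripts; that is most likely why the react-root path matches nothing. The sketch below illustrates only the quote-doubling rule, for example when generating the formula from Apps Script; buildImportXmlFormula is a hypothetical helper name, not a Sheets or Apps Script API.

```javascript
// Hypothetical helper: builds an IMPORTXML formula string for a cell.
// Any double quote inside an argument is doubled, which is how Google Sheets
// writes a literal " inside a string literal.
function buildImportXmlFormula(url, xpath) {
  const quote = (s) => '"' + String(s).replace(/"/g, '""') + '"';
  return '=IMPORTXML(' + quote(url) + ', ' + quote(xpath) + ')';
}

const formula = buildImportXmlFormula(
  'https://www.instagram.com/explore/tags/castels/',
  '//*[@id="react-root"]/section/main/header/div[2]/div/div[2]/span/span'
);
// Prints the formula with @id=""react-root"" escaped for Sheets.
console.log(formula);
```

The result is equivalent to the single-quoted second attempt, so fixing the quotes alone removes the parse error but not the underlying mismatch.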

Answer 1

Sou*_*ria 5

Try this:

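function hashCount() {
  // Full URL, including the scheme
  var url = 'https://www.instagram.com/explore/tags/cats/';
  // muteHttpExceptions: return the response instead of throwing on an HTTP error status
  var response = UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText();
  // The post count is embedded in the page source as ...edge_hashtag_to_media":{"count":N,"page_info":...
  var regex = /(edge_hashtag_to_media":{"count":)(\d+)(,"page_info":)/gm;
  var match = regex.exec(response);
  // exec() returns null when the pattern is not found (e.g. a login page or changed markup)
  var count = match ? match[2] : null;
  Logger.log(count);
  return count;
}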
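
regex.exec returns null when the pattern is not in the page (for example when Instagram serves a login page instead of the tag page, or after a markup change), so indexing the result directly throws a TypeError. The sketch below moves the parsing step into a plain function so it can be tested outside Apps Script. The key edge_hashtag_to_media is taken from the answer and may no longer appear in Instagram's current markup; extractHashtagCount and the sample string are hypothetical.

```javascript
// Extracts the hashtag post count from raw page HTML, using the same pattern
// as hashCount(). Returns null instead of throwing when the pattern is absent.
function extractHashtagCount(html) {
  // Braces escaped for readability; no g/m flags are needed for a single match.
  const regex = /edge_hashtag_to_media":\{"count":(\d+),"page_info":/;
  const match = regex.exec(html);
  return match ? Number(match[1]) : null;
}

// Hypothetical sample: a fragment shaped like the JSON the regex targets.
const sample = '..."edge_hashtag_to_media":{"count":1234567,"page_info":{...}}...';
console.log(extractHashtagCount(sample));          // 1234567
console.log(extractHashtagCount('<html></html>')); // null
```

Because hashCount() can return the value instead of only logging it, it can be called directly from a cell as a custom function, which matches the original goal of filling a Sheets cell.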

I added muteHttpExceptions: true, which I had left out of the version in my comment above. Hope this helps.
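
With muteHttpExceptions: true, UrlFetchApp.fetch does not throw on a 4xx or 5xx status; it returns the HTTPResponse, so the status code is worth checking before parsing the body. A minimal sketch, assuming the response object exposes getResponseCode() and getContentText() as Apps Script's HTTPResponse does; countFromResponse and fakeResponse are hypothetical names, and fakeResponse only stands in for a real fetch so the logic can run outside Apps Script.

```javascript
// Returns the hashtag post count from an HTTPResponse-like object, or null
// if the request failed or the expected JSON is missing from the body.
function countFromResponse(response) {
  // With muteHttpExceptions: true, an error status does not throw,
  // so it has to be checked explicitly.
  if (response.getResponseCode() !== 200) {
    return null;
  }
  const match = /edge_hashtag_to_media":\{"count":(\d+),/.exec(response.getContentText());
  return match ? Number(match[1]) : null;
}

// Hypothetical stand-in for UrlFetchApp.fetch(url, {muteHttpExceptions: true}).
function fakeResponse(code, body) {
  return { getResponseCode: () => code, getContentText: () => body };
}

console.log(countFromResponse(fakeResponse(200, '"edge_hashtag_to_media":{"count":42,"page_info":{}}'))); // 42
console.log(countFromResponse(fakeResponse(429, 'Please wait a few minutes'))); // null
```

In Apps Script the call would be countFromResponse(UrlFetchApp.fetch(url, {muteHttpExceptions: true})).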

Asked: 6 years, 5 months ago
Viewed: 83 times
Last activity: 6 years, 4 months ago