How to scrape YouTube comments using BeautifulSoup

Abh*_*ash 3 python beautifulsoup web-scraping python-3.x

I'm a beginner. I want to know how to scrape YouTube comments using BeautifulSoup. I'm stuck at this point. Can anyone help me with the code?

This is what I wrote:

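import requests
from bs4 import BeautifulSoup

# Download the static HTML of the watch page
r = requests.get("https://www.youtube.com/watch?v=kffacxfA7G4")
req = r.content
soup = BeautifulSoup(req, 'html.parser')
print(soup.prettify())
# Look for the element that holds the comments in the browser's inspector
contents = soup.find_all('div', {'id': 'contents'})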

I'm stuck here and not getting any output. Inspecting the web page shows that the comments sit inside an element with id = contents.

SIM*_*SIM 5

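The comments on that site are generated dynamically. You can't get them from the main link using the requests library in combination with BeautifulSoup. To fetch the content behind that link you need a browser simulator such as selenium. As you are a beginner, you can try something like the following. The script below will fetch the comments for you. By the way, the site also has lazy loading enabled, so you need to adjust the range of the for loop to get more content.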

import time
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

with Chrome() as driver:
    wait = WebDriverWait(driver, 10)
    driver.get("https://www.youtube.com/watch?v=kffacxfA7G4")

    # Scroll to the bottom a few times so the lazily loaded comments appear;
    # increase the range to get more content
    for _ in range(3):
        wait.until(EC.visibility_of_element_located((By.TAG_NAME, "body"))).send_keys(Keys.END)
        time.sleep(3)

    # Print the text of every comment that has been loaded so far
    for comment in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#comment #content-text"))):
        print(comment.text)

Partial output:

15 April 2018 ?¿?
April 2018??
8 years people
Nice songs Justin Bieber https://youtu.be/OvfAc7JGoc4
2018 hit like...♥️♥️♥️♥️
8 years complete
Can likes beat dislikes??
View 1, 8 billion great song
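
The fixed `range(3)` in the script above is a guess at how many scrolls are needed. One possible alternative, sketched here, is to keep scrolling until the number of loaded comments stops growing. The helper name `scroll_until_stable`, its parameters, and the `FakePage` stand-in are hypothetical and not part of the original answer or of Selenium. The logic is kept free of Selenium so it can be tried without a browser.

```python
from typing import Callable


def scroll_until_stable(
    scroll_once: Callable[[], None],
    count_items: Callable[[], int],
    max_rounds: int = 20,
    patience: int = 2,
) -> int:
    """Scroll until the item count stops growing; return the final count.

    scroll_once: hypothetical callback that triggers one scroll and waits.
    count_items: hypothetical callback that returns how many items are loaded.
    patience:    consecutive scrolls without new items before giving up.
    """
    last_count = count_items()
    stale_rounds = 0
    for _ in range(max_rounds):
        if stale_rounds >= patience:
            break
        scroll_once()
        current = count_items()
        if current > last_count:
            last_count = current
            stale_rounds = 0
        else:
            stale_rounds += 1
    return last_count


class FakePage:
    """Stand-in for a lazily loading page: each scroll reveals one more batch."""

    def __init__(self, total: int, batch: int) -> None:
        self.total = total
        self.batch = batch
        self.loaded = 0
        self.scrolls = 0

    def scroll(self) -> None:
        self.scrolls += 1
        self.loaded = min(self.total, self.loaded + self.batch)

    def count(self) -> int:
        return self.loaded


page = FakePage(total=45, batch=20)
final_count = scroll_until_stable(page.scroll, page.count)
print(final_count, page.scrolls)
```

With a real browser, `scroll_once` could send `Keys.END` to the `body` element and then sleep for a few seconds, and `count_items` could return `len(driver.find_elements(By.CSS_SELECTOR, "#comment #content-text"))`, reusing the selector from the script above. The `max_rounds` cap is there because a popular video may keep loading comments for a very long time.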