How do I get comments from a video using the YouTube API v3 and Python?

msh*_*ruz 6 python youtube youtube-data-api

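I have been trying to use Python to fetch the comments (both threads and replies) from a given video on YouTube, as an exercise to learn the language.

Based on the example on the official site (https://developers.google.com/youtube/v3/docs/commentThreads/list), I was able to get some of the comments, but not all of them. I tried adding code to handle multiple pages, but I cannot get all the comments even for a video that has only one page.

For example, https://www.youtube.com/watch?v=Gd_L7DVKTA8 has 17 comments (including replies), but I only get 7 threads and 2 replies. Interestingly, I get the same result (only 7 threads) with the API Explorer provided at the link above.

My code is as follows: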

#!/usr/bin/python

# Usage:
# python scraper.py --videoid='<video_id>'

from apiclient.errors import HttpError
from oauth2client.tools import argparser
from apiclient.discovery import build

YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"
DEVELOPER_KEY = 'key'


def get_comment_threads(youtube, video_id, comments):
  threads = []
  results = youtube.commentThreads().list(
    part="snippet",
    videoId=video_id,
    textFormat="plainText",
  ).execute()

  #Get the first set of comments
  for item in results["items"]:
    threads.append(item)
    comment = item["snippet"]["topLevelComment"]
    text = comment["snippet"]["textDisplay"]
    comments.append(text)

  #Keep getting comments from the following pages
  while ("nextPageToken" in results):
    results = youtube.commentThreads().list(
      part="snippet",
      videoId=video_id,
      pageToken=results["nextPageToken"],
      textFormat="plainText",
    ).execute()
    for item in results["items"]:
      threads.append(item)
      comment = item["snippet"]["topLevelComment"]
      text = comment["snippet"]["textDisplay"]
      comments.append(text)

  print "Total threads: %d" % len(threads)

  return threads


def get_comments(youtube, parent_id, comments):
  results = youtube.comments().list(
    part="snippet",
    parentId=parent_id,
    textFormat="plainText"
  ).execute()

  for item in results["items"]:
    text = item["snippet"]["textDisplay"]
    comments.append(text)

  return results["items"]

if __name__ == "__main__":
  argparser.add_argument("--videoid", help="Required; ID of the video whose comments will be fetched.")
  args = argparser.parse_args()
  youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY)

  try:
    output_file = open("output.txt", "w")
    comments = []
    video_comment_threads = get_comment_threads(youtube, args.videoid, comments)

    for thread in video_comment_threads:
      get_comments(youtube, thread["id"], comments)

    for comment in comments:
      output_file.write(comment.encode("utf-8") + "\n")

    output_file.close()
    print "Total comments: %d" % len(comments)

  except HttpError, e:
    print "An HTTP error %d occurred:\n%s" % (e.resp.status, e.content)

Thanks in advance for any suggestions!

Dar*_*ius 1

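It looks like you are running into the same problem I had. The comments you are missing are most likely replies hidden behind the comment threads. A simple solution: after collecting all the comment threads, go through each thread, check whether it has hidden replies, and fetch them if it does. Here is a simple example: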

def comments_list(service, part, parent_id):
    # Fetch the replies to one thread; parent_id is the comment thread's id.
    results = service.comments().list(
        parentId=parent_id,
        part=part
    ).execute()

    return results

# `item` is one entry from the commentThreads().list() response.
if item['snippet']['totalReplyCount'] > 0:
    res2 = comments_list(youtube, 'snippet', item['id'])
    for item2 in res2['items']:
        commentL = list()
        commentL.append(item2['id'])
        commentL.append(item2['snippet']['authorChannelUrl'])
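Putting the two pieces together, here is a minimal sketch of how the thread check above could be combined with paging, so that both the thread list and each thread's replies are followed through `nextPageToken`. The helper names `iter_pages` and `fetch_all_comments` are hypothetical and not part of the API client. It assumes `youtube` is a service object built the same way as in the question, and that `maxResults=100` is accepted by both list calls (the documented range is 1 to 100, with a default of 20). I have not verified that this recovers every comment on the video from the question, so treat it as a starting point.

```python
def iter_pages(list_method, **params):
    """Yield each response page of a list() call, following nextPageToken.

    list_method is a bound method such as youtube.comments().list
    (hypothetical helper, not part of the client library).
    """
    page_token = None
    while True:
        if page_token:
            params["pageToken"] = page_token
        response = list_method(**params).execute()
        yield response
        page_token = response.get("nextPageToken")
        if not page_token:
            break


def fetch_all_comments(youtube, video_id):
    """Collect the text of every top-level comment and reply the API returns."""
    texts = []
    thread_pages = iter_pages(
        youtube.commentThreads().list,
        part="snippet",
        videoId=video_id,
        textFormat="plainText",
        maxResults=100,
    )
    for page in thread_pages:
        for thread in page.get("items", []):
            top_level = thread["snippet"]["topLevelComment"]
            texts.append(top_level["snippet"]["textDisplay"])

            # Only query replies for threads that report having some.
            if thread["snippet"]["totalReplyCount"] > 0:
                reply_pages = iter_pages(
                    youtube.comments().list,
                    part="snippet",
                    parentId=thread["id"],
                    textFormat="plainText",
                    maxResults=100,
                )
                for reply_page in reply_pages:
                    for reply in reply_page.get("items", []):
                        texts.append(reply["snippet"]["textDisplay"])
    return texts
```

With the service from the question, this would be called as `fetch_all_comments(youtube, args.videoid)`. Each top-level comment is followed by its replies in the returned list. Every extra `comments().list` call counts against the API quota, which is why the sketch skips threads whose `totalReplyCount` is zero.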