msh*_*ruz 6 python youtube youtube-data-api
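I have been trying to fetch the comments (both threads and replies) for a given YouTube video using Python, as an exercise in learning the language.
Based on the example on the official site (https://developers.google.com/youtube/v3/docs/commentThreads/list), I was able to get some of the comments, but not all of them. I tried adding some code to handle multiple pages, but I cannot get all the comments even for videos that have only a single page.
For example, https://www.youtube.com/watch?v=Gd_L7DVKTA8 has 17 comments (including replies), but I can only get 7 threads and 2 replies. Interestingly, I get the same result (only 7 threads) using the API Explorer available at the link above.
My code is below: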
#!/usr/bin/python

# Usage:
# python scraper.py --videoid='<video_id>'

from apiclient.errors import HttpError
from oauth2client.tools import argparser
from apiclient.discovery import build

YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"
DEVELOPER_KEY = 'key'


def get_comment_threads(youtube, video_id, comments):
  threads = []
  results = youtube.commentThreads().list(
    part="snippet",
    videoId=video_id,
    textFormat="plainText",
  ).execute()

  #Get the first set of comments
  for item in results["items"]:
    threads.append(item)
    comment = item["snippet"]["topLevelComment"]
    text = comment["snippet"]["textDisplay"]
    comments.append(text)

  #Keep getting comments from the following pages
  while ("nextPageToken" in results):
    results = youtube.commentThreads().list(
      part="snippet",
      videoId=video_id,
      pageToken=results["nextPageToken"],
      textFormat="plainText",
    ).execute()
    for item in results["items"]:
      threads.append(item)
      comment = item["snippet"]["topLevelComment"]
      text = comment["snippet"]["textDisplay"]
      comments.append(text)

  print "Total threads: %d" % len(threads)

  return threads


def get_comments(youtube, parent_id, comments):
  results = youtube.comments().list(
    part="snippet",
    parentId=parent_id,
    textFormat="plainText"
  ).execute()

  for item in results["items"]:
    text = item["snippet"]["textDisplay"]
    comments.append(text)

  return results["items"]


if __name__ == "__main__":
  argparser.add_argument("--videoid", help="Required; ID for video for which the comment will be inserted.")
  args = argparser.parse_args()
  youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY)

  try:
    output_file = open("output.txt", "w")
    comments = []
    video_comment_threads = get_comment_threads(youtube, args.videoid, comments)

    for thread in video_comment_threads:
      get_comments(youtube, thread["id"], comments)

    for comment in comments:
      output_file.write(comment.encode("utf-8") + "\n")

    output_file.close()
    print "Total comments: %d" % len(comments)

  except HttpError, e:
    print "An HTTP error %d occurred:\n%s" % (e.resp.status, e.content)
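Thanks in advance for any suggestions!

It looks like you are dealing with the same problem I ran into. The comments you are missing are most likely hidden behind the comment threads. A simple solution: after collecting all the comment thread IDs, take each thread ID, check whether it has hidden comments (replies), and if so, fetch them. Here is a simple example: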
if (item['snippet']['totalReplyCount']>0):
    res2 = comments_list(youtube, 'snippet', item['id'])
    for item2 in res2['items']:
        commentL = list()
        commentL.append(item2['id'])
        commentL.append(item2['snippet']['authorChannelUrl'])

def comments_list(service, part, parent_id):
    results = service.comments().list(
        parentId=parent_id,
        part=part
    ).execute()
    return results
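
For completeness, here is a slightly fuller sketch of the same idea that also follows `nextPageToken` on the replies, not just on the threads. The helper names `iter_pages` and `fetch_all_comments` are made up for this illustration, and `youtube` is assumed to be the service object returned by `build(...)` as in the question. It passes `maxResults=100`, which to my knowledge is the largest page size both `commentThreads.list` and `comments.list` accept. I have not verified that this recovers every missing comment for the video in the question, so treat it as a starting point rather than a confirmed fix.

```python
def iter_pages(list_method, **params):
    """Yield every item of a paginated list call by following nextPageToken.

    `list_method` is a bound method such as youtube.commentThreads().list.
    """
    page_token = None
    while True:
        if page_token:
            params["pageToken"] = page_token
        response = list_method(**params).execute()
        for item in response.get("items", []):
            yield item
        page_token = response.get("nextPageToken")
        if not page_token:
            break


def fetch_all_comments(youtube, video_id):
    """Return the text of every top-level comment and reply the API exposes."""
    texts = []
    threads = iter_pages(
        youtube.commentThreads().list,
        part="snippet",
        videoId=video_id,
        textFormat="plainText",
        maxResults=100,
    )
    for thread in threads:
        snippet = thread["snippet"]
        texts.append(snippet["topLevelComment"]["snippet"]["textDisplay"])
        # Only spend an extra request on threads that actually have replies.
        if snippet.get("totalReplyCount", 0) > 0:
            replies = iter_pages(
                youtube.comments().list,
                part="snippet",
                parentId=thread["id"],
                textFormat="plainText",
                maxResults=100,
            )
            texts.extend(r["snippet"]["textDisplay"] for r in replies)
    return texts
```

With the service object from the question, `fetch_all_comments(youtube, args.videoid)` would return a flat list of comment texts, with each thread's replies placed right after its top-level comment.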