Downloading files from a public Google Drive with Python: a scope issue?

Question


jea*_*ean 6 python google-drive-api google-oauth pydrive google-developers-console

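Using my answer to my earlier question on how to download files from a public Google Drive, I managed in the past to download images by their IDs from a public drive, from a Python script using the Google API v3 and the following block of code: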

from google_auth_oauthlib.flow import Flow, InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload, MediaIoBaseDownload
from google.auth.transport.requests import Request
import io
import re
SCOPES = ['https://www.googleapis.com/auth/drive']
CLIENT_SECRET_FILE = "myjson.json"
authorized_port = 6006 # authorize URI redirect on the console
flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRET_FILE, SCOPES)
cred = flow.run_local_server(port=authorized_port)
drive_service = build("drive", "v3", credentials=cred)
regex = "(?<=https://drive.google.com/file/d/)[a-zA-Z0-9]+"
for i, l in enumerate(links_to_download):
    url = l
    file_id = re.search(regex, url)[0]
    request = drive_service.files().get_media(fileId=file_id)
    fh = io.FileIO(f"file_{i}", mode='wb')
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()
        print("Download %d%%." % int(status.progress() * 100))


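In the meantime I discovered pydrive and pydrive2, two wrappers around the Google API v2 that allow very useful operations such as listing the files in a folder, and that basically do the same thing with a lighter syntax: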

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import io
import re
CLIENT_SECRET_FILE = "client_secrets.json"

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
regex = "(?<=https://drive.google.com/file/d/)[a-zA-Z0-9]+"
for i, l in enumerate(links_to_download):
    url = l
    file_id = re.search(regex, url)[0]
    file_handle = drive.CreateFile({'id': file_id})
    file_handle.GetContentFile(f"file_{i}")


Now, however, whether I use pydrive or the raw API, I can no longer download the same files and instead get:

googleapiclient.errors.HttpError: <HttpError 404 when requesting https://www.googleapis.com/drive/v3/files/fileID?alt=media returned "File not found: fileID.". Details: "[{'domain': 'global', 'reason': 'notFound', 'message': 'File not found: fileID.', 'locationType': 'parameter', 'location': 'fileId'}]">


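I have tried everything I could think of and registered 3 different apps in the Google console. It seems this may (or may not) be a scoping issue (see for example this answer, where an application can only access files in my own Google Drive or files created by that application). However, I did not have this problem before (last year).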

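In the Google console, explicitly granting the https://www.googleapis.com/auth/drive scope to the API requires filling in a large number of fields, including the application's website, terms of use, privacy policy, authorized domains, and a YouTube video explaining the application. However, I will be the only user of this script, so I could only explicitly grant the following scopes: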

/auth/drive.appdata
/auth/drive.file
/auth/drive.install


Is this caused by scoping? Is there a solution that does not require creating a homepage and a YouTube video?

EDIT 1: Here is a sample links_to_download:

links_to_download = ["https://drive.google.com/file/d/fileID/view?usp=drivesdk&resourcekey=0-resourceKeyValue"]


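EDIT 2: It is very unstable: sometimes it works without a hitch, sometimes it does not. When I relaunch the script several times, I get different results. A retry policy helps to some extent, but sometimes it fails repeatedly for hours.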
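
A retry policy of the kind mentioned in EDIT 2 might look like the sketch below. It assumes exponential backoff; the helper name with_retries and its parameters are hypothetical and not part of any Google library, and it only papers over intermittent failures rather than fixing their cause.

```python
import time


def with_retries(action, attempts=5, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure.

    Hypothetical helper: `retry_on` is the tuple of exception types worth
    retrying, and `sleep` is injectable so the policy can be tested quickly.
    """
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

In the script above, the per-file download loop could be wrapped in a small function and passed as action, with retry_on=(HttpError,) imported from googleapiclient.errors.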

Answer 1

Kri*_*ris 4

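Well, this is thanks to the security update Google released a few months ago. It made link sharing stricter: in addition to the fileId, you now also need a resource key to access the file.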

As per the documentation, for newer links you also need to supply the resource key, passed in the X-Goog-Drive-Resource-Keys header in the form fileId1/resourceKey1.

If you apply this change in your code, it will work as normal. An edited example is below:

# File IDs and resource keys can contain "-" and "_" as well as letters and digits.
regex = "(?<=https://drive.google.com/file/d/)[a-zA-Z0-9_-]+"
regex_rkey = "(?<=resourcekey=)[a-zA-Z0-9_-]+"
for i, l in enumerate(links_to_download):
    url = l
    file_id = re.search(regex, url)[0]
    # Older links carry no resourcekey parameter, so this match may be None.
    rkey_match = re.search(regex_rkey, url)
    request = drive_service.files().get_media(fileId=file_id)
    if rkey_match:
        # Newer links need the resource key sent alongside the file ID.
        request.headers["X-Goog-Drive-Resource-Keys"] = f"{file_id}/{rkey_match[0]}"
    fh = io.FileIO(f"file_{i}", mode='wb')
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()
        print("Download %d%%." % int(status.progress() * 100))


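The regex for the resource key is something I put together quickly, so I cannot be sure it covers all cases, but it gives you the solution. Keep in mind that you may have to deal with both old links (no resource key) and new links (with one), and set the header accordingly.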
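
As an alternative to the quick regexes, here is a minimal sketch that pulls the file ID and the optional resource key out of a link with urllib.parse. The function names parse_drive_link and resource_key_header are hypothetical, and the sketch assumes links shaped like the sample in EDIT 1 (https://drive.google.com/file/d/<id>/view?...); other link formats are not handled.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlparse


def parse_drive_link(url: str) -> Tuple[str, Optional[str]]:
    """Return (file_id, resource_key); resource_key is None for older links."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    # Assumed path shape: file/d/<file_id>/view
    if len(parts) < 3 or parts[0] != "file" or parts[1] != "d":
        raise ValueError(f"Unrecognised Drive link: {url}")
    file_id = parts[2]
    # parse_qs maps each query parameter to a list of its values.
    resource_key = parse_qs(parsed.query).get("resourcekey", [None])[0]
    return file_id, resource_key


def resource_key_header(file_id: str, resource_key: Optional[str]) -> Dict[str, str]:
    """Build the extra header, or an empty dict when the link has no key."""
    if resource_key is None:
        return {}
    return {"X-Goog-Drive-Resource-Keys": f"{file_id}/{resource_key}"}


if __name__ == "__main__":
    link = ("https://drive.google.com/file/d/fileID/view"
            "?usp=drivesdk&resourcekey=0-resourceKeyValue")
    fid, rkey = parse_drive_link(link)
    print(resource_key_header(fid, rkey))
```

The returned dict can be merged into the request with request.headers.update(...) before creating the MediaIoBaseDownload, so old and new links go through the same code path.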
