Python: Download a file from Google Drive using its URL

Question

rka*_*kam 27 python download urllib2 google-drive-api pydrive

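I am trying to download a file from Google Drive, and all I have is the Drive URL.

I have read about the Google API, which talks about some drive_service and MediaIO, and which also requires credentials (mainly a JSON file / OAuth). But I can't work out how it all fits together.

I also tried urllib2 urlretrieve, but my case is fetching files from Drive. Trying 'wget' was no use either.

I tried the pydrive library. It has good upload functions, but no download option.

Any help will be appreciated. Thanks.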

Answer 1

tur*_*ula 37

If by "Drive URL" you mean the shareable link of a file on Google Drive, then the following might help:

import requests

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

if __name__ == "__main__":
    file_id = 'TAKE ID FROM SHAREABLE LINK'
    destination = 'DESTINATION FILE ON YOUR DISK'
    download_file_from_google_drive(file_id, destination)

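The snippet uses neither pydrive nor the Google Drive SDK. It uses the requests module (which is, in a way, an alternative to urllib2).

When downloading large files from Google Drive, a single GET request is not sufficient. A second one is needed; see "wget/curl large file from google drive".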
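Because a failed download can end up saving a small HTML page instead of the file, a quick sanity check on the response may help. The following is only a sketch: `looks_like_html_page` is a hypothetical helper, not part of requests or of the snippet above, and the heuristic is an assumption (it would also flag a shared file that really is HTML).

```python
def looks_like_html_page(first_bytes: bytes, content_type: str = "") -> bool:
    """Heuristic: True if the payload looks like an HTML page, not file data."""
    # A Content-Type of text/html is the strongest hint.
    if "text/html" in content_type.lower():
        return True
    # Otherwise, peek at the start of the body for an HTML prologue.
    head = first_bytes[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

One possible use is to pass `response.headers.get("Content-Type", "")` and the first chunk from `response.iter_content(...)` before writing anything to disk, and to raise an error if the check returns True.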

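It seems something has changed behind the scenes and the token approach no longer works. However, simply always including "confirm=1" as a parameter seems to be a workaround. (5 upvotes)
Doesn't work; it just silently downloads a 4.0K file with no warning or error. Example link: https://drive.google.com/open?id=0B4qLcYyJmiz0TXdaTExNcW03ejA (4 upvotes)
This gives me a 404 Not Found, using the ID of a publicly shared file. Any suggestions as to what might be wrong? (3 upvotes)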

Answer 2

Pad*_*ddy 34

I recommend the gdown package.

pip install gdown

Take your share link

https://drive.google.com/file/d/0B9P1L--7Wd2vNm9zMTJWOGxobkU/view?usp=sharing

and grab the id (e.g. 1TLNdIufzwesDbyr_nVTR7Zrx9oRHLM_N), then swap it in after id= below.

import gdown

url = 'https://drive.google.com/uc?id=0B9P1L--7Wd2vNm9zMTJWOGxobkU'
output = '20150428_collected_images.tgz'
gdown.download(url, output, quiet=False)

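Importantly, if you create the link via "Share" or "Get shareable link", that URL does not work as is; you must replace "open" with "uc" in the URL. In other words, `drive.google.com/open?id= ...` becomes `drive.google.com/uc?id= ...` (4 upvotes)
I tried following @AgileBean's instructions, but my link looked like ```https://drive.google.com/file/d/3Xxk5lJSr...UV5eX9M/view?usp=sharing``` so it didn't work. Instead I used the ID parameter, ```gdown --id 3Xxk5lJSr...UV5eX9M```, where ```3Xxk5lJSr...UV5eX9M``` is the file ID, which you can easily extract from the file link. (3 upvotes)
Best and simplest answer. Thanks! (2 upvotes)
The best. Thanks a lot!! (2 upvotes)
It doesn't work... even for public files. I find it ridiculous that the output, when running this from Python, is "you may be able to use a browser". Now I just need to download the library that turns Python into a human who knows how to operate a browser and has a keyboard and mouse... (2 upvotes)
Worked for me when pasting the link that appears after pressing the "Download" button on the Google Drive web page. (2 upvotes)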
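The link rewrite mentioned in the comments above (`open?id=...` to `uc?id=...`) can be sketched with the standard library alone. `open_link_to_uc` is a hypothetical helper name, and the sketch assumes the link really has the `/open?id=<ID>` shape; anything else is returned unchanged.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def open_link_to_uc(url: str) -> str:
    """Rewrite .../open?id=<ID> to .../uc?id=<ID>; return other URLs unchanged."""
    parts = urlparse(url)
    if parts.path.rstrip("/") != "/open":
        return url
    file_id = parse_qs(parts.query).get("id", [None])[0]
    if not file_id:
        return url
    # Keep scheme and host, swap the path, and keep only the id parameter.
    return urlunparse(parts._replace(path="/uc", query=urlencode({"id": file_id})))
```

The result could then be passed as the `url` argument of `gdown.download(url, output, quiet=False)` as in the answer.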

Answer 3

ndr*_*plz 17

Having had similar needs many times, I made an extra-simple class, GoogleDriveDownloader, starting from @user115202's snippet above. You can find the source code here.

You can also install it through pip:

pip install googledrivedownloader

Then usage is as simple as:

from google_drive_downloader import GoogleDriveDownloader as gdd

gdd.download_file_from_google_drive(file_id='1iytA1n2z4go3uVCwE__vIKouTKyIDjEq',
                                    dest_path='./data/mnist.zip',
                                    unzip=True)

This snippet will download an archive shared on Google Drive. In this case, 1iytA1n2z4go3uVCwE__vIKouTKyIDjEq is the ID taken from the shareable link obtained from Google Drive.
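For readers curious what an `unzip=True` step might involve, here is a minimal standard-library sketch. It is not the library's actual implementation; `unzip_archive` and its default of extracting next to the archive are assumptions made for illustration.

```python
import os
import zipfile


def unzip_archive(zip_path, dest_dir=None):
    """Extract a zip archive and return the list of member names."""
    if dest_dir is None:
        # Assumption: extract into the directory that holds the archive.
        dest_dir = os.path.dirname(os.path.abspath(zip_path))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

Only extract archives from sources you trust, since an archive can contain unexpected paths or very large members.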

Unable to retrieve the file... `open('/content/data.json').read()` returns `'<HTML>\n<HEAD>\n<TITLE>Not Found</TITLE>\n</HEAD>\n<BODY BGCOLOR="#FFFFFF" TEXT="#000000">\n<H1>Not Found</H1>\n<H2>Error 404</H2>\n</BODY>\n</HTML>\n'` (3 upvotes)
What should I do if I want to access a restricted file using a Gmail ID and password? (2 upvotes)

Answer 4

Ray*_*ayB 15

Here is a simple way to do this without third-party libraries, using a service account.

pip install google-api-core and google-api-python-client

from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.oauth2 import service_account
import io

credz = {} # put JSON credentials here, from a service account or the like
# More info: https://cloud.google.com/docs/authentication

credentials = service_account.Credentials.from_service_account_info(credz)
drive_service = build('drive', 'v3', credentials=credentials)

file_id = '0BwwA4oUTeiV1UVNwOHItT0xfa2M'
request = drive_service.files().get_media(fileId=file_id)
#fh = io.BytesIO() # this can be used to keep in memory
fh = io.FileIO('file.tar.gz', 'wb') # this can be used to write to disk
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))

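Rather than pasting the credentials into the script, the `credz` dictionary could be read from a downloaded key file. This is a sketch only: `load_service_account_info` is a hypothetical helper, and the list of required fields is an assumption about what a service account key file typically contains.

```python
import json

# Assumption: fields a service account key file is expected to carry.
REQUIRED_KEYS = ("client_email", "private_key", "token_uri")


def load_service_account_info(path):
    """Read a service account JSON key file and return it as a dict."""
    with open(path, "r", encoding="utf-8") as f:
        info = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise ValueError("key file is missing fields: " + ", ".join(missing))
    return info
```

The returned dictionary can then be passed to `service_account.Credentials.from_service_account_info(...)` in place of the empty `credz` above.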
Answer 5

Tia*_*ica 8

The documentation includes a function that downloads a file when given the ID of the file to download:

from __future__ import print_function

import io

import google.auth
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload


def download_file(real_file_id):
    """Downloads a file
    Args:
        real_file_id: ID of the file to download
    Returns : IO object with location.

    Load pre-authorized user credentials from the environment.
    TODO(developer) - See https://developers.google.com/identity
    for guides on implementing OAuth2 for the application.
    """
    creds, _ = google.auth.default()

    try:
        # create drive api client
        service = build('drive', 'v3', credentials=creds)

        file_id = real_file_id

        # pylint: disable=maybe-no-member
        request = service.files().get_media(fileId=file_id)
        file = io.BytesIO()
        downloader = MediaIoBaseDownload(file, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print(F'Download {int(status.progress() * 100)}%.')

    except HttpError as error:
        print(F'An error occurred: {error}')
        file = None

    # Return None on failure instead of calling getvalue() on None.
    return file.getvalue() if file is not None else None


if __name__ == '__main__':
    download_file(real_file_id='1KuPmvGq8yoYgbfW74OENMCB5H0n_2Jm9')

This raises a question:

How do we get the file ID needed to download the file?

In general, the URL of a file shared from Google Drive looks like this:

https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing

where 1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh is the file ID.

You can simply copy it from the URL or, if you prefer, write a function that extracts the file ID from the URL.

For example, given url = https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing,

def url_to_id(url):
    x = url.split("/")
    return x[5]

Printing x gives

['https:', '', 'drive.google.com', 'file', 'd', '1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh', 'view?usp=sharing']

So, since we want the 6th element of the list, we use x[5].
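Indexing with x[5] depends on the exact shape of the URL. A slightly more defensive variant, sketched here with the standard library, looks for the path segment after `d` and falls back to an `id` query parameter. `extract_file_id` is a hypothetical name, and the two URL shapes it handles are the only ones assumed.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_file_id(url: str) -> Optional[str]:
    """Return the Drive file ID from a share URL, or None if none is found."""
    parts = urlparse(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    # Shape 1: /file/d/<ID>/view
    if "d" in segments:
        index = segments.index("d")
        if index + 1 < len(segments):
            return segments[index + 1]
    # Shape 2: /open?id=<ID> or /uc?id=<ID>
    ids = parse_qs(parts.query).get("id")
    return ids[0] if ids else None
```

With the example URL above, this should return the same ID as url_to_id, while also handling links of the `open?id=` form.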

Answer 6

Rob*_*bel 5

PyDrive lets you download a file with the GetContentFile() function. You can find the function's documentation here.

See the example below:

# Initialize GoogleDriveFile instance with file id.
file_obj = drive.CreateFile({'id': '<your file ID here>'})
file_obj.GetContentFile('cats.png') # Download file as 'cats.png'.

This code assumes you have an authenticated drive object; documentation on this can be found here and here.

In the general case, this is done like so:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
# Create local webserver which automatically handles authentication.
gauth.LocalWebserverAuth()

# Create GoogleDrive instance with authenticated GoogleAuth instance.
drive = GoogleDrive(gauth)

Information about silent authentication on a server can be found here; it involves writing a settings.yaml (example: here) that stores the authentication details.

Answer 7

小智 5

import requests

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id , 'confirm': 1 }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

if __name__ == "__main__":
    file_id = 'TAKE ID FROM SHAREABLE LINK'
    destination = 'DESTINATION FILE ON YOUR DISK'
    download_file_from_google_drive(file_id, destination)

This just repeats the accepted answer but adds the confirm=1 parameter, so that the file always downloads even if it is too large.

Asked:	9 years, 5 months ago
Viewed:	40,853 times
Last active:	7 years, 1 month ago