No module named 'db_dtypes'

Pag*_*gam 9 python dataframe pandas google-bigquery google-cloud-platform

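I'm running a small Python script that creates a pandas DataFrame from the result of a BigQuery table query. When I run the code, I get the error shown below. db_dtypes is installed, and I'm not sure which other dependencies I need to add. Any help is appreciated.

Here is the code: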

import pandas

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    '/Users/kar/Downloads/data-4045ff698b4f.json')

project_id = 'data-platform'
client = bigquery.Client(credentials=credentials, project=project_id)



sql = """SELECT * FROM `data-platform.airbnb.raw_hosts` LIMIT 1"""
query_job = client.query(sql)
df = query_job.to_dataframe()

Error:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ka/PycharmProjects/pythonProject4/main.py", line 17, in <module>
    df = query_job.to_dataframe()
  File "/Users/ka/PycharmProjects/pythonProject4/venv/lib/python3.7/site-packages/google/cloud/bigquery/job/query.py", line 1689, in to_dataframe
    geography_as_object=geography_as_object,
  File "/Users/ka/PycharmProjects/pythonProject4/venv/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1965, in to_dataframe
    _pandas_helpers.verify_pandas_imports()
  File "/Users/ka/PycharmProjects/pythonProject4/venv/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 991, in verify_pandas_imports
    raise ValueError(_NO_DB_TYPES_ERROR) from db_dtypes_import_exception
ValueError: Please install the 'db-dtypes' package to use this function.

Process finished with exit code 1

Won*_*der 2

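Importing a BigQuery table into pandas seems to be quite painful, which is why there is a pandas method (and a related library) for it (see pandas.read_gbq).

I'd suggest using the pandas_gbq module instead of the native bigquery module.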

import pandas_gbq
... # your code for getting credentials and query-string
df = pandas_gbq.read_gbq(sql, project_id=project_id, credentials=credentials, progress_bar_type=None)

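With this library I was able to convert the response into a DataFrame. However, when I pickled that DataFrame and then loaded it in an environment without pandas_gbq, it produced the same error...

I assume plain pandas cannot correctly handle some of the metadata on its own.

Edit: Judging from the comments, the error is easily fixed by running pip install db-dtypes, as suggested by @DazWilkin and @Nestor Ceniza Jr (and by restarting the kernel if you are using a Jupyter notebook, as @dss was).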
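
If db-dtypes appears to be installed but the error persists, one possible cause (an assumption, not confirmed in the question) is that the package was installed into a different interpreter than the one running the script, for example the system Python instead of the project's venv. Below is a minimal sketch for checking this. The helper name is_module_available is made up for illustration; it relies only on the standard library.

```python
import importlib.util
import sys


def is_module_available(module_name: str) -> bool:
    """Return True if the current interpreter can locate the top-level module."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Show which interpreter is actually running this script.
    print("Interpreter:", sys.executable)

    # Note: the pip package is "db-dtypes", the importable module is "db_dtypes".
    for name in ("db_dtypes", "pandas"):
        status = "found" if is_module_available(name) else "MISSING"
        print(f"{name}: {status}")

    # Installing via "<interpreter> -m pip" targets this exact environment.
    print(f"To install here: {sys.executable} -m pip install db-dtypes")
```

If db_dtypes is reported as MISSING, running the printed command installs the package into the same environment that executes the script. In a Jupyter notebook, the kernel still has to be restarted afterwards.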