pandas.to_gbq() returns "ArrowTypeError: Expected bytes, got a 'datetime.date' object" error

cas*_*eyr 7 python pandas google-bigquery

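pandas.to_gbq() recently started returning an error when I try to append a DataFrame to a BigQuery table, even though the df schema/dtypes are exactly the same as the schema/data types of the BigQuery table.

The code snippet is as follows: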

df.to_gbq(destination_table = PROCESSED_DATA_TABLE_NAME,
          project_id = PROJECT_NAME,
          if_exists = 'append')

Which returns:

  File ~\Documents\DartsModel\update_processed_visit_data\main_dev.py:152 in <module>
    df.to_gbq(destination_table = PROCESSED_DATA_TABLE_NAME,

  File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas\core\frame.py:2054 in to_gbq
    gbq.to_gbq(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas\io\gbq.py:212 in to_gbq
    pandas_gbq.to_gbq(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas_gbq\gbq.py:1198 in to_gbq
    connector.load_data(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas_gbq\gbq.py:591 in load_data
    chunks = load.load_chunks(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas_gbq\load.py:238 in load_chunks
    load_parquet(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas_gbq\load.py:130 in load_parquet
    client.load_table_from_dataframe(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\google\cloud\bigquery\client.py:2628 in load_table_from_dataframe
    _pandas_helpers.dataframe_to_parquet(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py:672 in dataframe_to_parquet
    arrow_table = dataframe_to_arrow(dataframe, bq_schema)

  File ~\Anaconda3\envs\darts_model\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py:617 in dataframe_to_arrow
    bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)

  File ~\Anaconda3\envs\darts_model\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py:342 in bq_to_arrow_array
    return pyarrow.Array.from_pandas(series, type=arrow_type)

  File pyarrow\array.pxi:1033 in pyarrow.lib.Array.from_pandas

  File pyarrow\array.pxi:312 in pyarrow.lib.array

  File pyarrow\array.pxi:83 in pyarrow.lib._ndarray_to_array

  File pyarrow\error.pxi:123 in pyarrow.lib.check_status

ArrowTypeError: Expected bytes, got a 'datetime.date' object

The relevant package versions are:

python==3.9.12
pandas==1.4.2
pandas-gbq==0.17.6
arrow==1.2.2
google-cloud-bigquery==3.2.0
google-cloud-bigquery-storage==2.13.2

I can't seem to find any solution online, so any help would be appreciated! Thanks.

Dor*_*ahl 4

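One alternative to the to_gbq() method is to use Google Cloud's bigquery package.

As long as the BigQuery table and the local df have the same schema, appending to the BigQuery table can be done with the following code: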

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()

# define project, dataset, and table_name variables
project, dataset, table_name = "project", "dataset", "table_name"
table_id = f"{project}.{dataset}.{table_name}"

job_config = bigquery.job.LoadJobConfig()

# set write_disposition parameter as WRITE_APPEND for appending to table
job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND

job = client.load_table_from_dataframe(df, table_id, job_config=job_config)

job.result()  # Wait for the job to complete.

table = client.get_table(table_id)  # Make an API request.
print(
    f"Loaded {table.num_rows} rows and {len(table.schema)} columns to {table_id}"
)

The end result is the same: the rows are appended to the BigQuery table.
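
As a side note on the error itself: "Expected bytes, got a 'datetime.date' object" is what pyarrow raises when it is asked to build a string/bytes array from a column that actually holds datetime.date values, so it is likely that at least one object-dtype column in df contains dates. A minimal sketch for locating such columns is below. The helper name find_date_columns and the sample data are hypothetical, made up for illustration; they are not part of pandas, pandas-gbq, or the asker's code.

```python
from datetime import date, datetime


def find_date_columns(columns):
    """Return names of columns holding plain datetime.date values.

    `columns` is any iterable of (name, values) pairs, for example
    `df.items()` on a pandas DataFrame or `some_dict.items()`.
    """
    found = []
    for name, values in columns:
        for value in values:
            # datetime.datetime subclasses datetime.date, so exclude it explicitly.
            if isinstance(value, date) and not isinstance(value, datetime):
                found.append(name)
                break
    return found


# Hypothetical sample data standing in for a DataFrame's columns.
sample = {
    "visit_date": [date(2022, 6, 1), date(2022, 6, 2)],
    "visitor": ["a", "b"],
    "logged_at": [datetime(2022, 6, 1, 9, 30), None],
}
print(find_date_columns(sample.items()))
```

With a real DataFrame this would be called as find_date_columns(df.items()). What to do with a flagged column depends on the destination table: if the table column is DATE the values may already be fine and the mismatch is in the inferred schema, whereas for a TIMESTAMP or DATETIME column converting with pd.to_datetime(df[col]) is one option.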