cas*_*eyr 7 python pandas google-bigquery
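Recently, pandas.to_gbq() has started returning an error when I try to append a DataFrame to a BigQuery table, even though the df schema/dtypes are exactly the same as those of the BigQuery table.
Code snippet: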
df.to_gbq(destination_table = PROCESSED_DATA_TABLE_NAME,
          project_id = PROJECT_NAME,
          if_exists = 'append')
Returns:
File ~\Documents\DartsModel\update_processed_visit_data\main_dev.py:152 in <module>
df.to_gbq(destination_table = PROCESSED_DATA_TABLE_NAME,
File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas\core\frame.py:2054 in to_gbq
gbq.to_gbq(
File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas\io\gbq.py:212 in to_gbq
pandas_gbq.to_gbq(
File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas_gbq\gbq.py:1198 in to_gbq
connector.load_data(
File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas_gbq\gbq.py:591 in load_data
chunks = load.load_chunks(
File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas_gbq\load.py:238 in load_chunks
load_parquet(
File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas_gbq\load.py:130 in load_parquet
client.load_table_from_dataframe(
File ~\Anaconda3\envs\darts_model\lib\site-packages\google\cloud\bigquery\client.py:2628 in load_table_from_dataframe
_pandas_helpers.dataframe_to_parquet(
File ~\Anaconda3\envs\darts_model\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py:672 in dataframe_to_parquet
arrow_table = dataframe_to_arrow(dataframe, bq_schema)
File ~\Anaconda3\envs\darts_model\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py:617 in dataframe_to_arrow
bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
File ~\Anaconda3\envs\darts_model\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py:342 in bq_to_arrow_array
return pyarrow.Array.from_pandas(series, type=arrow_type)
File pyarrow\array.pxi:1033 in pyarrow.lib.Array.from_pandas
File pyarrow\array.pxi:312 in pyarrow.lib.array
File pyarrow\array.pxi:83 in pyarrow.lib._ndarray_to_array
File pyarrow\error.pxi:123 in pyarrow.lib.check_status
ArrowTypeError: Expected bytes, got a 'datetime.date' object
The relevant package versions are:
python==3.9.12
pandas==1.4.2
pandas-gbq==0.17.6
arrow==1.2.2
google-cloud-bigquery==3.2.0
google-cloud-bigquery-storage==2.13.2
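When comparing environments it can help to print the versions that are actually installed. A minimal sketch using only the standard library's importlib.metadata (Python 3.8+); the helper name installed_versions is hypothetical. Note that arrow in the list above is a separate PyPI package (a date/time library) from pyarrow, the library raising the error in the traceback, so both are queried.

```python
import platform
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as published on PyPI; pyarrow is listed separately from arrow
PACKAGES = [
    "pandas",
    "pandas-gbq",
    "pyarrow",
    "arrow",
    "google-cloud-bigquery",
    "google-cloud-bigquery-storage",
]

if __name__ == "__main__":
    print(f"python=={platform.python_version()}")
    for pkg, ver in installed_versions(PACKAGES).items():
        print(f"{pkg}=={ver}" if ver else f"{pkg}: not installed")
```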
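I can't seem to find any solution online, so any help would be appreciated! Thanks.
One alternative to the to_gbq() method is to use Google Cloud's bigquery package (google-cloud-bigquery).
Given that the BigQuery table and the local df have the same schema, appending to the BigQuery table can be done with the following code: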
from google.cloud import bigquery
import pandas as pd
client = bigquery.Client()
# define project, dataset, and table_name variables
project, dataset, table_name = "project", "dataset", "table_name"
table_id = f"{project}.{dataset}.{table_name}"
job_config = bigquery.job.LoadJobConfig()
# set write_disposition parameter as WRITE_APPEND for appending to table
job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
job.result() # Wait for the job to complete.
table = client.get_table(table_id) # Make an API request.
print(
f"Loaded {table.num_rows} rows and {len(table.schema)} columns to {table_id}"
)
The end result is the same: the DataFrame's rows are appended to the BigQuery table.