How to write an array of string values from Pandas to Google BigQuery

Question

obk*_*mrk 6 python arrays python-3.x pandas google-bigquery

I'm currently trying to write a Pandas DataFrame (Python 3.x) to Google BigQuery. The table has a column of dtype object that contains arrays of string values.

[image: sample of pandas table]

I aim to create a BQ table that maintains a nested table structure, as below:

[image: sample of BigQuery table]

with the following schema:

[image: schema of BigQuery table]

I'm using the google-cloud-bigquery library because it converts the DataFrame to Parquet, a format that, per the documentation, supports nested array values.

Code used:

from google.cloud import bigquery

client = bigquery.Client()
table_id = 'dataset.table'

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField('route_id', 'INTEGER'),
        bigquery.SchemaField('types', 'STRING', mode='REPEATED'),
    ],
    # The Python property is write_disposition; writeDisposition is the REST field name.
    write_disposition='WRITE_APPEND',
)

# df is the DataFrame described above.
job = client.load_table_from_dataframe(
    df,
    table_id,
    job_config=job_config,
)

# Wait for the load job to complete.
job.result()


Unfortunately, I'm getting the following error message:

BadRequest: 400 Error while reading data, error message: Provided schema is not compatible with the file 'prod-scotty-76a528bc-407d-4224-8951-c8ff0c71faa1'. Field 'types' is specified as REPEATED in provided schema which does not match NULLABLE as specified in the file.

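What has been tried so far:

Using the RECORD field type, but that caused the following error: https://github.com/googleapis/python-bigquery/issues/21

Not passing any schema from Python at all (and letting Python/BQ sort it out themselves).

Surprisingly, this works on the first iteration (CREATE_IF_NEEDED): it creates a table in BQ that maintains the nested structure, with the following schema applied automatically: [image: auto-applied schema of BQ table]. But if you try to append even the exact same table again, it fails, returning the same error as per 1.

Any suggestions or tips?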

Answer 1

Alb*_*esa 0

There seems to be a mismatch here that has not been resolved yet.

I have been able to correctly upload a DataFrame with arrays using the open-source library pandas-gbq:

import pandas as pd
import pandas_gbq

d = {'nested_string': [['hi', 'keloke'], ['io', 'ready']], 'route_id': [83833, 4487]}
df = pd.DataFrame(data=d)

table_id = "dataset.table"
project_id = 'my_project'

pandas_gbq.to_gbq(
    df, table_id, project_id=project_id, if_exists='replace',
)


Other possible workarounds without third-party tools:

· Use Dataflow instead

· From the Python file, save the DataFrame in CSV format in a Google Cloud Storage bucket and ingest it from BigQuery
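A rough sketch of the bucket-based workaround, under a few assumptions. BigQuery's CSV loader does not accept nested or repeated fields, so this sketch swaps in newline-delimited JSON, which does. The helper names `to_ndjson` and `load_ndjson_from_bucket` are hypothetical, the sample rows are illustrative, the schema is the one from the question, and uploading the serialized text to the bucket is left out.

```python
import json


def to_ndjson(rows):
    """Serialize dict rows as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def load_ndjson_from_bucket(uri, table_id):
    """Load a newline-delimited JSON file from Cloud Storage into BigQuery.

    `uri` is assumed to look like 'gs://some-bucket/some-file.json'.
    """
    # Imported lazily so the serialization helper works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        schema=[
            bigquery.SchemaField("route_id", "INTEGER"),
            bigquery.SchemaField("types", "STRING", mode="REPEATED"),
        ],
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    job.result()  # Wait for the load job to complete.
    return job


# Illustrative rows shaped like the question's table. Starting from a
# DataFrame, json.loads(df.to_json(orient="records")) yields the same shape.
rows = [
    {"route_id": 83833, "types": ["hi", "keloke"]},
    {"route_id": 4487, "types": ["io", "ready"]},
]
ndjson = to_ndjson(rows)
```

Once `ndjson` has been written to a file in the bucket, calling `load_ndjson_from_bucket` with that file's `gs://` URI and the table ID would start the load job. Whether this suits a given pipeline depends on data volume and on how the bucket upload is handled.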

Do you think any of these could work for you?

Archived:	5 years, 6 months ago
Views:	5430
Last activity:	5 years, 5 months ago