obk*_*mrk 6 python arrays python-3.x pandas google-bigquery
I'm currently trying to write a pandas DataFrame (Python 3.x) to Google BigQuery. The table has a column of dtype object that contains an array of string values.
[Image: sample of pandas table]
I aim to create a BQ table that maintains a nested table structure, as below:
[Image: sample of BigQuery table]
with the following schema:
[Image: schema of BigQuery table]
I'm using the google-cloud-bigquery library, since it converts the DataFrame to the Parquet format, which per the documentation supports nested array values.
Code used:
from google.cloud import bigquery

client = bigquery.Client()
table_id = 'dataset.table'
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField('route_id', 'INTEGER'),
        bigquery.SchemaField('types', 'STRING', mode='REPEATED'),
    ],
    # The Python property is write_disposition; writeDisposition is the REST field name.
    write_disposition="WRITE_APPEND",
)
# df is the DataFrame described above.
job = client.load_table_from_dataframe(
    df,
    table_id,
    job_config=job_config,
)
# Wait for the load job to complete.
job.result()
Unfortunately, I'm getting the following error message:
BadRequest: 400 Error while reading data, error message: Provided schema is not compatible with the file 'prod-scotty-76a528bc-407d-4224-8951-c8ff0c71faa1'. Field 'types' is specified as REPEATED in provided schema which does not match NULLABLE as specified in the file.
What has been tried so far:
but that caused the following error: https://github.com/googleapis/python-bigquery/issues/21
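Surprisingly, this works for the first iteration (CREATE_IF_NEEDED), creating a table in BQ that maintains the nested structure, with the following schema applied automatically: [Image: auto-applied schema of BQ table]. However, if you try to append even the exact same table again, it fails and returns the same error as above.
Any suggestions or tips?
There seems to be a mismatch somewhere, but it has not been resolved yet.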
I have been able to correctly upload a dataframe with arrays using the open-source library pandas-gbq:
import pandas as pd
import pandas_gbq

d = {'nested_string': [['hi', 'keloke'], ['io', 'ready']], 'route_id': [83833, 4487]}
df = pd.DataFrame(data=d)

table_id = "dataset.table"
project_id = 'my_project'

pandas_gbq.to_gbq(
    df, table_id, project_id=project_id, if_exists='replace',
)
Other possible workarounds that do not require third-party tools:
· Use Dataflow instead
· From a Python file, save the dataframe in CSV format in a Google Cloud Storage bucket and load it from BigQuery
Do you think any of these would work for you?
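For the file-based workaround, note that BigQuery's CSV loader does not support nested or repeated data, so the array column would not survive a CSV round trip. The sketch below swaps in newline-delimited JSON, which can carry arrays. It is a substitution, not what the answer proposed. The helper name `rows_to_ndjson` and the sample rows are illustrative only, and the upload to the bucket and the load job are left out.

```python
import json


def rows_to_ndjson(rows):
    """Serialize an iterable of dict rows to newline-delimited JSON text."""
    # One JSON object per line; list values stay JSON arrays.
    lines = [json.dumps(row, ensure_ascii=False) for row in rows]
    return "".join(line + "\n" for line in lines)


# Illustrative rows shaped like the question's table (route_id plus a list of strings).
rows = [
    {"route_id": 83833, "types": ["hi", "keloke"]},
    {"route_id": 4487, "types": ["io", "ready"]},
]

ndjson_text = rows_to_ndjson(rows)
print(ndjson_text, end="")
```

With a real DataFrame, `df.to_json(orient="records", lines=True)` produces the same kind of output. The file could then be loaded with a job config whose `source_format` is `bigquery.SourceFormat.NEWLINE_DELIMITED_JSON` and whose schema marks `types` as REPEATED.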