Yal*_*lak 5 python pandas google-bigquery
When trying to update a Google BigQuery table with to_gbq, I get the following response:
GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
My code:
gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)
My mini_df DataFrame looks like this:
date request_number name feature_name value_name value
2018-01-10 1 1 "a" "b" 0.309457
2018-01-10 1 1 "c" "d" 0.273748
When I run to_gbq and the table does not yet exist in BigQuery, I can see that it is created with the following schema:
date STRING NULLABLE
request_number STRING NULLABLE
name STRING NULLABLE
feature_name STRING NULLABLE
value_name STRING NULLABLE
value FLOAT NULLABLE
What am I doing wrong, and how can I fix it?
P.S. The rest of the exception:
BadRequest Traceback (most recent call last)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
589 destination_table,
--> 590 job_config=job_config).result()
591 except self.http_error as ex:
~/anaconda3/envs/env/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
527 # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 528 return super(_AsyncJob, self).result(timeout=timeout)
529
~/anaconda3/envs/env/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
110 # Pylint doesn't recognize that this is valid in this case.
--> 111 raise self._exception
112
BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
During handling of the above exception, another exception occurred:
GenericGBQException Traceback (most recent call last)
<ipython-input-28-195df93249b6> in <module>()
----> 1 gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas/io/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
106 chunksize=chunksize,
107 verbose=verbose, reauth=reauth,
--> 108 if_exists=if_exists, private_key=private_key)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver)
987 table.create(table_id, table_schema)
988
--> 989 connector.load_data(dataframe, dataset_id, table_id, chunksize)
990
991
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
590 job_config=job_config).result()
591 except self.http_error as ex:
--> 592 self.process_http_error(ex)
593
594 rows = []
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in process_http_error(ex)
454 # <https://cloud.google.com/bigquery/troubleshooting-errors>`__
455
--> 456 raise GenericGBQException("Reason: {0}".format(ex))
457
458 def run_query(self, query, **kwargs):
GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
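I ran into the same problem.
In my case it came down to the object data type of the DataFrame columns.
I had three columns, externalId, mappingId and info. I didn't set a data type for any of them and let pandas decide, and pandas set all three to object. The problem is that to_gbq uses to_json internally, and for one reason or another, if a field is of type object but contains only numeric values, the output omits the quotes around the value.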
So Google BigQuery expects this:
{"externalId": "12345", "mappingId":"abc123", "info":"blerb"}
but gets this:
{"externalId": 12345, "mappingId":"abc123", "info":"blerb"}
and since the field is mapped as STRING in Google BigQuery, the import fails.
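To see the behavior described above, here is a minimal sketch that serializes a one-row DataFrame whose object column holds a number, in the newline-delimited JSON form (one object per line). It assumes a reasonably recent pandas; the data and the helper non_string_object_columns are hypothetical, and newer pandas-gbq releases may not load data as JSON at all, so this only reproduces the to_json step the answer describes.

```python
import json

import pandas as pd

# Hypothetical one-row frame mirroring the answer's columns. dtype=object
# reproduces the situation where every column ends up as object.
df = pd.DataFrame(
    {"externalId": [12345], "mappingId": ["abc123"], "info": ["blerb"]},
    dtype=object,
)

# Newline-delimited JSON: one JSON object per row.
ndjson = df.to_json(orient="records", lines=True)
print(ndjson)  # externalId is written as a bare number, without quotes

first_record = json.loads(ndjson.splitlines()[0])
print(type(first_record["externalId"]))  # <class 'int'>, not str


def non_string_object_columns(frame):
    """Hypothetical helper: names of object columns that contain at least
    one non-null value that is not a Python str."""
    offending = []
    for name in frame.columns:
        if frame[name].dtype == object:
            values = frame[name].dropna()
            if not all(isinstance(v, str) for v in values):
                offending.append(name)
    return offending


print(non_string_object_columns(df))  # ['externalId']
```

Running a check like this before the upload points at the columns that would be written unquoted into a STRING field.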
I came up with two solutions.
Solution 1 - change the column's data type
A simple type cast solves the problem. I also had to change the column's data type in BigQuery to INTEGER.
df['externalId'] = df['externalId'].astype('int')
BigQuery can then accept the unquoted field, since that is how the JSON standard writes numbers.
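A minimal sketch of Solution 1, assuming the values in the column really are whole numbers. The data, the dataset/table/project names and the commented-out upload call are hypothetical. The table_schema argument exists on to_gbq in newer pandas / pandas-gbq versions; it is not in the to_gbq signatures shown in the traceback above.

```python
import json

import pandas as pd

# Hypothetical data: whole numbers sitting in an object column.
df = pd.DataFrame(
    {"externalId": [12345, 67890], "mappingId": ["abc123", "def456"]},
    dtype=object,
)

# Solution 1: cast to a real integer dtype. "int64" is used instead of "int"
# because "int" can map to int32 on Windows with older NumPy versions.
df["externalId"] = df["externalId"].astype("int64")

record = json.loads(df.to_json(orient="records", lines=True).splitlines()[0])
print(df["externalId"].dtype, record)

# The BigQuery column then has to be INTEGER too. In newer pandas / pandas-gbq
# versions the schema can be passed explicitly (hypothetical table names):
table_schema = [
    {"name": "externalId", "type": "INTEGER"},
    {"name": "mappingId", "type": "STRING"},
]
# df.to_gbq("my_dataset.my_table", project_id="my-project",
#           if_exists="append", table_schema=table_schema)
```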
Solution 2 - make sure string fields are strings
This again comes down to setting the data type, but because the column is now explicitly set to string, to_json prints the field with quotes and everything works.
df['externalId'] = df['externalId'].astype('str')
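A minimal sketch of Solution 2 on the same kind of hypothetical data: after astype('str') every value is a Python string, so to_json quotes it and the value matches a STRING column in BigQuery.

```python
import json

import pandas as pd

# Hypothetical data: numbers stored in an object column that maps to STRING.
df = pd.DataFrame(
    {"externalId": [12345, 67890], "mappingId": ["abc123", "def456"]},
    dtype=object,
)

# Solution 2: turn every value into a real Python string.
df["externalId"] = df["externalId"].astype("str")

record = json.loads(df.to_json(orient="records", lines=True).splitlines()[0])
print(record)  # externalId is now a quoted JSON string
```

With many pandas versions, astype('str') also turns missing values into literal strings such as 'nan' or 'None', so it may be worth handling nulls before casting if the column can contain them.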