use*_*204 3 python pandas google-bigquery
The schema of my table in Google BigQuery looks like this:
price_datetime : DATETIME,
symbol : STRING,
bid_open : FLOAT,
bid_high : FLOAT,
bid_low : FLOAT,
bid_close : FLOAT,
ask_open : FLOAT,
ask_high : FLOAT,
ask_low : FLOAT,
ask_close : FLOAT
After running pandas.read_gbq, I get a DataFrame with these column dtypes:
price_datetime object
symbol object
bid_open float64
bid_high float64
bid_low float64
bid_close float64
ask_open float64
ask_high float64
ask_low float64
ask_close float64
dtype: object
Now I want to use to_gbq, so I convert my local DataFrame (which I just created) from these dtypes:
price_datetime datetime64[ns]
symbol object
bid_open float64
bid_high float64
bid_low float64
bid_close float64
ask_open float64
ask_high float64
ask_low float64
ask_close float64
dtype: object
to these dtypes:
price_datetime object
symbol object
bid_open float64
bid_high float64
bid_low float64
bid_close float64
ask_open float64
ask_high float64
ask_low float64
ask_close float64
dtype: object
by doing:
df['price_datetime'] = df['price_datetime'].astype(object)
Now I (think I) can use to_gbq, so I do:
import pandas
pandas.io.gbq.to_gbq(df, <table_name>, <project_name>, if_exists='append')
But I get this error:
---------------------------------------------------------------------------
InvalidSchema Traceback (most recent call last)
<ipython-input-15-d5a3f86ad382> in <module>()
1 a = time.time()
----> 2 pandas.io.gbq.to_gbq(df, <table_name>, <project_name>, if_exists='append')
3 b = time.time()
4
5 print(b-a)
C:\Users\me\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
825 elif if_exists == 'append':
826 if not connector.verify_schema(dataset_id, table_id, table_schema):
--> 827 raise InvalidSchema("Please verify that the structure and "
828 "data types in the DataFrame match the "
829 "schema of the destination table.")
InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.
This is probably an issue with pandas. If you look at the code for to_gbq, you will see that it runs:
table_schema = _generate_bq_schema(dataframe)
where _generate_bq_schema is defined as:
def _generate_bq_schema(df, default_type='STRING'):
    """ Given a passed df, generate the associated Google BigQuery schema.

    Parameters
    ----------
    df : DataFrame
    default_type : string
        The default big query type in case the type of the column
        does not exist in the schema.
    """

    type_mapping = {
        'i': 'INTEGER',
        'b': 'BOOLEAN',
        'f': 'FLOAT',
        'O': 'STRING',
        'S': 'STRING',
        'U': 'STRING',
        'M': 'TIMESTAMP'
    }

    fields = []
    for column_name, dtype in df.dtypes.iteritems():
        fields.append({'name': column_name,
                       'type': type_mapping.get(dtype.kind, default_type)})

    return {'fields': fields}
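As you can see, no dtype is mapped to DATETIME. After the cast to object, your price_datetime column is inevitably mapped to STRING (because its dtype.kind is 'O'), and leaving it as datetime64[ns] (kind 'M') would map it to TIMESTAMP instead. Neither matches the DATETIME column in the destination table, so the schema check fails.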
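To see the mismatch without touching BigQuery, the mapping can be replayed locally. This is only a sketch: `TYPE_MAPPING` is copied from the function quoted above, `inferred_bq_types` is a hypothetical helper name, and the toy values are made up. It uses `df.dtypes.items()` because `iteritems` is no longer available in recent pandas releases.

```python
import pandas as pd

# Copy of the kind-to-type mapping quoted above from pandas/io/gbq.py.
TYPE_MAPPING = {
    'i': 'INTEGER',
    'b': 'BOOLEAN',
    'f': 'FLOAT',
    'O': 'STRING',
    'S': 'STRING',
    'U': 'STRING',
    'M': 'TIMESTAMP',
}


def inferred_bq_types(df, default_type='STRING'):
    """Hypothetical helper: the BigQuery type inferred for each column."""
    return {name: TYPE_MAPPING.get(dtype.kind, default_type)
            for name, dtype in df.dtypes.items()}


# Toy frame with two of the question's columns (values are made up).
df = pd.DataFrame({
    'price_datetime': pd.to_datetime(['2017-01-02 09:30:00']),
    'bid_open': [1.05],
})

# Inferred types with the column left as a datetime64 dtype (kind 'M').
as_datetime = inferred_bq_types(df)

# Inferred types after the cast from the question (kind 'O').
as_object = inferred_bq_types(
    df.assign(price_datetime=df['price_datetime'].astype(object)))

print(as_datetime['price_datetime'])  # TIMESTAMP
print(as_object['price_datetime'])    # STRING
```

Whichever dtype the column has, the inferred type is never DATETIME, which is why the append is rejected.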
The only workaround I know of for now is to change your table schema from DATETIME to either TIMESTAMP or STRING.
It might be a good idea to open a new issue on the pandas-bq repository asking for this code to be updated to accept DATETIME.
[Edit]:
I have opened this issue on their repository.