Writing a Pandas DataFrame to Google Cloud Storage or BigQuery

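Hello, and thank you for your time and consideration. I am developing a Jupyter notebook in Google Cloud Platform / Datalab. I have created a Pandas DataFrame and would like to write it to Google Cloud Storage (GCS) and/or BigQuery. I have a bucket in GCS and created the following objects with this code: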

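import gcp
import gcp.storage as storage

# Project ID of the default Datalab context
project = gcp.Context.default().project_id
bucket_name = 'steve-temp'
bucket_path = bucket_name
# Handle to the bucket; exists() checks that it is there
bucket = storage.Bucket(bucket_path)
bucket.exists()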

I have tried various approaches based on the Google Datalab documentation, but they all fail. Thanks.
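One possible way to get the DataFrame into the bucket is to serialize it to CSV in memory and upload the text as a single object. The sketch below is not the Datalab `gcp` API used in the question; it assumes the separate `google-cloud-storage` client library and working credentials. The helper names `get_bucket` and `upload_dataframe_csv` and the object name `out/df.csv` are hypothetical, chosen only for illustration.

```python
def get_bucket(bucket_name):
    """Return a handle to an existing bucket.

    Assumes the google-cloud-storage package is installed and that
    default credentials are available in the environment.
    """
    # Imported lazily so the upload helper below has no hard dependency.
    from google.cloud import storage
    return storage.Client().bucket(bucket_name)


def upload_dataframe_csv(df, bucket, blob_name):
    """Serialize df to CSV in memory and upload it as one object."""
    # With no path argument, DataFrame.to_csv returns the CSV as a str.
    csv_text = df.to_csv(index=False)
    blob = bucket.blob(blob_name)
    blob.upload_from_string(csv_text, content_type="text/csv")
    return blob_name
```

With the bucket from the question, a call might look like `upload_dataframe_csv(df, get_bucket('steve-temp'), 'out/df.csv')`. Building the CSV in memory is fine for small frames; a very large DataFrame would probably be better written to a local file first and uploaded from there.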

python google-cloud-storage google-cloud-platform google-cloud-datalab

Eco*_*ior

2016 03-31

23
votes

7
answers

20k
views

Creating a BigQuery table from a Pandas DataFrame without explicitly specifying the schema

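I have a Pandas DataFrame and want to create a BigQuery table from it. I know there are many posts asking about this, but every answer I have found so far requires explicitly specifying the schema of each column. For example: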

from google.cloud import bigquery as bq

client = bq.Client()

dataset_ref = client.dataset('my_dataset', project='my_project')
table_ref = dataset_ref.table('my_table')

job_config = bq.LoadJobConfig(
    schema=[
        bq.SchemaField("a", bq.enums.SqlTypeNames.STRING),
        bq.SchemaField("b", bq.enums.SqlTypeNames.INT64),
        bq.SchemaField("c", bq.enums.SqlTypeNames.FLOAT64),
    ]
)

client.load_table_from_dataframe(my_df, table_ref, job_config=job_config).result()

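However, sometimes I have a DataFrame with many columns (for example, 100), and specifying every column by hand is tedious. Is there an efficient way to do this?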

By the way, I found this post with a similar question: Efficiently write a Pandas dataframe to Google BigQuery. However, it seems that bq.Schema.from_dataframe does not exist:

AttributeError: module 'google.cloud.bigquery' has no attribute 'Schema'
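As far as I know, `load_table_from_dataframe` treats the schema in `job_config` as optional and tries to infer column types from the DataFrame when it is omitted, so dropping the `schema=[...]` argument may already be enough. If explicit control is still wanted, the list of `SchemaField` objects can be generated from the dtypes instead of being typed out. The sketch below shows that idea. The helper names `bq_type_for_kind`, `schema_from_dataframe`, and `load_dataframe` are hypothetical, and the dtype-to-type mapping is a simplifying assumption (anything unrecognized falls back to STRING, and naive datetimes are sent as TIMESTAMP), not the library's own conversion rules.

```python
# Assumed mapping from NumPy dtype "kind" codes to BigQuery type names.
_KIND_TO_BQ_TYPE = {
    "b": "BOOL",       # boolean
    "i": "INT64",      # signed integer
    "u": "INT64",      # unsigned integer (very large uint64 values will not fit)
    "f": "FLOAT64",    # floating point
    "M": "TIMESTAMP",  # datetime64
}


def bq_type_for_kind(kind):
    """Map a dtype kind code to a BigQuery type name, defaulting to STRING."""
    return _KIND_TO_BQ_TYPE.get(kind, "STRING")


def schema_from_dataframe(df):
    """Build one SchemaField per DataFrame column from its dtype."""
    # Imported lazily; requires the google-cloud-bigquery package.
    from google.cloud import bigquery
    return [
        bigquery.SchemaField(str(name), bq_type_for_kind(dtype.kind))
        for name, dtype in df.dtypes.items()
    ]


def load_dataframe(df, table_id):
    """Load df into table_id ("project.dataset.table") with a generated schema."""
    from google.cloud import bigquery
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(schema=schema_from_dataframe(df))
    job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
    return job.result()  # blocks until the load job finishes
```

A call such as `load_dataframe(my_df, 'my_project.my_dataset.my_table')` would then replace the hand-written schema. Object columns holding mixed or nested values are the usual case where a generated schema is wrong, so those columns may still need a manual override.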

python pandas google-bigquery

use*_*451

2020 08-03

3
votes

1
answer

3122
views
