I'm running a Spark job in Google Cloud Dataproc and using the BigQuery Connector to load the job's JSON output into a BigQuery table.
The BigQuery Standard SQL data types documentation states that the ARRAY type is supported.
My Scala code is:
val outputDatasetId = "mydataset"
val tableSchema = "[" +
  "{'name': '_id', 'type': 'STRING'}," +
  "{'name': 'array1', 'type': 'ARRAY'}," +
  "{'name': 'array2', 'type': 'ARRAY'}," +
  "{'name': 'number1', 'type': 'FLOAT'}" +
  "]"

// Output configuration
BigQueryConfiguration.configureBigQueryOutput(
  conf, projectId, outputDatasetId, "outputTable",
  tableSchema)

// Write visits to BigQuery
jsonData.saveAsNewAPIHadoopDataset(conf)
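For reference, BigQuery's JSON table-schema format declares an array field with a base element type plus `"mode": "REPEATED"` rather than a literal `ARRAY` type. A sketch of the schema string under that convention — the element type `STRING` for `array1`/`array2` is an assumption, since the original post doesn't say what the arrays contain:

```scala
// Sketch: arrays in a BigQuery JSON table schema are expressed as a base
// element type with mode REPEATED, not as a type named ARRAY.
// STRING element types for array1/array2 are assumptions.
val tableSchema = "[" +
  "{'name': '_id', 'type': 'STRING'}," +
  "{'name': 'array1', 'type': 'STRING', 'mode': 'REPEATED'}," +
  "{'name': 'array2', 'type': 'STRING', 'mode': 'REPEATED'}," +
  "{'name': 'number1', 'type': 'FLOAT'}" +
  "]"
```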
But the job throws this exception:
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Invalid value for: ARRAY is not a valid value",
    "reason" : "invalid"
  } ],
  "message" : "Invalid …

hadoop scala google-bigquery apache-spark google-cloud-dataproc