Bad Request error when querying data from BigQuery in a loop

Ish*_*ati 5 python google-bigquery

I am querying data from BigQuery in a loop, using the get_data_from_bq method shown below:

def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh +"\" as DATETIME) group by eventType, productId order by productId;"
    query_job = bigquery_client.query(query, job_config=job_config)
    return query_job.result()

The first query (iteration) returns the correct data, but every subsequent query throws the exception below:

    results = query_job.result()
  File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2415, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 660, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/home/ishank/.local/lib/python2.7/site-packages/google/api_core/future/polling.py", line 120, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 Cannot explicitly modify anonymous table xyz:_bf4dfedaed165b3ee62d8a9efa.anon1db6c519_b4ff_dbc67c17659f

Edit 1: Below is a sample query that throws the above exception. The same query runs without problems in the BigQuery console.

select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in ("168561","175936","161684","161681","161686") and eventTime > CAST("2018-05-30 11:21:19" as DATETIME) group by eventType, productId order by productId;

小智 8

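I had exactly the same issue. The problem is not the query itself; most likely you are reusing the same QueryJobConfig. When you run a query, unless you set a destination, BigQuery stores the result in an anonymous table that is recorded in the QueryJobConfig object. If you reuse this configuration, BigQuery tries to store the new result in the same anonymous table, which causes the error. To be honest, I don't particularly like this behaviour.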

You should rewrite your code so that each call gets a new QueryJobConfig:

from google.cloud.bigquery import QueryJobConfig

def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh +"\" as DATETIME) group by eventType, productId order by productId;"
    # Pass a new QueryJobConfig on every call instead of reusing a shared one.
    # Any options the shared job_config carried (e.g. use_legacy_sql) must be set again here.
    query_job = bigquery_client.query(query, job_config=QueryJobConfig())
    return query_job.result()
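
Since the question runs the query in a loop, the same idea can be sketched for the whole loop. This is only a minimal sketch, assuming the shared config really is the cause as described above. The names `make_config`, `build_in_clause`, `run_queries` and the `batches` argument are hypothetical and not part of the original code, and setting `use_legacy_sql = True` is an assumption based on the `[xyz:xyz.abc]` table syntax in the query.

```python
def make_config():
    """Return a brand-new QueryJobConfig (hypothetical helper)."""
    # Imported lazily so the rest of the sketch can be exercised without the library.
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig()
    # Assumption: the [project:dataset.table] syntax in the query is legacy SQL.
    config.use_legacy_sql = True
    return config


def build_in_clause(product_ids):
    """Quote each id and join them for use inside IN (...)."""
    return ",".join('"{}"'.format(_id) for _id in product_ids)


def run_queries(client, batches, time_thresh, config_factory=make_config):
    """Run one query per batch of product ids, with a new config per call."""
    all_rows = []
    for product_ids in batches:
        query = (
            "select productId, eventType, count(*) as count "
            "from [xyz:xyz.abc] "
            "where productId in ({ids}) "
            'and eventTime > CAST("{ts}" as DATETIME) '
            "group by eventType, productId order by productId;"
        ).format(ids=build_in_clause(product_ids), ts=time_thresh)
        # The config is created inside the loop, so no two jobs share one object.
        query_job = client.query(query, job_config=config_factory())
        all_rows.append(list(query_job.result()))
    return all_rows
```

With a real client this could be called as `run_queries(bigquery_client, batches, time_thresh)`, where `batches` is a list of product-id lists. Passing the factory as an argument keeps the per-call construction explicit and makes the loop easy to exercise with a stand-in client.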

Hope this helps!