Google BigQuery SUM returns wrong result

And*_*aev 0 python pandas google-bigquery

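Guys, I'm running this query on public blockchain data to get the total number of burned tokens. But SUM returns a result far smaller than the actual one, which I get by running the same query without the SUM and doing the sum in Pandas. BigQuery gives 8306, while Pandas gives 328608.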

log.data holds a hexadecimal number.

SELECT
  SUM(SAFE_CAST(log.data as INT64)/POW(10,18))
FROM
  `bigquery-public-data.ethereum_blockchain.logs` AS log
WHERE TRUE
  AND log.address = '0xf53ad2c6851052a81b42133467480961b2321c09'
  AND log.block_timestamp >= '2018-01-01 00:00:01'
  AND log.block_timestamp <= '2018-12-01 00:00:01'
  AND SUBSTR(log.topics[SAFE_OFFSET(0)], 1, 10) IN ('0x42696c68','0xcc16f5db')

I don't really understand why this happens. Any answer would be much appreciated :)

Ell*_*ard 5

The problem is that some of the log.data values are silently excluded from the SUM. They don't fit in the range of INT64, so SAFE_CAST(log.data AS INT64) returns NULL for them, and SUM ignores NULL inputs. For example, 0x00000000000000000000000000000000000000000000000080b7978da47c78d2 is greater than the maximum INT64 value, 9223372036854775807 (0x7FFFFFFFFFFFFFFF in hexadecimal).
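The mechanism can be reproduced outside BigQuery with a small Python model. This is a rough sketch and not BigQuery itself. It assumes SAFE_CAST to INT64 behaves like "parse the hex string, NULL if out of range", and it assumes SUM skips NULLs. The helper names `safe_cast_int64` and `sql_sum` are hypothetical. The second input row (exactly 10**18) is made up for illustration, and only the first hex value comes from the answer.

```python
# Rough model of the behaviour described above (not BigQuery itself):
# SAFE_CAST(hex AS INT64) yields NULL when the value is out of range,
# and SUM skips NULL inputs.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1  # 9223372036854775807 == 0x7FFFFFFFFFFFFFFF


def safe_cast_int64(hex_str):
    """Parse a '0x...' string; return None (SQL NULL) if it doesn't fit in INT64."""
    value = int(hex_str, 16)  # int() accepts the 0x prefix when base=16
    return value if INT64_MIN <= value <= INT64_MAX else None


def sql_sum(values):
    """Mimic SQL SUM: ignore NULLs; return NULL if every input is NULL."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


log_data = [
    # The out-of-range value quoted in the answer.
    "0x00000000000000000000000000000000000000000000000080b7978da47c78d2",
    # Hypothetical in-range row: exactly 10**18, i.e. 1 token at 18 decimals.
    hex(10**18),
]

casted = [safe_cast_int64(h) for h in log_data]
int64_total = sql_sum(casted)                    # only the in-range row survives
exact_total = sum(int(h, 16) for h in log_data)  # exact sum over every row

print(casted[0])               # None: the row is dropped
print(int64_total / 10**18)    # 1.0
print(exact_total / 10**18)    # includes the row that INT64 dropped
```

No error is raised anywhere. The out-of-range row simply disappears from the total, which is why the BigQuery figure came out so much smaller than the Pandas one.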

You can instead cast the log.data values to FLOAT64. It has a much wider range than INT64, at the cost of exact integer precision above 2^53, and it produces a result closer to what you see using Pandas:

SELECT
  SUM(CAST(log.data as FLOAT64)/POW(10,18))
FROM
  `bigquery-public-data.ethereum_blockchain.logs` AS log
WHERE TRUE
  AND log.address = '0xf53ad2c6851052a81b42133467480961b2321c09'
  AND log.block_timestamp >= '2018-01-01 00:00:01'
  AND log.block_timestamp <= '2018-12-01 00:00:01'
  AND SUBSTR(log.topics[SAFE_OFFSET(0)], 1, 10) IN ('0x42696c68','0xcc16f5db')

This returns 329681.7942642243.
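The FLOAT64 total is approximate rather than exact, and a short Python sketch shows why. Python's `float` is an IEEE 754 double, the same type as BigQuery's FLOAT64, so it keeps only 53 significant bits. The sketch mirrors `SUM(CAST(log.data AS FLOAT64) / POW(10, 18))` with plain floats and compares it against an exact sum built from Python integers and `decimal.Decimal`. As before, the second row (10**18) is a hypothetical value chosen for illustration.

```python
from decimal import Decimal

# Python's float is an IEEE 754 double, the same type as BigQuery's FLOAT64.
raw_values = [
    int("0x00000000000000000000000000000000000000000000000080b7978da47c78d2", 16),
    10**18,  # hypothetical second row: exactly 1 token at 18 decimals
]

# A double keeps only 53 significant bits, so this 64-bit value gets rounded.
rounded = float(raw_values[0])
print(int(rounded) == raw_values[0])  # False

# Mirrors SUM(CAST(log.data AS FLOAT64) / POW(10, 18)): every row is kept,
# but each term is rounded to the nearest double.
float_total = sum(float(v) / 10**18 for v in raw_values)

# Exact reference: sum the integers first, then scale with Decimal.
exact_total = Decimal(sum(raw_values)) / Decimal(10) ** 18

print(float_total)
print(exact_total)
```

Per row, the rounding error is tiny compared with the value itself, but it is not zero. Over many rows these errors can accumulate, so the FLOAT64 figure is best read as an approximation of the true total.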