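I realize there are a million ways to get the schema of a dataset.table in Google BigQuery...
Is there a way to get the schema data through a SELECT statement, similar to querying the INFORMATION_SCHEMA tables in SQL Server?
Thanks.
I need to do data profiling, and the only tool I have is the QUERY function in the web UI. I want to create a query that computes, for each column, the number of null values, the number of non-null values, string lengths, and so on.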
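Below is a potential direction/idea for you to explore and extend to fit your needs.
It works well for simple schemas; it looks like it would need to be tuned to work with schemas that contain RECORD and REPEATED fields.
Also note that it skips columns that are NULL in all rows, so such columns are not visible to the approach below.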
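To see why RECORD and REPEATED fields need tuning, here is a minimal Python sketch that imitates the string splitting done by the `table_as_json` and `pairs` steps of the query below. It runs locally, not in BigQuery: `naive_pairs` is a hypothetical helper, and `json.dumps` with compact separators is assumed to be a close enough stand-in for `TO_JSON_STRING` on these small sample rows.

```python
import json


def naive_pairs(row):
    """Split one row the way the query's table_as_json and pairs steps do.

    Assumption: json.dumps with compact separators approximates BigQuery's
    TO_JSON_STRING for these simple values; it is not the same function.
    """
    text = json.dumps(row, separators=(",", ":"))
    text = text[1:-1]  # drop the outer braces, like REGEXP_REPLACE(..., r'^{|}$', '')
    pairs = []
    for z in text.split(',"'):                        # SPLIT(row, ',"')
        parts = z.split(":")                          # SPLIT(z, ':')
        name = parts[0].replace('"', "")              # REPLACE(column_name, '"', '')
        value = parts[1] if len(parts) > 1 else None  # [SAFE_OFFSET(1)]
        pairs.append((name, value))
    return pairs


# A flat row splits cleanly into one pair per column.
print(naive_pairs({"id": 1, "name": "ab"}))
# A RECORD leaks its inner field "y" out as a fake top-level column.
print(naive_pairs({"id": 1, "rec": {"x": 2, "y": 3}}))
# A REPEATED field produces a garbage column name "b]".
print(naive_pairs({"id": 1, "tags": ["a", "b"]}))
# A value containing ':' is cut off at its first colon.
print(naive_pairs({"t": "12:30"}))
```

The flat row comes out as one pair per column, while the nested and repeated rows produce fragments such as `'{"x"'` and `'b]'`, because the inner `,"` and `:` separators are indistinguishable from the top-level ones.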
So, using fh-bigquery.reddit.subreddits as a simple test table:
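#standardSQL
-- The table to profile; replace with your own project.dataset.table
WITH `table` AS (
SELECT * FROM `fh-bigquery.reddit.subreddits`
),
-- Serialize each row to a JSON string and strip the outer braces
table_as_json AS (
SELECT REGEXP_REPLACE(TO_JSON_STRING(t), r'^{|}$', '') AS row
FROM `table` AS t
),
-- Split each row into (column_name, column_value) pairs.
-- Limitation: any value containing ':' is cut off at its first colon.
pairs AS (
SELECT
REPLACE(column_name, '"', '') AS column_name,
-- JSON null becomes SQL NULL; string values keep their JSON double quotes
IF(SAFE_CAST(column_value AS STRING)='null',NULL,column_value) AS column_value
FROM table_as_json, UNNEST(SPLIT(row, ',"')) AS z,
UNNEST([SPLIT(z, ':')[SAFE_OFFSET(0)]]) AS column_name,
UNNEST([SPLIT(z, ':')[SAFE_OFFSET(1)]]) AS column_value
)
-- One output row per column with its profiling metrics.
-- Lengths of string values include the 2 quote characters kept from the JSON.
SELECT
column_name,
COUNT(DISTINCT column_value) AS _distinct_values,
COUNTIF(column_value IS NULL) AS _nulls,
COUNTIF(column_value IS NOT NULL) AS _non_nulls,
MIN(LENGTH(SAFE_CAST(column_value AS STRING))) AS _min_length,
MAX(LENGTH(SAFE_CAST(column_value AS STRING))) AS _max_length,
ROUND(AVG(LENGTH(SAFE_CAST(column_value AS STRING)))) AS _avr_length
FROM pairs
WHERE column_name <> ''
GROUP BY column_name
ORDER BY column_name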
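The aggregation step can also be checked locally. Below is a minimal Python sketch that mirrors the final SELECT of the query above on already-split (column_name, column_value) pairs. The names `profile`, `METRICS`, and the sample pairs are hypothetical, and the sketch only approximates BigQuery's semantics: COUNT(DISTINCT), MIN, MAX and AVG skip NULLs, and ROUND rounds halves away from zero.

```python
import math
from collections import defaultdict


def sql_null(value):
    # IF(SAFE_CAST(column_value AS STRING)='null', NULL, column_value)
    return None if value == "null" else value


def avg_length_rounded(values):
    # ROUND(AVG(LENGTH(...))): AVG skips NULLs. BigQuery's ROUND rounds halves
    # away from zero, so floor(x + 0.5) is used for these non-negative lengths
    # (Python's built-in round() would round half to even instead).
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        return None
    return float(math.floor(sum(lengths) / len(lengths) + 0.5))


# Each entry mirrors one aggregate in the query's final SELECT.
METRICS = {
    "_distinct_values": lambda vs: len({v for v in vs if v is not None}),
    "_nulls": lambda vs: sum(v is None for v in vs),
    "_non_nulls": lambda vs: sum(v is not None for v in vs),
    "_min_length": lambda vs: min((len(v) for v in vs if v is not None), default=None),
    "_max_length": lambda vs: max((len(v) for v in vs if v is not None), default=None),
    "_avr_length": avg_length_rounded,
}


def profile(pairs):
    """Group (column_name, column_value) pairs and apply every metric."""
    groups = defaultdict(list)
    for name, value in pairs:
        if name != "":  # WHERE column_name <> ''
            groups[name].append(sql_null(value))
    return {
        name: {metric: fn(values) for metric, fn in METRICS.items()}
        for name, values in sorted(groups.items())  # ORDER BY column_name
    }


sample_pairs = [
    ("subr", '"ab"'), ("ups", "12"),
    ("subr", '"abcd"'), ("ups", "null"),
    ("subr", '"ab"'), ("ups", "7"),
]
for column, stats in profile(sample_pairs).items():
    print(column, stats)
```

Adding another per-column metric is one more entry in `METRICS`, the counterpart of adding one more aggregate to the query's final SELECT.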
The result is:
column_name _nulls _non_nulls _min_length _max_length _avr_length
----------- ------ ---------- ----------- ----------- -----------
c_posts 0 2499 1 4 4.0
created_utc 0 2499 14 14 14.0
downs 0 2499 1 8 5.0
num_comments 0 2499 1 7 5.0
score 0 2499 1 7 5.0
subr 0 2499 4 23 12.0
ups 0 2499 1 8 5.0
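I think this is about as close to so-called data profiling as you can get (within what is available).
You can easily add any other per-column metrics.
I really think this is a good starting point for you.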