Querying Hive metadata

bda*_*bda 0 hadoop hive hiveql hortonworks-data-platform hadoop2

I need to query table and column information from my Apache Hive cluster.

Each row needs to contain the following:

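Table schema

Table name

Table description

Column name

Column data type

Column length

Column precision

Column scale

NULL or NOT NULL

Primary key indicator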

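This is easy to query from the metadata tables/views of most RDBMSs, but I am struggling to find much information about the equivalent metadata tables/views in Hive.

Please help :)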

Jag*_*rma 7

This information is available from the Hive metastore. The sample query below is for a MySQL-backed metastore (Hive version 1.2).

SELECT
    DBS.NAME AS TABLE_SCHEMA,
    TBLS.TBL_NAME AS TABLE_NAME,
    -- Tables without a 'comment' parameter get an empty description
    COALESCE(TBL_COMMENTS.PARAM_VALUE, '') AS TABLE_DESCRIPTION,
    COLUMNS_V2.COLUMN_NAME AS COLUMN_NAME,
    COLUMNS_V2.TYPE_NAME AS COLUMN_DATA_TYPE_DETAILS
FROM DBS
JOIN TBLS ON DBS.DB_ID = TBLS.DB_ID
JOIN SDS ON TBLS.SD_ID = SDS.SD_ID
JOIN COLUMNS_V2 ON COLUMNS_V2.CD_ID = SDS.CD_ID
-- LEFT JOIN on the 'comment' key only, so each column is returned exactly once
-- (other table parameters no longer produce duplicate rows with an empty
-- description) and tables that have no comment are still listed
LEFT JOIN TABLE_PARAMS TBL_COMMENTS
    ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID
    AND TBL_COMMENTS.PARAM_KEY = 'comment';

Sample output:

+--------------+----------------------+-----------------------+-------------------+------------------------------+
| TABLE_SCHEMA | TABLE_NAME           | TABLE_DESCRIPTION     | COLUMN_NAME       | COLUMN_DATA_TYPE_DETAILS     |
+--------------+----------------------+-----------------------+-------------------+------------------------------+
| default      | temp003              | This is temp003 table | col1              | string                       |
| default      | temp003              | This is temp003 table | col2              | array<string>                |
| default      | temp003              | This is temp003 table | col3              | array<string>                |
| default      | temp003              | This is temp003 table | col4              | int                          |
| default      | temp003              | This is temp003 table | col5              | decimal(10,2)                |
| default      | temp004              |                       | col11             | string                       |
| default      | temp004              |                       | col21             | array<string>                |
| default      | temp004              |                       | col31             | array<string>                |
| default      | temp004              |                       | col41             | int                          |
| default      | temp004              |                       | col51             | decimal(10,2)                |
+--------------+----------------------+-----------------------+-------------------+------------------------------+
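The question also asks for column length, precision, and scale. In the output above these appear only inside COLUMN_DATA_TYPE_DETAILS (for example decimal(10,2)), so one option is to split that string on the client side. The following is a minimal sketch in Python, not part of the original answer: the names TypeDetails and parse_hive_type are made up for illustration, and it only handles char, varchar, and decimal. Complex types such as array<string> are passed through unchanged.

```python
import re
from typing import NamedTuple, Optional


class TypeDetails(NamedTuple):
    """Hypothetical container for the pieces of a Hive type string."""
    base_type: str
    length: Optional[int]
    precision: Optional[int]
    scale: Optional[int]


# A simple type name with optional numeric arguments,
# e.g. "int", "varchar(50)", "decimal(10,2)".
_TYPE_RE = re.compile(r"^\s*(\w+)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?\s*$")


def parse_hive_type(type_name: str) -> TypeDetails:
    """Split a TYPE_NAME value into base type, length, precision, and scale."""
    match = _TYPE_RE.match(type_name)
    if match is None:
        # Complex types (array<...>, map<...>, struct<...>) have no
        # single length/precision/scale, so return them as-is.
        return TypeDetails(type_name.strip().lower(), None, None, None)

    base = match.group(1).lower()
    first = int(match.group(2)) if match.group(2) else None
    second = int(match.group(3)) if match.group(3) else None

    if base == "decimal":
        # Only values present in the string are reported; no defaults are filled in.
        return TypeDetails(base, None, first, second)
    if base in ("char", "varchar"):
        return TypeDetails(base, first, None, None)
    return TypeDetails(base, None, None, None)


if __name__ == "__main__":
    for t in ("string", "array<string>", "int", "decimal(10,2)", "varchar(50)"):
        print(t, "->", parse_hive_type(t))
```

The same split could probably be done inside the MySQL query with string functions; doing it in client code keeps the metastore query itself unchanged.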

Metastore tables referenced in the query:

DBS: Details of databases/schemas.
TBLS: Details of tables.
COLUMNS_V2: Details about columns.
SDS: Details about storage.
TABLE_PARAMS: Details about table parameters (key-value pairs)
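To see how these tables link together without touching a live metastore, here is a rough sketch that builds a toy copy in an in-memory SQLite database and walks the DBS -> TBLS -> SDS -> COLUMNS_V2 path. The table definitions are an assumption: they are cut down to the columns the join needs (plus INTEGER_IDX for ordering), and a real metastore schema has many more columns and constraints. The sample rows are made up. The table comment is looked up with a LEFT JOIN restricted to PARAM_KEY = 'comment', which is one way to get exactly one row per column while still listing tables that have no comment.

```python
import sqlite3

# Simplified stand-in for the metastore schema (assumption: only the
# columns needed for the join are modeled here).
SCHEMA_AND_DATA = """
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER,
                   SD_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT,
                         TYPE_NAME TEXT, INTEGER_IDX INTEGER);
CREATE TABLE TABLE_PARAMS (TBL_ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT);

INSERT INTO DBS VALUES (1, 'default');
INSERT INTO SDS VALUES (10, 100), (11, 101);
INSERT INTO TBLS VALUES (1, 1, 10, 'temp003'), (2, 1, 11, 'temp004');
INSERT INTO COLUMNS_V2 VALUES
    (100, 'col1', 'string', 0),
    (100, 'col5', 'decimal(10,2)', 1),
    (101, 'col11', 'string', 0);
-- temp003 has a comment plus one unrelated parameter; temp004 has none.
INSERT INTO TABLE_PARAMS VALUES
    (1, 'comment', 'This is temp003 table'),
    (1, 'transient_lastDdlTime', '1450000000');
"""

QUERY = """
SELECT d.NAME, t.TBL_NAME, COALESCE(p.PARAM_VALUE, ''),
       c.COLUMN_NAME, c.TYPE_NAME
FROM DBS d
JOIN TBLS t ON d.DB_ID = t.DB_ID          -- database -> its tables
JOIN SDS s ON t.SD_ID = s.SD_ID           -- table -> storage descriptor
JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID    -- storage descriptor -> columns
LEFT JOIN TABLE_PARAMS p
    ON p.TBL_ID = t.TBL_ID AND p.PARAM_KEY = 'comment'
ORDER BY t.TBL_NAME, c.INTEGER_IDX
"""


def fetch_columns():
    """Build the toy metastore and return one row per table column."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA_AND_DATA)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in fetch_columns():
        print(row)
```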