How to distribute a table using values from multiple columns?

Eug*_*kov 4 postgresql citus

According to the Citus documentation, it is easy to distribute a table using a single column:

SELECT master_create_distributed_table('github_events', 'created_at', 'append');

Is there a way to distribute a table using multiple columns? For example, something like:

SELECT master_create_distributed_table('github_events', 'user_id,site_id', 'append');

Ahm*_*şak 5

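Citus does not support distributing by multiple columns. However, you can create a composite type and partition your data by that composite type.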

- The content of the link is inlined below in case the link goes dead -

Steps for hash partitioning on a composite type

  1. Create the type on the master node and on all worker nodes:

    CREATE TYPE new_composite_type as (project_key text, date text);
    
  2. Create a function that checks equality, and associate it with the equality operator for the new type:

    CREATE FUNCTION equal_test_composite_type_function(new_composite_type, new_composite_type) RETURNS boolean
    AS 'select $1.project_key = $2.project_key AND $1.date = $2.date;'
    LANGUAGE SQL
    IMMUTABLE
    RETURNS NULL ON NULL INPUT;
    
    -- ... use that function to create a custom equality operator...
    CREATE OPERATOR = (
        LEFTARG = new_composite_type,
        RIGHTARG = new_composite_type,
        PROCEDURE = equal_test_composite_type_function,
        HASHES
    );
    
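  3. Create a new hash function.

    Note: This is only a simple example and may not give a good, even hash distribution. There are several examples of good hash functions that can be implemented as a separate C function rather than in SQL.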

    CREATE FUNCTION new_composite_type_hash(new_composite_type) RETURNS int
    AS 'SELECT hashtext( ($1.project_key || $1.date)::text);'   
    LANGUAGE SQL
    IMMUTABLE
    RETURNS NULL ON NULL INPUT;
    
  4. Define operator classes for the BTREE and HASH access methods:

    CREATE OPERATOR CLASS new_op_fam_btree_class
    DEFAULT FOR TYPE new_composite_type USING BTREE AS
    OPERATOR 3 = (new_composite_type, new_composite_type);
    
    CREATE OPERATOR CLASS new_op_fam_hash_class
    DEFAULT FOR TYPE new_composite_type USING HASH AS
    OPERATOR 1 = (new_composite_type, new_composite_type),
    FUNCTION 1 new_composite_type_hash(new_composite_type);
    
  5. Create a table that uses the new type and distribute it.

    CREATE TABLE composite_type_partitioned_table
    (
        id integer,
        composite_column new_composite_type
    );
    
    SELECT master_create_distributed_table('composite_type_partitioned_table','composite_column', 'hash');
    
    SELECT master_create_worker_shards('composite_type_partitioned_table', 4, 1);
    
  6. Run the INSERTs and SELECTs. Note that proper shard pruning requires quoting the composite value, as shown in these queries.

    INSERT INTO composite_type_partitioned_table VALUES  (1, '("key1","20160101")'::new_composite_type);
    INSERT INTO composite_type_partitioned_table VALUES  (2, '("key1","20160102")'::new_composite_type);
    INSERT INTO composite_type_partitioned_table VALUES  (3, '("key2","20160101")'::new_composite_type);
    INSERT INTO composite_type_partitioned_table VALUES  (4, '("key2","20160102")'::new_composite_type);
    
    SELECT * FROM composite_type_partitioned_table WHERE composite_column =  '("key1", "20160101")'::new_composite_type;
    
    UPDATE composite_type_partitioned_table SET id = 6 WHERE composite_column =  '("key2", "20160101")'::new_composite_type;
    
    SELECT * FROM composite_type_partitioned_table WHERE composite_column =  '("key2", "20160101")'::new_composite_type;
    
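To illustrate the note in step 3, here is a minimal sketch of one weakness of the concatenation-based hash and one possible way around it. The function name new_composite_type_hash_sep and the choice of chr(31) as a separator are assumptions made for illustration only; they are not part of the linked article, and both fields are assumed to be non-NULL.

```sql
-- ('ab', 'c') and ('a', 'bc') both concatenate to 'abc', so the
-- expression used in step 3 hashes them to the same value.
SELECT hashtext('ab' || 'c') = hashtext('a' || 'bc') AS same_hash;  -- true

-- Hypothetical variant: insert a separator that is assumed never to
-- occur in project_key, so different field boundaries yield different
-- input strings. Assumes neither field is NULL.
CREATE FUNCTION new_composite_type_hash_sep(new_composite_type) RETURNS int
AS 'SELECT hashtext($1.project_key || chr(31) || $1.date);'
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;
```

If a variant like this were used, it would take the place of new_composite_type_hash in the HASH operator class from step 4, and it would need to be in place before the table is distributed. Colliding hashes do not make queries incorrect; they only affect how evenly rows spread across shards.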

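Additional notes:

There are two caveats to be aware of:

  1. The input file must be properly delimited for copy_to_distributed_table to work. To do that, use COPY (SELECT ()::composite_type_field, .... ); to write from a normal table to a file, and then load it.

  2. For pruning to work with SELECT queries, the composite type field should be in quotes.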
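The first caveat could look roughly like the sketch below. The staging table staging_events and the output path are hypothetical names chosen for illustration; the type and column names come from the steps above. Writing to a server-side file with COPY normally requires superuser or equivalent privileges.

```sql
-- Hypothetical plain (non-distributed) table holding the two fields separately.
CREATE TABLE staging_events (id integer, project_key text, date text);

-- Export one composite value per row. In CSV format a composite such as
-- (key1,20160101) contains a comma, so COPY wraps it in double quotes
-- and it stays a single field in the output file.
COPY (
    SELECT id,
           ROW(project_key, date)::new_composite_type AS composite_column
    FROM staging_events
) TO '/tmp/composite_type_partitioned_table.csv' WITH (FORMAT csv);
```

The resulting file can then be loaded into the distributed table as described in the first caveat; the exact load invocation is not shown here because the source does not give it.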