Migrating from PostgreSQL to Postgres-XL: distributed table design

Joe*_*Joe 5 postgresql database-partitioning horizontal-scaling postgres-xl

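Because of our data volume, I need to scale out our application database, which runs on PostgreSQL 9.3. I found Postgres-XL and it looks great, but I'm having a hard time wrapping my head around the limitations on distributed tables. Distributing tables by replication (where the whole table is copied to every datanode) is fine, but suppose I have two large, related tables that need to be "sharded" across the datanodes: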

CREATE TABLE foos
(
  id bigserial NOT NULL,
  project_id integer NOT NULL,
  template_id integer NOT NULL,
  batch_id integer,
  dataset_id integer NOT NULL,
  name text NOT NULL,
  CONSTRAINT pk_foos PRIMARY KEY (id),
  CONSTRAINT fk_foos_batch_id FOREIGN KEY (batch_id)
      REFERENCES batches (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_foos_dataset_id FOREIGN KEY (dataset_id)
      REFERENCES datasets (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_foos_project_id FOREIGN KEY (project_id)
      REFERENCES projects (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_foos_template_id FOREIGN KEY (template_id)
      REFERENCES templates (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT uc_foos UNIQUE (project_id, name)
);

CREATE TABLE foo_childs
(
  id bigserial NOT NULL,
  foo_id bigint NOT NULL,
  template_id integer NOT NULL,
  batch_id integer,
  ffdata hstore,
  CONSTRAINT pk_ff_foos PRIMARY KEY (id),
  CONSTRAINT fk_fffoos_batch_id FOREIGN KEY (batch_id)
      REFERENCES batches (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_fffoos_foo_id FOREIGN KEY (foo_id)
      REFERENCES foos (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_fffoos_template_id FOREIGN KEY (template_id)
      REFERENCES templates (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE 
);

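Now, the Postgres-XL documentation states:

  • "(...) in distributed tables, UNIQUE constraints must include the distribution column of the table"
  • "(...) the distribution column must be included in PRIMARY KEY"
  • "(...) column with REFERENCES (FK) should be the distribution column. (...) PRIMARY KEY must be the distribution column as well."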

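Their examples are too simple and sparse, so could someone please write the DDL for the two tables above for Postgres-XL using DISTRIBUTE BY HASH()?

Or maybe suggest other ways to scale out?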

小智 0

CREATE TABLE foos
( ... ) DISTRIBUTE BY HASH(id);

CREATE TABLE foo_childs
( ... ) DISTRIBUTE BY HASH(foo_id);

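Now any join on foos.id = foo_childs.foo_id can be pushed down to the datanodes and done locally, because rows with the same hash key are stored on the same node.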
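
The column lists and constraints are elided above, so here is a fuller sketch using the table names from the question. It assumes the three documentation rules quoted in the question apply exactly as written, and it has not been run against a cluster. Two changes are assumptions rather than part of the original answer: uc_foos is dropped, because UNIQUE (project_id, name) does not include the distribution column id, and the primary key of foo_childs is widened to (id, foo_id) so that it contains the distribution column. The foreign keys to batches, datasets, projects and templates are left out, since whether they are accepted depends on how those tables are distributed (for example with DISTRIBUTE BY REPLICATION).

```sql
-- Sketch only: assumes the hstore extension is installed on all nodes.
CREATE TABLE foos
(
  id bigserial NOT NULL,
  project_id integer NOT NULL,
  template_id integer NOT NULL,
  batch_id integer,
  dataset_id integer NOT NULL,
  name text NOT NULL,
  -- The primary key is the distribution column itself.
  CONSTRAINT pk_foos PRIMARY KEY (id)
  -- uc_foos UNIQUE (project_id, name) is omitted (assumption): it does not
  -- include the distribution column, so uniqueness of (project_id, name)
  -- would have to be enforced some other way.
) DISTRIBUTE BY HASH (id);

CREATE TABLE foo_childs
(
  id bigserial NOT NULL,
  foo_id bigint NOT NULL,
  template_id integer NOT NULL,
  batch_id integer,
  ffdata hstore,
  -- Widened from (id) so the primary key includes the distribution column.
  CONSTRAINT pk_ff_foos PRIMARY KEY (id, foo_id),
  -- The referencing column is the distribution column, so each child row
  -- hashes to the same datanode as its parent row in foos.
  CONSTRAINT fk_fffoos_foo_id FOREIGN KEY (foo_id)
      REFERENCES foos (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE
) DISTRIBUTE BY HASH (foo_id);

-- Inspect the plan to check whether the join is evaluated on the datanodes
-- instead of being assembled on the coordinator.
EXPLAIN VERBOSE
SELECT f.name, c.ffdata
FROM foos f
JOIN foo_childs c ON c.foo_id = f.id;
```

The trade-off in this layout is that foos can no longer enforce the (project_id, name) uniqueness in the database. Distributing foos by project_id instead would keep that constraint possible, but then foo_childs could not be co-located on foo_id in the same way.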