How can I avoid "duplicating" a table that has a unique constraint on multiple columns?

Gre*_*ius 5 postgresql index-tuning postgresql-9.5

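I have a very large table (35 GB) that is unique on the combination of four columns.

The table is not very wide, and the four columns that make it unique are its largest columns (in bytes). As a result, the index that enforces uniqueness is 21 GB. This is not index bloat accumulated over time; it is the size immediately after the index is created.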
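For reference, sizes like these can be read with PostgreSQL's built-in size functions. A minimal sketch, assuming the table and index names from the definition further down:

```sql
-- Heap size vs. the size of the unique index (names taken from the table definition).
SELECT pg_size_pretty(pg_table_size('medi_cal_base_eligibility')) AS table_size,
       pg_size_pretty(pg_relation_size('medi_cal_base_eligibility_uq_dates_cin_cardinal')) AS unique_index_size,
       pg_size_pretty(pg_indexes_size('medi_cal_base_eligibility')) AS all_indexes_size;
```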

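I don't need to optimize for insert speed at all, because inserts happen in batches only once a month. Once inserted, rows are never updated.

I am running PostgreSQL 9.5.0.

Is there a way to enforce the unique constraint without duplicating such a large part of the database? Perhaps with something like a clustered index?

Full table description: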

CREATE TABLE medi_cal_base_eligibility (
    client_index_number text NOT NULL,
    medi_cal_date date NOT NULL,
    eligibility_date date NOT NULL,
    aidcode text,
    responsible_county text,
    status text,
    cardinal smallint NOT NULL,
    id SERIAL PRIMARY KEY
);

Indexes:

"medi_cal_base_eligibility_pkey" PRIMARY KEY, btree 
    (id)
"medi_cal_base_eligibility_uq_dates_cin_cardinal" UNIQUE CONSTRAINT, btree 
    (eligibility_date, client_index_number, medi_cal_date, cardinal)

Eze*_*nay 0

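With PostgreSQL 9.5 you can use a BRIN index (which keeps the index very small while remaining fully functional) and handle the exclusion with a trigger, like this: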

CREATE INDEX ON medi_cal_base_eligibility USING BRIN (client_index_number);

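CREATE OR REPLACE FUNCTION tf_medi_cal_base_eligibility_insert() RETURNS trigger AS
$BODY$
BEGIN
  -- OLD is referenced only in the UPDATE branch: on 9.5 it is unassigned in
  -- INSERT triggers, and PL/pgSQL does not guarantee short-circuit evaluation.
  IF TG_OP = 'UPDATE' THEN
    IF (NEW.client_index_number, NEW.eligibility_date, NEW.medi_cal_date, NEW.cardinal) IS NOT DISTINCT FROM (OLD.client_index_number, OLD.eligibility_date, OLD.medi_cal_date, OLD.cardinal) THEN
      RETURN NEW;  -- key columns unchanged, nothing to check
    END IF;
  END IF;

  -- Reject the row if another row already holds the same four-column key.
  IF EXISTS (SELECT 1 FROM medi_cal_base_eligibility
             WHERE (client_index_number, eligibility_date, medi_cal_date, cardinal) = (NEW.client_index_number, NEW.eligibility_date, NEW.medi_cal_date, NEW.cardinal)) THEN
    -- RAISE without a level defaults to EXCEPTION and aborts the statement.
    RAISE 'Duplicate key: %, %, %, %', NEW.client_index_number, NEW.eligibility_date, NEW.medi_cal_date, NEW.cardinal;
  END IF;
  RETURN NEW;
END;
$BODY$ LANGUAGE plpgsql VOLATILE SECURITY DEFINER;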

CREATE TRIGGER t_medi_cal_base_eligibility_insert BEFORE INSERT OR UPDATE ON medi_cal_base_eligibility
    FOR EACH ROW EXECUTE PROCEDURE tf_medi_cal_base_eligibility_insert();
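One caveat with a trigger-based check like this: the EXISTS query sees only committed rows and rows written by its own transaction, so two sessions inserting the same key at the same time could both pass it. Since the inserts happen in a monthly batch, one possible hedge is to serialize writers for the duration of the load. The following is only a sketch; the sample row and its values are made up for illustration.

```sql
BEGIN;

-- SHARE ROW EXCLUSIVE conflicts with the ROW EXCLUSIVE lock taken by other
-- sessions' INSERT/UPDATE/DELETE, and with itself, so only one writer runs
-- at a time. Plain SELECTs are not blocked.
LOCK TABLE medi_cal_base_eligibility IN SHARE ROW EXCLUSIVE MODE;

-- Hypothetical sample row; replace with the real batch load.
INSERT INTO medi_cal_base_eligibility
    (client_index_number, medi_cal_date, eligibility_date, aidcode, responsible_county, status, cardinal)
VALUES
    ('90000000A', DATE '2016-01-01', DATE '2015-12-01', '10', '19', 'active', 1);

COMMIT;
```

The lock is released at COMMIT or ROLLBACK, so readers are unaffected and writers are blocked only while the batch runs.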

As dezso pointed out, the BRIN index is only useful if client_index_number is correlated with the physical position of the rows in the table file.
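Whether that correlation holds can be estimated from the planner statistics. A minimal sketch, assuming the table lives in the public schema:

```sql
-- Refresh statistics first so that pg_stats reflects the current data.
ANALYZE medi_cal_base_eligibility;

SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'public'  -- assumption: the table is in the public schema
  AND tablename  = 'medi_cal_base_eligibility'
  AND attname    = 'client_index_number';
```

A correlation close to 1 or -1 means the physical row order follows the column order, which is the case where BRIN can skip most block ranges. A value near 0 means most ranges would have to be scanned for every lookup.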

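If the BRIN solution above does not meet your requirements, searching by a hash of the data is a good alternative. The size of the hash determines how many records must be scanned to check uniqueness, and the larger the hash, the larger the index. A 32-bit hash will most likely match a single row (or at most a handful), and its index will be about as large as the primary key's. The function below obtains a 32-bit hash by taking the last 8 hexadecimal digits of md5 applied to the four unique fields concatenated together.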

CREATE OR REPLACE FUNCTION f_medi_cal_base_eligibility_to_int (p_client_index_number text, p_medi_cal_date date, p_eligibility_date date, p_cardinal smallint) RETURNS int AS $BODY$
  SELECT ('x'||right(md5($1 || to_char($2, 'YYYYMMDD') || to_char($3, 'YYYYMMDD') || $4::text), 8))::bit(32)::int
$BODY$ LANGUAGE SQL IMMUTABLE SECURITY DEFINER;

CREATE INDEX ON medi_cal_base_eligibility (f_medi_cal_base_eligibility_to_int(client_index_number, medi_cal_date, eligibility_date, cardinal));

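CREATE OR REPLACE FUNCTION tf_medi_cal_base_eligibility_insert() RETURNS trigger AS
$BODY$
BEGIN
  -- OLD is referenced only in the UPDATE branch: on 9.5 it is unassigned in
  -- INSERT triggers, and PL/pgSQL does not guarantee short-circuit evaluation.
  IF TG_OP = 'UPDATE' THEN
    IF (NEW.client_index_number, NEW.eligibility_date, NEW.medi_cal_date, NEW.cardinal) IS NOT DISTINCT FROM (OLD.client_index_number, OLD.eligibility_date, OLD.medi_cal_date, OLD.cardinal) THEN
      RETURN NEW;  -- key columns unchanged, nothing to check
    END IF;
  END IF;

  -- The hash narrows the search through the expression index; the real
  -- columns are compared as well, so a hash collision is not a false duplicate.
  IF EXISTS (SELECT 1 FROM medi_cal_base_eligibility
             WHERE (f_medi_cal_base_eligibility_to_int(client_index_number, medi_cal_date, eligibility_date, cardinal), client_index_number, medi_cal_date, eligibility_date, cardinal) =
                   (f_medi_cal_base_eligibility_to_int(NEW.client_index_number, NEW.medi_cal_date, NEW.eligibility_date, NEW.cardinal), NEW.client_index_number, NEW.medi_cal_date, NEW.eligibility_date, NEW.cardinal)) THEN
    -- RAISE without a level defaults to EXCEPTION and aborts the statement.
    RAISE 'Duplicate key: %, %, %, %', NEW.client_index_number, NEW.medi_cal_date, NEW.eligibility_date, NEW.cardinal;
  END IF;
  RETURN NEW;
END;
$BODY$ LANGUAGE plpgsql VOLATILE SECURITY DEFINER;

-- Only needed if the trigger was not already created for the BRIN variant;
-- CREATE OR REPLACE FUNCTION above already swaps in the new body.
CREATE TRIGGER t_medi_cal_base_eligibility_insert BEFORE INSERT OR UPDATE ON medi_cal_base_eligibility
    FOR EACH ROW EXECUTE PROCEDURE tf_medi_cal_base_eligibility_insert();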
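To see what the hash expression does in isolation, and how a lookup has to be written for the expression index to apply, here is a small sketch. The key values in the second query are hypothetical, and the result of the cast is a signed integer, so it can be negative.

```sql
-- The expression used inside the function, applied to a literal:
-- md5() returns 32 hex digits; the last 8 are read as bit(32) and cast to int.
SELECT ('x' || right(md5('abc'), 8))::bit(32)::int AS hash32;

-- Hypothetical lookup: the WHERE clause repeats the indexed expression so the
-- planner can consider the expression index, then rechecks the real columns.
EXPLAIN
SELECT 1
FROM medi_cal_base_eligibility
WHERE f_medi_cal_base_eligibility_to_int(client_index_number, medi_cal_date, eligibility_date, cardinal)
      = f_medi_cal_base_eligibility_to_int('90000000A', DATE '2016-01-01', DATE '2015-12-01', 1::smallint)
  AND client_index_number = '90000000A'
  AND medi_cal_date       = DATE '2016-01-01'
  AND eligibility_date    = DATE '2015-12-01'
  AND cardinal            = 1;
```

If the plan shows a sequential scan instead of an index scan on the hash expression, the expression in the query probably does not match the indexed one exactly, for example because the argument order differs.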