Split comma-separated entries into rows

Yur*_*ues 5 db2 db2-9.7 csv db2-luw

I have a table like this:

|   ID   |  OtherID  | Data
+--------+-----------+---------------------------
|  5059  |   73831   | 5103,5107
|  5059  |   73941   | 5103,5104,5107
|  5059  |   73974   | 5103,5106,5107,5108

The result should return separate rows, like this:

|   ID   |  OtherID  | Data
+--------+-----------+--------------------------
|  5059  |   73831   | 5103
|  5059  |   73831   | 5107
|  5059  |   73941   | 5103
|  5059  |   73941   | 5104
|  5059  |   73941   | 5107
|  5059  |   73974   | 5103
|  5059  |   73974   | 5106
|  5059  |   73974   | 5107
|  5059  |   73974   | 5108

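Basically, I need to split the Data column at each comma so that every value ends up in its own row.

The result will be stored in a temporary table (with columns such as ID, OtherID, NewID).

My DB2 version is 9.7.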

小智 3

I came up with a solution for your data set, based on some work I have been doing and on a modified version of a solution that Serge Rielau and Rick Swagerman posted on IBM developerWorks.

Data setup queries:

DECLARE GLOBAL TEMPORARY TABLE sample_data (id INTEGER, otherid integer, data VARCHAR(255)) WITH REPLACE ON COMMIT preserve rows NOT logged;
INSERT INTO session.sample_data SELECT 5059, 73831, '5103,5107' FROM sysibm.sysdummy1;
INSERT INTO session.sample_data SELECT 5059, 73941, '5103,5104,5107' FROM sysibm.sysdummy1;
INSERT INTO session.sample_data SELECT 5059, 73974, '5103,5106,5107,5108' FROM sysibm.sysdummy1;

Solution select query:

WITH
split_data AS
(
    SELECT
        id as group_by_1,
        otherid as group_by_2,
        data AS split_string,
        ','  AS split
    FROM
        session.sample_data
)
,
rec
(
    group_by_1,
    group_by_2,
    split_string,
    split,
    row_num,
    column_value,
    pos
) AS
(
    SELECT
        group_by_1,
        group_by_2,
        split_string,
        split,
        1,
        VARCHAR(SUBSTR(split_string, 1, DECODE(INSTR(split_string, split, 1), 0, LENGTH(split_string), INSTR(split_string, split, 1) - 1)), 255),
        INSTR(split_string, split, 1) + LENGTH(split)
    FROM
        split_data
    UNION ALL
    SELECT
        group_by_1,
        group_by_2,
        split_string,
        split,
        row_num + 1,
        VARCHAR(SUBSTR(split_string, pos, DECODE(INSTR(split_string, split, pos), 0, LENGTH(split_string) - pos + 1, INSTR(split_string, split, pos) - pos)), 255),
        INSTR(split_string, split, pos) + LENGTH(split)
    FROM
        rec
    WHERE
        row_num < 30000
    AND pos > LENGTH(split)
)
SELECT
    group_by_1 as id,
    group_by_2 as otherid,
    column_value AS data
FROM
    rec
ORDER BY
    group_by_1,
    group_by_2,
    row_num;

Results:

ID  OTHERID DATA
5059    73831   5103
5059    73831   5107
5059    73941   5103
5059    73941   5104
5059    73941   5107
5059    73974   5103
5059    73974   5106
5059    73974   5107
5059    73974   5108
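
The question also asks for the split rows to be stored in a temporary table. A minimal sketch of one way to do that is to put the same recursive common table expression inside an INSERT. The table name split_result and the column name newid are assumptions chosen to match the ID, OtherID, NewID layout mentioned in the question, and the cast to INTEGER assumes every item in the list is a non-empty integer (no leading, trailing, or doubled commas).

```sql
-- Hypothetical target table; the name and the NEWID column are assumptions.
DECLARE GLOBAL TEMPORARY TABLE split_result (id INTEGER, otherid INTEGER, newid INTEGER) WITH REPLACE ON COMMIT PRESERVE ROWS NOT LOGGED;

INSERT INTO session.split_result (id, otherid, newid)
WITH
rec (id, otherid, split_string, row_num, column_value, pos) AS
(
    -- First item of each list: everything before the first comma.
    SELECT
        id,
        otherid,
        data,
        1,
        VARCHAR(SUBSTR(data, 1, DECODE(INSTR(data, ',', 1), 0, LENGTH(data), INSTR(data, ',', 1) - 1)), 255),
        INSTR(data, ',', 1) + 1
    FROM
        session.sample_data
    UNION ALL
    -- Next item: starts at pos and runs to the next comma or the end of the string.
    SELECT
        id,
        otherid,
        split_string,
        row_num + 1,
        VARCHAR(SUBSTR(split_string, pos, DECODE(INSTR(split_string, ',', pos), 0, LENGTH(split_string) - pos + 1, INSTR(split_string, ',', pos) - pos)), 255),
        INSTR(split_string, ',', pos) + 1
    FROM
        rec
    WHERE
        row_num < 30000
    AND pos > 1   -- pos falls back to 1 once no further comma is found
)
SELECT
    id,
    otherid,
    INTEGER(column_value)
FROM
    rec;
```

Here the delimiter is hard-coded as ',' rather than carried in a SPLIT column, which keeps the sketch short at the cost of the flexibility the original query has.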

Comments:

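The solution select query can be adapted to your specific result requirements by including as many GROUP_BY_X columns (0 to many) as you need in the REC definition and carrying the matching columns through both subselects of the UNION ALL.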
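
As an illustration of the "0" end of that range, here is a sketch of the same query with no GROUP_BY_X columns at all, splitting a single string literal. The literal value is made up for the example; everything else follows the structure of the query above.

```sql
WITH
split_data AS
(
    -- A single hard-coded list instead of rows from a table (illustrative value).
    SELECT
        VARCHAR('5103,5106,5107', 255) AS split_string,
        ','  AS split
    FROM
        sysibm.sysdummy1
)
,
rec
(
    split_string,
    split,
    row_num,
    column_value,
    pos
) AS
(
    SELECT
        split_string,
        split,
        1,
        VARCHAR(SUBSTR(split_string, 1, DECODE(INSTR(split_string, split, 1), 0, LENGTH(split_string), INSTR(split_string, split, 1) - 1)), 255),
        INSTR(split_string, split, 1) + LENGTH(split)
    FROM
        split_data
    UNION ALL
    SELECT
        split_string,
        split,
        row_num + 1,
        VARCHAR(SUBSTR(split_string, pos, DECODE(INSTR(split_string, split, pos), 0, LENGTH(split_string) - pos + 1, INSTR(split_string, split, pos) - pos)), 255),
        INSTR(split_string, split, pos) + LENGTH(split)
    FROM
        rec
    WHERE
        row_num < 30000
    AND pos > LENGTH(split)
)
SELECT
    row_num,
    column_value
FROM
    rec
ORDER BY
    row_num;
```

For this input the query should return three rows (5103, 5106, 5107) numbered 1 to 3. Adding a grouping column back means adding it in three places: the REC column list, the first subselect, and the recursive subselect.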