Splitting a string into multiple rows in Oracle

mar*_*law 98 sql string oracle plsql tokenize

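I know this has been answered to some degree for PHP and MySQL, but I was wondering if someone could teach me the simplest way to split a (comma-delimited) string into multiple rows in Oracle 10g (preferably) and 11g.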

The table is as follows:

Name | Project | Error 
108    test      Err1, Err2, Err3
109    test2     Err1

I want to create the following:

Name | Project | Error
108    Test      Err1
108    Test      Err2 
108    Test      Err3 
109    Test2     Err1

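I have seen a few potential solutions around Stack Overflow, but they only account for a single column (the comma-delimited string itself). Any help would be greatly appreciated.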

Nef*_*reo 109

The accepted answer performs poorly on large data sets.

This may be an improved way (it also uses regexp and connect by):

with temp as
(
    select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error  from dual
    union all
    select 109, 'test2', 'Err1' from dual
)
select distinct
  t.name, t.project,
  trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value))  as error
from 
  temp t,
  table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.error, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
order by name

EDIT: Here is a simple (as in "not in depth") explanation of the query.

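  1. length (regexp_replace(t.error, '[^,]+')) + 1 uses regexp_replace to erase everything that is not the delimiter (a comma in this case), and length + 1 to get the number of elements (errors).
  2. select level from dual connect by level <= (...) uses a hierarchical query to create a column of increasing numbers, from 1 to the total number of errors.

    Preview: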

    select level, length (regexp_replace('Err1, Err2, Err3', '[^,]+'))  + 1 as max 
    from dual connect by level <= length (regexp_replace('Err1, Err2, Err3', '[^,]+'))  + 1
    
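  3. table(cast(multiset(.....) as sys.OdciNumberList)) does some casting of Oracle types.
    • cast(multiset(.....)) as sys.OdciNumberList transforms multiple collections (one collection for each row in the original data set) into a single collection of numbers, OdciNumberList.
    • The table() function transforms a collection into a result set.
  4. A FROM without a join condition creates a cross join between your data set and the multiset. As a result, a row in the data set with 4 matches is repeated 4 times (with an increasing number in the column named "column_value").

    Preview: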

    select * from 
    temp t,
    table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.error, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
    
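  5. trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) uses column_value as the nth_appearance/occurrence parameter of regexp_substr.
  6. You can add some other columns from your data set (t.name, t.project as an example) for easier visualization.

Some references to the Oracle docs: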

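  • From 11g on you can use `regexp_count(t.error, ',')` instead of `length(regexp_replace(t.error, '[^,]+'))`, which may bring another performance improvement (12 upvotes)
  • Beware! A regular expression of the form `'[^,]+'` for parsing strings does not return the correct item if the list contains a null element. For details, see: /sf/ask/2202499281/#31464699 (6 upvotes)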
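
The two comments above can be combined into one variant of the query. This is a minimal sketch, not the answer author's code: it assumes Oracle 11g or later (for regexp_count and the sub-expression argument of regexp_substr), the sample row 'Err1,,Err3' is made up to show an empty element, and the item_no alias is introduced only for illustration. DISTINCT is left out so that repeated values within one list are kept.

```sql
-- Sketch (11g+): count items with regexp_count and keep empty elements.
with temp as
(
    select 108 Name, 'test' Project, 'Err1,,Err3' Error from dual  -- made-up row with an empty item
    union all
    select 109, 'test2', 'Err1' from dual
)
select
  t.name, t.project,
  levels.column_value as item_no,
  -- '(.*?)(,|$)' matches each item together with its terminator;
  -- the last argument (1) returns only the first group, i.e. the item itself,
  -- so an empty item comes back as NULL instead of being skipped.
  trim(regexp_substr(t.error, '(.*?)(,|$)', 1, levels.column_value, null, 1)) as error
from
  temp t,
  table(cast(multiset(
    select level from dual
    connect by level <= regexp_count(t.error, ',') + 1
  ) as sys.OdciNumberList)) levels
order by t.name, levels.column_value
```

With the '[^,]+' pattern the second item of 'Err1,,Err3' would be reported as Err3; with the pattern above, row 108 should keep three positions, the second of them NULL.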

Lal*_*r B 28

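There is a huge difference between the following two:

  • Splitting a single delimited string
  • Splitting delimited strings for multiple rows in a table.

If you do not restrict the rows, the CONNECT BY clause will generate multiple rows and will not give the desired output.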
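
One common way to restrict the rows is to tie each generated level back to its own source row. The following is a sketch under assumptions: the inline data and the id/text column names are illustrative, and the PRIOR sys_guid() condition is a widely used idiom rather than something stated in this answer.

```sql
-- Sketch: split every row's list with CONNECT BY, restricted per source row.
with t as
(
    select 1 id, 'word1, word2, word3' text from dual
    union all
    select 2, 'word4, word5, word6' from dual
)
select id,
       trim(regexp_substr(text, '[^,]+', 1, level)) as text
from   t
connect by level <= length(text) - length(replace(text, ',')) + 1
       and prior id = id                 -- stay within the same source row
       and prior sys_guid() is not null  -- non-deterministic call prevents the CONNECT BY loop error
order by id, level
```

Without the two PRIOR conditions, every row at one level becomes a parent of every row at the next level, which is where the extra rows come from. The item count uses length/replace instead of regexp_count so that the sketch should also run on 10g.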

Apart from regular expressions, there are a few other alternatives:

  • XMLTABLE
  • MODEL clause

Setup

SQL> CREATE TABLE t (
  2    ID          NUMBER GENERATED ALWAYS AS IDENTITY,
  3    text        VARCHAR2(100)
  4  );

Table created.

SQL>
SQL> INSERT INTO t (text) VALUES ('word1, word2, word3');

1 row created.

SQL> INSERT INTO t (text) VALUES ('word4, word5, word6');

1 row created.

SQL> INSERT INTO t (text) VALUES ('word7, word8, word9');

1 row created.

SQL> COMMIT;

Commit complete.

SQL>
SQL> SELECT * FROM t;

        ID TEXT
---------- ----------------------------------------------
         1 word1, word2, word3
         2 word4, word5, word6
         3 word7, word8, word9

SQL>

Using XMLTABLE:

SQL> SELECT id,
  2         trim(COLUMN_VALUE) text
  3  FROM t,
  4    xmltable(('"'
  5    || REPLACE(text, ',', '","')
  6    || '"'))
  7  /

        ID TEXT
---------- ------------------------
         1 word1
         1 word2
         1 word3
         2 word4
         2 word5
         2 word6
         3 word7
         3 word8
         3 word9

9 rows selected.

SQL>

Using the MODEL clause:

SQL> WITH
  2  model_param AS
  3     (
  4            SELECT id,
  5                      text AS orig_str ,
  6                   ','
  7                          || text
  8                          || ','                                 AS mod_str ,
  9                   1                                             AS start_pos ,
 10                   Length(text)                                   AS end_pos ,
 11                   (Length(text) - Length(Replace(text, ','))) + 1 AS element_count ,
 12                   0                                             AS element_no ,
 13                   ROWNUM                                        AS rn
 14            FROM   t )
 15     SELECT   id,
 16              trim(Substr(mod_str, start_pos, end_pos-start_pos)) text
 17     FROM     (
 18                     SELECT *
 19                     FROM   model_param MODEL PARTITION BY (id, rn, orig_str, mod_str)
 20                     DIMENSION BY (element_no)
 21                     MEASURES (start_pos, end_pos, element_count)
 22                     RULES ITERATE (2000)
 23                     UNTIL (ITERATION_NUMBER+1 = element_count[0])
 24                     ( start_pos[ITERATION_NUMBER+1] = instr(cv(mod_str), ',', 1, cv(element_no)) + 1,
 25                     end_pos[iteration_number+1] = instr(cv(mod_str), ',', 1, cv(element_no) + 1) )
 26                 )
 27     WHERE    element_no != 0
 28     ORDER BY mod_str ,
 29           element_no
 30  /

        ID TEXT
---------- --------------------------------------------------
         1 word1
         1 word2
         1 word3
         2 word4
         2 word5
         2 word6
         3 word7
         3 word8
         3 word9

9 rows selected.

SQL>


And*_*lev 27

Regular expressions are a wonderful thing :)

with temp as  (
       select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error  from dual
       union all
       select 109, 'test2', 'Err1' from dual
     )

SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str
  FROM (SELECT Name, Project, Error str FROM temp) t
CONNECT BY instr(str, ',', 1, level - 1) > 0
order by Name

  • Very slow; there is a better answer elsewhere in this thread (3 upvotes)
  • As @JagadeeshG noted, this query is unusable, especially on huge tables. (2 upvotes)

Art*_*Art 7

Here are a couple more examples of the same idea:

SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
  FROM dual
CONNECT BY LEVEL <= regexp_count('Err1, Err2, Err3', ',')+1
/

SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
  FROM dual
CONNECT BY LEVEL <= length('Err1, Err2, Err3') - length(REPLACE('Err1, Err2, Err3', ',', ''))+1
/

Also, DBMS_UTILITY.comma_to_table and table_to_comma can be used: http://www.oracle-base.com/articles/9i/useful-procedures-and-functions-9i.php#DBMS_UTILITY.comma_to_table
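
As a rough illustration of the DBMS_UTILITY route, the anonymous block below is a sketch rather than code from the linked article; the variable names are made up. One caveat to keep in mind: comma_to_table parses each item as an Oracle identifier, so lists with items such as plain numbers or strings with special characters are expected to raise an error.

```sql
-- Sketch: split a literal list with DBMS_UTILITY.comma_to_table.
-- Run with SET SERVEROUTPUT ON to see the output.
DECLARE
  l_tab DBMS_UTILITY.uncl_array;  -- PL/SQL table of strings filled by the call
  l_len BINARY_INTEGER;           -- number of items found
BEGIN
  DBMS_UTILITY.comma_to_table(
    list   => 'Err1, Err2, Err3',
    tablen => l_len,
    tab    => l_tab);

  FOR i IN 1 .. l_len LOOP
    -- Items keep their surrounding spaces, so trim them before use
    DBMS_OUTPUT.put_line(TRIM(l_tab(i)));
  END LOOP;
END;
/
```

Because this is a PL/SQL procedure with OUT parameters, it cannot be called directly from a SELECT; it fits procedural code better than the row-splitting queries shown in the other answers.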


小智 6

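I would like to propose a different approach using a PIPELINED table function. It is somewhat similar to the XMLTABLE technique, except that you provide your own custom function to split the string: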

-- Create a collection type to hold the results
CREATE OR REPLACE TYPE typ_str2tbl_nst AS TABLE OF VARCHAR2(30);
/

-- Split the string according to the specified delimiter
CREATE OR REPLACE FUNCTION str2tbl (
  p_string    VARCHAR2,
  p_delimiter CHAR DEFAULT ',' 
)
RETURN typ_str2tbl_nst PIPELINED
AS
  l_tmp VARCHAR2(32000) := p_string || p_delimiter;
  l_pos NUMBER;
BEGIN
  LOOP
    l_pos := INSTR( l_tmp, p_delimiter );
    EXIT WHEN NVL( l_pos, 0 ) = 0;
    PIPE ROW ( RTRIM( LTRIM( SUBSTR( l_tmp, 1, l_pos-1) ) ) );
    l_tmp := SUBSTR( l_tmp, l_pos+1 );
  END LOOP;
END str2tbl;
/

-- The problem solution
SELECT name, 
       project, 
       TRIM(COLUMN_VALUE) error
  FROM t, TABLE(str2tbl(error));

Results:

      NAME PROJECT    ERROR
---------- ---------- --------------------
       108 test       Err1
       108 test       Err2
       108 test       Err3
       109 test2      Err1

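The problem with this type of approach is that the optimizer often does not know the cardinality of the table function, so it has to guess. This can be harmful to your execution plans, so the solution can be extended to provide execution statistics to the optimizer.

You can see the optimizer's estimate by running EXPLAIN PLAN on the query above: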

Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806

----------------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         | 16336 |   366K|    59   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                      |         | 16336 |   366K|    59   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                | T       |     2 |    42 |     3   (0)| 00:00:01 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| STR2TBL |  8168 | 16336 |    28   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

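Even though the collection has only 3 values, the optimizer estimated 8168 rows for it (the default value). This may seem irrelevant at first, but it can be enough for the optimizer to choose a sub-optimal plan.

The solution is to use the optimizer extensions to provide statistics for the collection: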

-- Create the optimizer interface to the str2tbl function
CREATE OR REPLACE TYPE typ_str2tbl_stats AS OBJECT (
  dummy NUMBER,

  STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
  RETURN NUMBER,

  STATIC FUNCTION ODCIStatsTableFunction ( p_function  IN  SYS.ODCIFuncInfo,
                                           p_stats     OUT SYS.ODCITabFuncStats,
                                           p_args      IN  SYS.ODCIArgDescList,
                                           p_string    IN  VARCHAR2,
                                           p_delimiter IN  CHAR DEFAULT ',' )
  RETURN NUMBER
);
/

-- Optimizer interface implementation
CREATE OR REPLACE TYPE BODY typ_str2tbl_stats
AS
  STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
  RETURN NUMBER
  AS
  BEGIN
    p_interfaces := SYS.ODCIObjectList ( SYS.ODCIObject ('SYS', 'ODCISTATS2') );
    RETURN ODCIConst.SUCCESS;
  END ODCIGetInterfaces;

  -- This function is responsible for returning the cardinality estimate
  STATIC FUNCTION ODCIStatsTableFunction ( p_function  IN  SYS.ODCIFuncInfo,
                                           p_stats     OUT SYS.ODCITabFuncStats,
                                           p_args      IN  SYS.ODCIArgDescList,
                                           p_string    IN  VARCHAR2,
                                           p_delimiter IN  CHAR DEFAULT ',' )
  RETURN NUMBER
  AS
  BEGIN
    -- I'm using basically half the string length as an estimator for its cardinality
    p_stats := SYS.ODCITabFuncStats( CEIL( LENGTH( p_string ) / 2 ) );
    RETURN ODCIConst.SUCCESS;
  END ODCIStatsTableFunction;

END;
/

-- Associate our optimizer extension with the PIPELINED function   
ASSOCIATE STATISTICS WITH FUNCTIONS str2tbl USING typ_str2tbl_stats;

Testing the resulting execution plan:

Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806

----------------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         |     1 |    23 |    59   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                      |         |     1 |    23 |    59   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                | T       |     2 |    42 |     3   (0)| 00:00:01 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| STR2TBL |     1 |     2 |    28   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

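As you can see, the cardinality in the plan above is no longer the guessed value of 8168. It is still not correct, because we are passing a column instead of a string literal to the function.

In this particular case, some tweaking of the function code would be needed to give a closer estimate, but I think the overall concept is pretty much explained here.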
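
One possible tweak, offered only as a sketch and not as the answer author's code: count the delimiters when the string is known, and fall back to a fixed guess when it is not. It assumes that p_string arrives as NULL when a column is passed (which the 1-row estimate above suggests), and the constant c_default_rows and its value 3 are hypothetical.

```sql
-- Sketch only: a variant of the type body above with a different estimator.
CREATE OR REPLACE TYPE BODY typ_str2tbl_stats
AS
  STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
  RETURN NUMBER
  AS
  BEGIN
    p_interfaces := SYS.ODCIObjectList ( SYS.ODCIObject ('SYS', 'ODCISTATS2') );
    RETURN ODCIConst.SUCCESS;
  END ODCIGetInterfaces;

  STATIC FUNCTION ODCIStatsTableFunction ( p_function  IN  SYS.ODCIFuncInfo,
                                           p_stats     OUT SYS.ODCITabFuncStats,
                                           p_args      IN  SYS.ODCIArgDescList,
                                           p_string    IN  VARCHAR2,
                                           p_delimiter IN  CHAR DEFAULT ',' )
  RETURN NUMBER
  AS
    -- Hypothetical fallback used when the string is not known at parse time
    c_default_rows CONSTANT NUMBER := 3;
  BEGIN
    -- Number of items = number of delimiters + 1; a NULL result
    -- (unknown string) falls back to the fixed guess.
    p_stats := SYS.ODCITabFuncStats(
                 NVL( LENGTH( p_string )
                      - LENGTH( REPLACE( p_string, p_delimiter ) ) + 1,
                      c_default_rows ) );
    RETURN ODCIConst.SUCCESS;
  END ODCIStatsTableFunction;

END;
/
```

For a literal such as 'Err1, Err2, Err3' this estimator should yield 3; for a column argument the quality of the plan then depends on how well the fallback matches the typical list length in your data.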

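The str2tbl function used in this answer was originally developed by Tom Kyte: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:110612348061

The concept of associating statistics with object types can be explored further by reading this article: http://www.oracle-developer.net/display.php?id=427

The technique described here works in 10g+.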


Luk*_*zda 5

Starting with Oracle 12c, you can use JSON_TABLE and JSON_ARRAY:

CREATE TABLE tab(Name, Project, Error) AS
SELECT 108,'test' ,'Err1, Err2, Err3' FROM dual UNION 
SELECT 109,'test2','Err1'             FROM dual;

And the query:

SELECT *
FROM tab t
OUTER APPLY (SELECT TRIM(p) AS p
            FROM JSON_TABLE(REPLACE(JSON_ARRAY(t.Error), ',', '","'),
           '$[*]' COLUMNS (p VARCHAR2(4000) PATH '$'))) s;

Output:

╔══════╦═════════╦══════════════════╦══════╗
║ Name ║ Project ║      Error       ║  P   ║
╠══════╬═════════╬══════════════════╬══════╣
║  108 ║ test    ║ Err1, Err2, Err3 ║ Err1 ║
║  108 ║ test    ║ Err1, Err2, Err3 ║ Err2 ║
║  108 ║ test    ║ Err1, Err2, Err3 ║ Err3 ║
║  109 ║ test2   ║ Err1             ║ Err1 ║
╚══════╩═════════╩══════════════════╩══════╝

db<>fiddle demo