mar*_*law 98 sql string oracle plsql tokenize
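I know this has been answered to some degree for PHP and MySQL, but I was wondering if someone could teach me the simplest approach to splitting a string (comma-delimited) into multiple rows in Oracle 10g (preferably) and 11g.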
The table is as follows:
Name | Project | Error
108  | test    | Err1, Err2, Err3
109  | test2   | Err1
I want to produce the following:
Name | Project | Error
108  | test    | Err1
108  | test    | Err2
108  | test    | Err3
109  | test2   | Err1
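I have seen a few potential solutions around Stack Overflow, but they only account for a single column (the comma-delimited string). Any help would be greatly appreciated.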
Nef*_*reo 109
The accepted answer performs poorly on large data sets.
This may be an improved approach (also using regexp and CONNECT BY):
with temp as
(
select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual
union all
select 109, 'test2', 'Err1' from dual
)
select distinct
t.name, t.project,
trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) as error
from
temp t,
table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levels
order by name
Edit: Here is a simple (as in "not in-depth") explanation of the query.
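length (regexp_replace(t.error, '[^,]+')) + 1 uses regexp_replace to erase everything that is not the delimiter (a comma in this case), and length + 1 to get how many elements (errors) there are. select level from dual connect by level <= (...) then uses a hierarchical query to create a column with an increasing number of matches, from 1 up to the total number of errors.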
Preview:
select level, length (regexp_replace('Err1, Err2, Err3', '[^,]+')) + 1 as max
from dual connect by level <= length (regexp_replace('Err1, Err2, Err3', '[^,]+')) + 1
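To make the counting step concrete, here is a small Python model of the two expressions above. It is not Oracle code; it only mimics the semantics as I understand them, and the function names are hypothetical. One subtlety it captures: for a single-element string such as 'Err1', REGEXP_REPLACE leaves an empty string, which Oracle treats as NULL, so the count expression is NULL. The hierarchical query still returns its root row (LEVEL 1), which is why that row is not lost.

```python
import re

def element_count(s):
    """Model of LENGTH(REGEXP_REPLACE(s, '[^,]+')) + 1.

    REGEXP_REPLACE without a replacement string deletes every run of
    non-comma characters, so only the commas remain.
    """
    commas_only = re.sub(r"[^,]+", "", s)
    if commas_only == "":
        return None  # Oracle: '' is NULL, and LENGTH(NULL) + 1 is NULL
    return len(commas_only) + 1

def connect_by_levels(limit):
    """Model of SELECT LEVEL FROM dual CONNECT BY LEVEL <= limit."""
    levels = [1]  # the root row is returned whatever the CONNECT BY condition says
    # LEVEL <= NULL is unknown, so a NULL limit never adds a child level
    while limit is not None and len(levels) + 1 <= limit:
        levels.append(len(levels) + 1)
    return levels

print(connect_by_levels(element_count("Err1, Err2, Err3")))  # [1, 2, 3]
print(connect_by_levels(element_count("Err1")))              # [1]
```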
table(cast(multiset(.....) as sys.OdciNumberList)) does some Oracle type casting:
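cast(multiset(.....) as sys.OdciNumberList) collects the levels returned by the correlated subquery into a collection of numbers of type OdciNumberList, one collection for each row in the original data set. The table() function turns that collection into a result set. Listing the table and the collection in FROM without a join condition joins each row of the data set to its own collection, so a row in the data set with 4 matches is repeated 4 times (with an increasing number in the column named "column_value").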
Preview:
select * from
temp t,
table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levels
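The row multiplication can be modeled in Python as a nested comprehension in which each source row is paired only with its own list of levels. This is a sketch of the semantics, not of how Oracle executes the query; TEMP mirrors the sample data in the WITH clause, and the helper names are hypothetical.

```python
import re

# The two sample rows from the WITH clause: (name, project, error).
TEMP = [
    (108, "test", "Err1, Err2, Err3"),
    (109, "test2", "Err1"),
]

def level_list(error):
    """Model of the correlated MULTISET(SELECT LEVEL ...) subquery for one row:
    the numbers 1..n, where n is the number of commas + 1."""
    # For "Err1" Oracle computes a NULL count, but the hierarchical query
    # still returns LEVEL 1, so the effective list is [1], as it is here.
    n = len(re.sub(r"[^,]+", "", error)) + 1
    return list(range(1, n + 1))

def expand(rows):
    """Model of 'FROM temp t, TABLE(...) levels': each row is joined only to
    its own collection, so a row with n elements appears n times, tagged
    with column_value = 1..n."""
    return [
        (name, project, error, column_value)
        for name, project, error in rows
        for column_value in level_list(error)
    ]

for row in expand(TEMP):
    print(row)
```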
trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) uses column_value as the nth appearance (occurrence) parameter of regexp_substr. You can add other columns from your data set (t.name and t.project, for example) for easier visualization.
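Putting the pieces together, the following Python sketch models the complete query on the sample data. regexp_substr_nth is a hypothetical helper standing in for REGEXP_SUBSTR(..., '[^,]+', 1, n) (it returns None where Oracle returns NULL), and str.strip stands in for TRIM (strip also removes tabs and newlines, whereas TRIM removes only spaces by default).

```python
import re

def regexp_substr_nth(s, n):
    """Model of REGEXP_SUBSTR(s, '[^,]+', 1, n): the n-th run of non-comma
    characters, or None (Oracle's NULL) if there is no n-th occurrence."""
    runs = re.findall(r"[^,]+", s)
    return runs[n - 1] if n <= len(runs) else None

def split_rows(rows):
    """Model of the full query: one output row per element, trimmed."""
    out = set()  # a set, because the query uses SELECT DISTINCT
    for name, project, error in rows:
        count = error.count(",") + 1  # same effective count as the LENGTH(REGEXP_REPLACE) trick
        for column_value in range(1, count + 1):
            piece = regexp_substr_nth(error, column_value)
            out.add((name, project, piece.strip() if piece is not None else None))
    # ORDER BY name; ties are also ordered by error here only to make the output deterministic
    return sorted(out, key=lambda r: (r[0], r[2] or ""))

rows = [(108, "test", "Err1, Err2, Err3"), (109, "test2", "Err1")]
for row in split_rows(rows):
    print(row)
```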
Lal*_*r B 28
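There is a huge difference between splitting a single delimited string and splitting delimited strings stored in multiple rows of a table.
If you do not restrict the rows, the CONNECT BY clause will generate many more rows than expected and will not give the desired output.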
Apart from regular expressions, there are a few other alternatives.
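Setup (the IDENTITY column requires Oracle 12c or later; on 10g/11g, use a plain NUMBER column and supply the IDs yourself):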
SQL> CREATE TABLE t (
  ID   NUMBER GENERATED ALWAYS AS IDENTITY,
  text VARCHAR2(100)
);
Table created.
SQL>
SQL> INSERT INTO t (text) VALUES ('word1, word2, word3');
1 row created.
SQL> INSERT INTO t (text) VALUES ('word4, word5, word6');
1 row created.
SQL> INSERT INTO t (text) VALUES ('word7, word8, word9');
1 row created.
SQL> COMMIT;
Commit complete.
SQL>
SQL> SELECT * FROM t;
ID TEXT
---------- ----------------------------------------------
1 word1, word2, word3
2 word4, word5, word6
3 word7, word8, word9
SQL>
Using XMLTABLE:
SQL> SELECT id,
       trim(COLUMN_VALUE) text
FROM t,
     xmltable(('"'
       || REPLACE(text, ',', '","')
       || '"'))
/
ID TEXT
---------- ------------------------
1 word1
1 word2
1 word3
2 word4
2 word5
2 word6
3 word7
3 word8
3 word9
9 rows selected.
SQL>
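The trick here is that REPLACE turns the list into an XQuery sequence of string literals, for example "word1"," word2"," word3", and XMLTABLE returns one row per item of that sequence. The Python sketch below models only that string transformation plus a tiny evaluator for a sequence of string literals; it is not an XQuery engine, and both helper names are hypothetical. Because the text is spliced into an XQuery expression, a value containing a double quote or an ampersand would need escaping first (in XQuery string literals "" stands for a quote and & starts an entity or character reference).

```python
import re

def xquery_sequence(text):
    """Model of '"' || REPLACE(text, ',', '","') || '"'."""
    return '"' + text.replace(",", '","') + '"'

def evaluate_string_sequence(expr):
    """Tiny evaluator for a comma-separated sequence of XQuery string
    literals; only the "" (escaped quote) rule is handled. Each item
    becomes one row, exposed by XMLTABLE as COLUMN_VALUE."""
    return [item.replace('""', '"') for item in re.findall(r'"((?:[^"]|"")*)"', expr)]

expr = xquery_sequence("word1, word2, word3")
print(expr)                                                 # "word1"," word2"," word3"
print([v.strip() for v in evaluate_string_sequence(expr)])  # TRIM(COLUMN_VALUE)
```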
Using the MODEL clause:
SQL> WITH
  model_param AS
  (
    SELECT id,
           text AS orig_str,
           ',' || text || ',' AS mod_str,
           1 AS start_pos,
           Length(text) AS end_pos,
           (Length(text) - Length(Replace(text, ','))) + 1 AS element_count,
           0 AS element_no,
           ROWNUM AS rn
    FROM t
  )
SELECT id,
       trim(Substr(mod_str, start_pos, end_pos - start_pos)) text
FROM (
  SELECT *
  FROM model_param
  MODEL PARTITION BY (id, rn, orig_str, mod_str)
        DIMENSION BY (element_no)
        MEASURES (start_pos, end_pos, element_count)
        RULES ITERATE (2000)
        UNTIL (ITERATION_NUMBER + 1 = element_count[0])
        ( start_pos[ITERATION_NUMBER + 1] = instr(cv(mod_str), ',', 1, cv(element_no)) + 1,
          end_pos[ITERATION_NUMBER + 1]   = instr(cv(mod_str), ',', 1, cv(element_no) + 1) )
)
WHERE element_no != 0
ORDER BY mod_str,
         element_no
/
ID TEXT
---------- --------------------------------------------------
1 word1
1 word2
1 word3
2 word4
2 word5
2 word6
3 word7
3 word8
3 word9
9 rows selected.
SQL>
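The MODEL clause walks the string with INSTR: mod_str wraps the text in commas, so element n lies between the n-th and (n+1)-th comma. The Python sketch below replays the rules one iteration at a time (ITERATION_NUMBER starting at 0, the UNTIL condition checked after each iteration). instr is a hypothetical helper modeling Oracle's 1-based INSTR, and max_iterations mirrors ITERATE (2000), which also means a string with more than 2000 elements would be truncated.

```python
def instr(s, sub, start, occurrence):
    """Model of Oracle INSTR(s, sub, start, occurrence): 1-based position of
    the occurrence-th match at or after start, or 0 if there is none."""
    idx = start - 1      # 0-based search position
    found = -1
    for _ in range(occurrence):
        found = s.find(sub, idx)
        if found == -1:
            return 0
        idx = found + 1  # the next search starts just after this match
    return found + 1

def model_split(text, max_iterations=2000):
    """Replay the MODEL rules for one row of t."""
    mod_str = "," + text + ","
    element_count = text.count(",") + 1              # (Length - Length(Replace)) + 1
    start_pos, end_pos = {}, {}
    for iteration_number in range(max_iterations):   # RULES ITERATE (2000)
        element_no = iteration_number + 1            # the cell this iteration fills
        start_pos[element_no] = instr(mod_str, ",", 1, element_no) + 1
        end_pos[element_no] = instr(mod_str, ",", 1, element_no + 1)
        if iteration_number + 1 == element_count:    # UNTIL (ITERATION_NUMBER+1 = element_count[0])
            break
    # WHERE element_no != 0, then TRIM(SUBSTR(mod_str, start_pos, end_pos - start_pos))
    return [mod_str[start_pos[n] - 1:end_pos[n] - 1].strip() for n in sorted(start_pos)]

print(model_split("word1, word2, word3"))  # ['word1', 'word2', 'word3']
```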
And*_*lev 27
Regular expressions are a wonderful thing :)
with temp as (
select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual
union all
select 109, 'test2', 'Err1' from dual
)
SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str
FROM (SELECT Name, Project, Error str FROM temp) t
CONNECT BY instr(str, ',', 1, level - 1) > 0
order by Name
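The DISTINCT in this query is doing real work. As noted in the answer above, a CONNECT BY without a PRIOR condition does not keep each row's levels separate: every row at level L-1 becomes the parent of every row that satisfies the condition at level L. The Python sketch below models that rule for CONNECT BY instr(str, ',', 1, level - 1) > 0. It reflects my reading of Oracle's semantics rather than Oracle code, and unrestricted_connect_by is a hypothetical name. On the two sample rows it generates 6 rows that DISTINCT collapses to 4; with more rows the count multiplies at every level, which is likely why this form slows down on large tables.

```python
# Sample rows from the question: (name, project, error).
ROWS = [
    (108, "test", "Err1, Err2, Err3"),
    (109, "test2", "Err1"),
]

def unrestricted_connect_by(rows):
    """Model of
         SELECT ... FROM rows CONNECT BY instr(error, ',', 1, level - 1) > 0
    with no PRIOR. Returns one (name, level) pair per generated row."""
    result = [(name, 1) for name, _, _ in rows]  # every row is a root at LEVEL 1
    parents = len(rows)                          # rows generated at the previous level
    level = 2
    while True:
        # instr(error, ',', 1, level - 1) > 0  <=>  at least level - 1 commas
        children = [name for name, _, error in rows if error.count(",") >= level - 1]
        if not children:
            break
        # the condition never mentions the parent, so each parent gets every child
        result.extend((name, level) for _ in range(parents) for name in children)
        parents *= len(children)
        level += 1
    return result

generated = unrestricted_connect_by(ROWS)
print(len(generated), len(set(generated)))  # 6 rows, 4 after DISTINCT
```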
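A few more examples of the same technique for a single string (the first uses REGEXP_COUNT, available from 11g; the second also works on 10g):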
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <= regexp_count('Err1, Err2, Err3', ',')+1
/
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <= length('Err1, Err2, Err3') - length(REPLACE('Err1, Err2, Err3', ',', ''))+1
/
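Both counting expressions yield the same number of elements; REGEXP_COUNT is simply more direct where it is available. The Python sketch below models them together with the single-string query (function names are hypothetical). It also shows a limitation of the '[^,]+' pattern used by the regex answers in this thread: because it matches only non-empty runs, an empty element such as the one in 'Err1,,Err3' is skipped, the later values shift up, and the last level returns NULL (None here).

```python
import re

S = "Err1, Err2, Err3"

def count_regexp_count(s):
    """Model of REGEXP_COUNT(s, ',') + 1."""
    return len(re.findall(",", s)) + 1

def count_length_replace(s):
    """Model of LENGTH(s) - LENGTH(REPLACE(s, ',', '')) + 1."""
    return len(s) - len(s.replace(",", "")) + 1

def str_2_tab(s):
    """Model of SELECT TRIM(REGEXP_SUBSTR(s, '[^,]+', 1, LEVEL)) FROM dual
    CONNECT BY LEVEL <= <count>: one row per LEVEL, None where Oracle gives NULL."""
    runs = re.findall(r"[^,]+", s)
    return [runs[level - 1].strip() if level <= len(runs) else None
            for level in range(1, count_length_replace(s) + 1)]

print(str_2_tab(S))             # ['Err1', 'Err2', 'Err3']
print(str_2_tab("Err1,,Err3"))  # ['Err1', 'Err3', None]
```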
Alternatively, you can use DBMS_UTILITY.comma_to_table and table_to_comma: http://www.oracle-base.com/articles/9i/useful-procedures-and-functions-9i.php#DBMS_UTILITY.comma_to_table
小智 6
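I would like to propose a different approach using a PIPELINED table function. It is somewhat similar to the XMLTABLE technique, except that you provide your own custom function to split the string: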
-- Create a collection type to hold the results
CREATE OR REPLACE TYPE typ_str2tbl_nst AS TABLE OF VARCHAR2(30);
/
-- Split the string according to the specified delimiter
CREATE OR REPLACE FUNCTION str2tbl (
p_string VARCHAR2,
p_delimiter CHAR DEFAULT ','
)
RETURN typ_str2tbl_nst PIPELINED
AS
l_tmp VARCHAR2(32000) := p_string || p_delimiter;
l_pos NUMBER;
BEGIN
LOOP
l_pos := INSTR( l_tmp, p_delimiter );
EXIT WHEN NVL( l_pos, 0 ) = 0;
PIPE ROW ( RTRIM( LTRIM( SUBSTR( l_tmp, 1, l_pos-1) ) ) );
l_tmp := SUBSTR( l_tmp, l_pos+1 );
END LOOP;
END str2tbl;
/
-- The problem solution
SELECT name,
project,
TRIM(COLUMN_VALUE) error
FROM t, TABLE(str2tbl(error));
Result:
NAME PROJECT ERROR
---------- ---------- --------------------
108 test Err1
108 test Err2
108 test Err3
109 test2 Err1
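The PL/SQL loop is easiest to follow when replayed step by step. Below is a Python generator that mirrors str2tbl line by line; it is a model for illustration, not a replacement for the PL/SQL function. INSTR becomes str.find plus one, SUBSTR becomes slicing, and str.strip stands in for LTRIM/RTRIM (it also removes tabs and newlines). Like the original, it assumes a one-character delimiter, since it continues from SUBSTR(l_tmp, l_pos + 1).

```python
def str2tbl(p_string, p_delimiter=","):
    """Python mirror of the PL/SQL str2tbl loop (one-character delimiter)."""
    l_tmp = p_string + p_delimiter              # l_tmp := p_string || p_delimiter
    while True:
        l_pos = l_tmp.find(p_delimiter) + 1     # INSTR: 1-based, 0 when not found
        if l_pos == 0:                          # EXIT WHEN NVL(l_pos, 0) = 0
            break
        yield l_tmp[:l_pos - 1].strip()         # PIPE ROW(RTRIM(LTRIM(SUBSTR(l_tmp, 1, l_pos - 1))))
        l_tmp = l_tmp[l_pos:]                   # l_tmp := SUBSTR(l_tmp, l_pos + 1)

print(list(str2tbl("Err1, Err2, Err3")))  # ['Err1', 'Err2', 'Err3']
```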
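The problem with this approach is that the optimizer usually does not know the cardinality of the table function and has to guess. This can be harmful to your execution plans, so the solution can be extended to provide statistics to the optimizer.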
You can see this optimizer estimate by running EXPLAIN PLAN on the query above:
Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16336 | 366K| 59 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 16336 | 366K| 59 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | T | 2 | 42 | 3 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| STR2TBL | 8168 | 16336 | 28 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
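Even though the collection has only 3 values, the optimizer estimated 8168 rows (the default value). This may seem irrelevant at first, but it can be enough for the optimizer to choose a suboptimal plan.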
The solution is to use an optimizer extension to provide statistics for the collection:
-- Create the optimizer interface to the str2tbl function
CREATE OR REPLACE TYPE typ_str2tbl_stats AS OBJECT (
dummy NUMBER,
STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
RETURN NUMBER,
STATIC FUNCTION ODCIStatsTableFunction ( p_function IN SYS.ODCIFuncInfo,
p_stats OUT SYS.ODCITabFuncStats,
p_args IN SYS.ODCIArgDescList,
p_string IN VARCHAR2,
p_delimiter IN CHAR DEFAULT ',' )
RETURN NUMBER
);
/
-- Optimizer interface implementation
CREATE OR REPLACE TYPE BODY typ_str2tbl_stats
AS
STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
RETURN NUMBER
AS
BEGIN
p_interfaces := SYS.ODCIObjectList ( SYS.ODCIObject ('SYS', 'ODCISTATS2') );
RETURN ODCIConst.SUCCESS;
END ODCIGetInterfaces;
-- This function is responsible for returning the cardinality estimate
STATIC FUNCTION ODCIStatsTableFunction ( p_function IN SYS.ODCIFuncInfo,
p_stats OUT SYS.ODCITabFuncStats,
p_args IN SYS.ODCIArgDescList,
p_string IN VARCHAR2,
p_delimiter IN CHAR DEFAULT ',' )
RETURN NUMBER
AS
BEGIN
-- I'm using basically half the string length as an estimator for its cardinality
p_stats := SYS.ODCITabFuncStats( CEIL( LENGTH( p_string ) / 2 ) );
RETURN ODCIConst.SUCCESS;
END ODCIStatsTableFunction;
END;
/
-- Associate our optimizer extension with the PIPELINED function
ASSOCIATE STATISTICS WITH FUNCTIONS str2tbl USING typ_str2tbl_stats;
Testing the resulting execution plan:
Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 23 | 59 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 23 | 59 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | T | 2 | 42 | 3 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| STR2TBL | 1 | 2 | 28 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
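As you can see, the cardinality in the plan above is no longer the guessed value of 8168. It is still not correct, though, because we are passing a column instead of a string literal to the function.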
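In this particular case, some tweaks to the function code would be needed to get a closer estimate, but I think the overall concept is well explained here.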
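To see why half the string length is only a rough proxy, compare it with a delimiter count on the sample literal. The Python sketch below is a model only: delimiter_estimate is a hypothetical alternative that would still have to be ported into ODCIStatsTableFunction, and, as noted above, neither estimator helps when a column rather than a literal is passed.

```python
import math

def half_length_estimate(p_string):
    """Model of the answer's estimator: CEIL(LENGTH(p_string) / 2)."""
    return math.ceil(len(p_string) / 2)

def delimiter_estimate(p_string, p_delimiter=","):
    """Hypothetical tighter estimator: number of delimiters + 1, which is
    how many rows str2tbl pipes for a literal argument."""
    return p_string.count(p_delimiter) + 1

literal = "Err1, Err2, Err3"
print(half_length_estimate(literal), delimiter_estimate(literal))  # 8 3
```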
The str2tbl function used in this answer was originally developed by Tom Kyte: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:110612348061
The concept of associating statistics with object types can be explored further in this article: http://www.oracle-developer.net/display.php?id=427
这里描述的技术适用于10g +.
Starting with Oracle 12c (Release 2, where JSON_ARRAY was introduced), you can use JSON_TABLE and JSON_ARRAY:
CREATE TABLE tab(Name, Project, Error) AS
SELECT 108,'test' ,'Err1, Err2, Err3' FROM dual UNION
SELECT 109,'test2','Err1' FROM dual;
And the query:
SELECT *
FROM tab t
OUTER APPLY (SELECT TRIM(p) AS p
FROM JSON_TABLE(REPLACE(JSON_ARRAY(t.Error), ',', '","'),
'$[*]' COLUMNS (p VARCHAR2(4000) PATH '$'))) s;
Output:
Name | Project | Error            | P
-----+---------+------------------+-----
108  | test    | Err1, Err2, Err3 | Err1
108  | test    | Err1, Err2, Err3 | Err2
108  | test    | Err1, Err2, Err3 | Err3
109  | test2   | Err1             | Err1
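The JSON version works because JSON_ARRAY wraps the whole string in a one-element JSON array, and replacing each comma with "," splits that single JSON string into several elements. The Python sketch below models the transformation with the standard json module, assuming json.dumps escapes the value the same way Oracle's JSON_ARRAY does for these inputs; json_split is a hypothetical name. A side benefit of going through JSON is that quotes inside values are already escaped, so they do not break the array.

```python
import json

def json_split(error):
    """Model of JSON_TABLE(REPLACE(JSON_ARRAY(error), ',', '","'), '$[*]' ...)
    followed by TRIM(p)."""
    doc = json.dumps([error])        # JSON_ARRAY(t.Error): ["Err1, Err2, Err3"]
    doc = doc.replace(",", '","')    # REPLACE(...): ["Err1"," Err2"," Err3"]
    return [p.strip() for p in json.loads(doc)]  # '$[*]' gives one row per element

print(json_split("Err1, Err2, Err3"))  # ['Err1', 'Err2', 'Err3']
```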