Pet*_*ing 6 regex sql oracle regexp-substr regexp-replace
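I have a database table (Oracle 11g) of questionnaire feedback, including multiple-choice, multiple-answer questions. The Options column holds every value the user could choose, and the Answers column holds the numbers of the values they chose.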
ID_NO  OPTIONS                           ANSWERS
-----  --------------------------------  -------
1001   Apple Pie|Banana-Split|Cream Tea  1|2
1002   Apple Pie|Banana-Split|Cream Tea  2|3
1003   Apple Pie|Banana-Split|Cream Tea  1|2|3
I need a query that decodes the answers and returns their text versions as a single string.
ID_NO  ANSWERS  ANSWER_DECODE
-----  -------  --------------------------------
1001   1|2      Apple Pie|Banana-Split
1002   2|3      Banana-Split|Cream Tea
1003   1|2|3    Apple Pie|Banana-Split|Cream Tea
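Put another way, each answer number is a 1-based index into the `|`-separated OPTIONS string. The sketch below shows the required mapping for one row. It is a language-neutral illustration in Python, not Oracle SQL, and `decode_answers` is a hypothetical helper name.

```python
def decode_answers(options: str, answers: str, sep: str = "|") -> str:
    """Map each 1-based answer number to its option text and rejoin with sep."""
    choices = options.split(sep)
    # Answer n refers to the n-th option, so subtract 1 for Python's 0-based list
    return sep.join(choices[int(n) - 1] for n in answers.split(sep))


feedback = [
    (1001, "Apple Pie|Banana-Split|Cream Tea", "1|2"),
    (1002, "Apple Pie|Banana-Split|Cream Tea", "2|3"),
    (1003, "Apple Pie|Banana-Split|Cream Tea", "1|2|3"),
]

for id_no, options, answers in feedback:
    print(id_no, answers, decode_answers(options, answers))
```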
I have tried using regular expressions to replace the values and to extract substrings, but I can't find a way to combine the two correctly.
WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
  SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
  SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL )
SELECT
  id_no,
  options,
  REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, 2) second_option,
  answers,
  REGEXP_REPLACE(answers, '(\d)+', ' \1 ') answer_numbers,
  REGEXP_REPLACE(answers, '(\d)+', REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, To_Number('2'))) "???"
FROM feedback
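The last column cannot work as written. REGEXP_REPLACE takes a single replacement string, in which `\1`-style backreferences are substituted. The nested REGEXP_SUBSTR with `To_Number('2')` is evaluated once, before any matching, so every number is replaced by the same text. What the query is reaching for is a replacement computed per match. Oracle 11g's REGEXP_REPLACE has no callback form, but Python's `re.sub` does, so a small Python sketch makes the difference visible. The function names are made up for illustration.

```python
import re

options = "Apple Pie|Banana-Split|Cream Tea"
choices = options.split("|")


def decode_by_callback(answers: str) -> str:
    # The replacement is computed per match: each run of digits becomes the
    # option at that (1-based) position; the '|' separators are left alone.
    return re.sub(r"\d+", lambda m: choices[int(m.group()) - 1], answers)


def decode_by_fixed_replacement(answers: str) -> str:
    # Mirrors the "???" column: the replacement text is computed once, up
    # front, so every number is replaced by the same (second) option.
    second = re.findall(r".+?\|", options + "|")[1]  # 'Banana-Split|'
    return re.sub(r"\d+", second, answers)


print(decode_by_callback("1|2"))
print(decode_by_fixed_replacement("1|2"))
```

`decode_by_fixed_replacement("1|2")` returns `'Banana-Split||Banana-Split|'`, which should match what the `"???"` column returns for row 1001.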
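I don't want to hard-code or manually decode the answers in the SQL. There are many surveys with different questions (and different numbers of options), so I'm hoping for a solution that works dynamically for all of them.
I tried splitting the options and the answers into separate rows with LEVEL and re-joining them where the codes match, but this runs extremely slowly on the real data set (a 5-option question with 600 rows of responses).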
WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
  SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
  SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL )
SELECT
  answer_rows.id_no,
  ListAgg(option_rows.answer) WITHIN GROUP(ORDER BY option_rows.lvl)
FROM
  (SELECT DISTINCT
     LEVEL lvl,
     REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, LEVEL) answer
   FROM
     (SELECT DISTINCT
        options,
        REGEXP_COUNT(options||'|', '(.)+?\|') num_choices
      FROM
        feedback)
   CONNECT BY LEVEL <= num_choices
  ) option_rows
  LEFT OUTER JOIN
  (SELECT DISTINCT
     id_no,
     to_number(REGEXP_SUBSTR(answers, '(\d)+', 1, LEVEL)) answer
   FROM
     (SELECT DISTINCT
        id_no,
        answers,
        To_Number(REGEXP_SUBSTR(answers, '(\d)+$')) max_answer
      FROM
        feedback)
   WHERE
     to_number(REGEXP_SUBSTR(answers, '(\d)+', 1, LEVEL)) IS NOT NULL
   CONNECT BY LEVEL <= max_answer
  ) answer_rows
  ON option_rows.lvl = answer_rows.answer
GROUP BY
  answer_rows.id_no
ORDER BY
  answer_rows.id_no
If there is no regex solution, is there a more efficient way than LEVEL to split the values? Or is there another approach that would work?
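It's slow because you're expanding every row far too many times. The connect-by clauses you're using look across all the rows, so you end up with a huge amount of data that then has to be sorted, which is probably why you ended up adding the DISTINCTs.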
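To get a feel for how fast this grows, here is a simplified row-count model in Python. It assumes every input row has the same limit `n`, whereas the question's subquery actually uses a per-row `max_answer`. Treat the numbers as an order-of-magnitude illustration, not what Oracle will report. Without a PRIOR condition, every row at level k-1 is a parent of every input row, so level k holds m^k rows. If the connect-by is tied to the same `id_no`, each row only expands into itself, giving n rows per input row. This assumes Oracle's loop check is sidestepped, as the answer goes on to do.

```python
def uncorrelated_rows(m: int, n: int) -> int:
    """Rows from CONNECT BY LEVEL <= n over m input rows with no PRIOR
    condition: every row is a child of every row at the previous level,
    so level k contributes m**k rows."""
    return sum(m ** k for k in range(1, n + 1))


def correlated_rows(m: int, n: int) -> int:
    """Rows when each input row only connects to itself (same id_no):
    exactly n rows per input row."""
    return m * n


for m, n in [(3, 3), (600, 5)]:
    print(f"m={m}, n={n}: {uncorrelated_rows(m, n):,} vs {correlated_rows(m, n):,}")
```

For 3 rows and 3 levels that is 39 rows against 9. For 600 rows and 5 levels the uncorrelated figure runs into the tens of trillions, against 3,000.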
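You can add two PRIOR clauses to the connect-by. The first keeps each expansion within its own ID_NO. The second avoids a loop, since otherwise Oracle raises a CONNECT BY loop error when a row connects back to itself. Any non-deterministic function will do for that; I've picked dbms_random.value, but you could use sys_guid, or something else, if you prefer. You also don't need all those subqueries; you can do it with two, or as CTEs, which I think is a bit clearer: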
WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
  SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
  SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL
),
-- one row per option: (id_no, position, option text)
option_rows AS (
  SELECT
    id_no,
    LEVEL answer,
    REGEXP_SUBSTR(options, '[^|]+', 1, LEVEL) answer_text
  FROM feedback
  CONNECT BY LEVEL <= REGEXP_COUNT(options, '[^|]+')
    AND id_no = PRIOR id_no                  -- keep each expansion within its own id_no
    AND PRIOR dbms_random.value IS NOT NULL  -- non-deterministic PRIOR term avoids the loop error
),
-- one row per chosen answer: (id_no, answer number as a string)
answer_rows AS (
  SELECT
    id_no,
    REGEXP_SUBSTR(answers, '[^|]+', 1, LEVEL) answer
  FROM feedback
  CONNECT BY LEVEL <= REGEXP_COUNT(answers, '[^|]+')
    AND PRIOR id_no = id_no
    AND PRIOR dbms_random.value IS NOT NULL
)
SELECT
  option_rows.id_no,
  LISTAGG(option_rows.answer, '|') WITHIN GROUP (ORDER BY option_rows.answer) AS answers,
  LISTAGG(option_rows.answer_text, '|') WITHIN GROUP (ORDER BY option_rows.answer) AS answer_decode
FROM option_rows
JOIN answer_rows
  ON option_rows.id_no = answer_rows.id_no
  AND option_rows.answer = answer_rows.answer
GROUP BY option_rows.id_no
ORDER BY option_rows.id_no;
which gives:
ID_NO ANSWERS ANSWER_DECODE
---------- ---------- ----------------------------------------
1001 1|2 Apple Pie|Banana-Split
1002 2|3 Banana-Split|Cream Tea
1003 1|2|3 Apple Pie|Banana-Split|Cream Tea
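To make the data flow of the CTE version explicit, here is a rough Python equivalent of the same split, join and aggregate steps. It illustrates the logic only and is not Oracle code. The variable names mirror the CTE names, and the in-memory join and sort stand in for Oracle's JOIN and LISTAGG.

```python
from collections import defaultdict

feedback = [
    (1001, "Apple Pie|Banana-Split|Cream Tea", "1|2"),
    (1002, "Apple Pie|Banana-Split|Cream Tea", "2|3"),
    (1003, "Apple Pie|Banana-Split|Cream Tea", "1|2|3"),
]

# option_rows: one (id_no, position, text) row per option, each row split
# only into its own pieces (like the id_no-correlated CONNECT BY)
option_rows = [
    (id_no, pos, text)
    for id_no, options, _ in feedback
    for pos, text in enumerate(options.split("|"), start=1)
]

# answer_rows: one (id_no, position) pair per chosen answer; stored as a set,
# so a repeated answer number is kept once (the SQL join would repeat it)
answer_rows = {
    (id_no, int(a))
    for id_no, _, answers in feedback
    for a in answers.split("|")
}

# Inner join on (id_no, position), collecting matches per id_no (the GROUP BY)
grouped = defaultdict(list)
for id_no, pos, text in option_rows:
    if (id_no, pos) in answer_rows:
        grouped[id_no].append((pos, text))

# LISTAGG ... WITHIN GROUP (ORDER BY answer): sort by position, join with '|'
result = {}
for id_no, rows in grouped.items():
    rows.sort()
    result[id_no] = (
        "|".join(str(pos) for pos, _ in rows),
        "|".join(text for _, text in rows),
    )

for id_no in sorted(result):
    print(id_no, *result[id_no])
```

The key property carried over from the SQL is that each response is split only into its own pieces. The work therefore grows with the total number of options and answers instead of multiplying across rows.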
I've also changed your regex pattern, so you don't have to append or strip the |.
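The sketch below compares the two patterns using Python's `re` module. It assumes that Oracle's regex engine tokenises these simple patterns the same way, so treat it as an illustration rather than Oracle output.

```python
import re

options = "Apple Pie|Banana-Split|Cream Tea"

# Original style: append a trailing '|' so the last option also ends in one;
# every match still carries a '|' that has to be stripped afterwards.
old_style = re.findall(r".+?\|", options + "|")

# Revised style: match runs of non-'|' characters directly.
new_style = re.findall(r"[^|]+", options)

print(old_style)  # ['Apple Pie|', 'Banana-Split|', 'Cream Tea|']
print(new_style)  # ['Apple Pie', 'Banana-Split', 'Cream Tea']

# Caveat: [^|]+ never matches an empty element, so a blank option between
# two pipes shifts every later position by one.
print(re.findall(r"[^|]+", "Apple Pie||Cream Tea"))  # ['Apple Pie', 'Cream Tea']
```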