How do I remove only Chinese characters (a special character set) in Oracle?

zha*_*zhu 1 regex sql oracle

Source data:

    测试demo
    demo1
    中文2

Expected output:

    demo
    demo1
    2

I tried

    select regexp_replace('测试中文demo','[\u0391-\uFFE5]','') from dual

but it had no effect. Also, \w matches Chinese characters too, so [^\w] is not an option.

The best I can come up with so far is to keep only ASCII letters, digits and whitespace:

    select regexp_replace('测试中文demo','[^a-zA-Z0-9\s]','') from dual

Is there a better way?

Wik*_*żew 5

See: Search for Unicode characters in an Oracle table

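In regular expressions you can usually match any character with \x or \u followed by its hex code; for example, \x20 matches a space. Oracle's REGEXP_LIKE, however, does not support \x. Instead, use the UNISTR function to convert the code into the equivalent character, and then use that character with REGEXP_LIKE, for example REGEXP_LIKE(source, '[' || unistr('\0020') || ']');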
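A minimal sketch of the UNISTR approach, assuming an AL32UTF8 database character set so that the Chinese literals are stored intact. The first query only shows UNISTR decoding escape codes. The second splices two decoded characters into a bracket expression for REGEXP_LIKE. The aliases decoded and has_cjk are made up for illustration. Range matching is assumed to follow the default binary NLS_SORT; a linguistic sort may evaluate ranges differently.

```sql
-- UNISTR converts \XXXX (UTF-16 hex) escapes into the characters themselves.
-- U+4E2D is 中 and U+6587 is 文, so this decodes to 中文.
SELECT UNISTR('\4E2D\6587') AS decoded
FROM dual;

-- The decoded characters can be concatenated into a regex bracket expression.
-- The resulting pattern is [<U+4E00>-<U+9FFF>], the basic CJK Unified Ideographs block.
SELECT CASE
         WHEN REGEXP_LIKE('测试demo',
                          '[' || UNISTR('\4E00') || '-' || UNISTR('\9FFF') || ']')
         THEN 'contains CJK'
         ELSE 'no CJK'
       END AS has_cjk
FROM dual;
```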


So you need something like this:

    select regexp_replace('测试中文demo', '[' || unistr('\0391') || '-' || unistr('\9FA5') || ']','') from dual
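To try the query on the sample rows from the question, the data can be inlined with a WITH clause. This is a sketch: the CTE name samples and the column txt are hypothetical, and an AL32UTF8 database is assumed.

```sql
-- Hypothetical inline table holding the question's sample rows.
WITH samples AS (
  SELECT '测试demo' AS txt FROM dual UNION ALL
  SELECT 'demo1'           FROM dual UNION ALL
  SELECT '中文2'           FROM dual
)
-- Remove every character between U+0391 and U+9FA5, as in the query above.
SELECT REGEXP_REPLACE(txt,
                      '[' || UNISTR('\0391') || '-' || UNISTR('\9FA5') || ']',
                      '') AS cleaned
FROM samples;
```

Keep in mind that U+0391 is GREEK CAPITAL LETTER ALPHA, so this range also removes Greek, Cyrillic, kana and many other scripts along with Chinese. Starting the range at U+4E00, the first CJK Unified Ideograph, restricts it to the main Chinese block.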

Note: a more complete regular expression for Chinese should cover all of the following ranges:

    |-----------------------------------------|-------------|---------------------------|
    | Block                                   | ES6 Range   | ES5 Range                 |
    |-----------------------------------------|-------------|---------------------------|
    | CJK Unified Ideographs                  | 4E00-9FFF   | \u4E00-\u9FFF             |
    | CJK Unified Ideographs Extension A      | 3400-4DFF   | \u3400-\u4DFF             |
    | CJK Unified Ideographs Extension B      | 20000-2A6DF | \uD840\uDC00-\uD869\uDEDF |
    | CJK Unified Ideographs Extension C      | 2A700-2B73F | \uD869\uDF00-\uD86D\uDF3F |
    | CJK Unified Ideographs Extension D      | 2B740-2B81F | \uD86D\uDF40-\uD86E\uDC1F |
    | CJK Unified Ideographs Extension E      | 2B820-2CEAF | \uD86E\uDC20-\uD873\uDEAF |
    | CJK Compatibility Ideographs            | F900-FAFF   | \uF900-\uFAFF             |
    | CJK Compatibility Ideographs Supplement | 2F800-2FA1F | \uD87E\uDC00-\uD87E\uDE1F |
    |-----------------------------------------|-------------|---------------------------|
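The ES5 column is the UTF-16 form of each range. Code points above U+FFFF are written as a high/low surrogate pair, which is also how UNISTR takes supplementary characters. As a worked check, here is how the start and end of Extension B convert, with v defined as the code point minus 0x10000:

```latex
\begin{aligned}
\text{U+20000}: \quad v &= \mathrm{0x20000} - \mathrm{0x10000} = \mathrm{0x10000} \\
\text{high} &= \mathrm{0xD800} + \lfloor v / \mathrm{0x400} \rfloor = \mathrm{0xD800} + \mathrm{0x40} = \mathrm{0xD840} \\
\text{low} &= \mathrm{0xDC00} + (v \bmod \mathrm{0x400}) = \mathrm{0xDC00} + \mathrm{0x000} = \mathrm{0xDC00} \\[4pt]
\text{U+2A6DF}: \quad v &= \mathrm{0x2A6DF} - \mathrm{0x10000} = \mathrm{0x1A6DF} \\
\text{high} &= \mathrm{0xD800} + \lfloor v / \mathrm{0x400} \rfloor = \mathrm{0xD800} + \mathrm{0x69} = \mathrm{0xD869} \\
\text{low} &= \mathrm{0xDC00} + (v \bmod \mathrm{0x400}) = \mathrm{0xDC00} + \mathrm{0x2DF} = \mathrm{0xDEDF}
\end{aligned}
```

Both results match the \uD840\uDC00-\uD869\uDEDF entry in the table.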

So try this:

    select regexp_replace('测试中文demo','[' || unistr('\4E00') || '-' || unistr('\9FFF') || unistr('\3400') || '-' || unistr('\4DFF') || unistr('\D840\DC00') || '-' || unistr('\D869\DEDF') || unistr('\D869\DF00') || '-' || unistr('\D86D\DF3F') || unistr('\D86D\DF40') || '-' || unistr('\D86E\DC1F') || unistr('\D86E\DC20') || '-' || unistr('\D873\DEAF') || unistr('\F900') || '-' || unistr('\FAFF') || unistr('\D87E\DC00') || '-' || unistr('\D87E\DE1F') || ']','') from dual
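The same pattern is easier to read and reuse if it is built once in a CTE, one block per line, and then applied to every row. This is a sketch under two assumptions: the database character set is AL32UTF8, and the Oracle version in use accepts supplementary characters (produced by UNISTR from surrogate pairs) as bracket-range endpoints. The names cjk, cjk_pattern, samples and txt are hypothetical. The ranges are copied unchanged from the one-line query.

```sql
-- Build the character class once; each line adds one range from the table.
-- Supplementary ranges are written as UTF-16 surrogate pairs for UNISTR.
WITH cjk AS (
  SELECT '['
         || UNISTR('\4E00')      || '-' || UNISTR('\9FFF')       -- CJK Unified Ideographs
         || UNISTR('\3400')      || '-' || UNISTR('\4DFF')       -- Extension A
         || UNISTR('\D840\DC00') || '-' || UNISTR('\D869\DEDF')  -- Extension B
         || UNISTR('\D869\DF00') || '-' || UNISTR('\D86D\DF3F')  -- Extension C
         || UNISTR('\D86D\DF40') || '-' || UNISTR('\D86E\DC1F')  -- Extension D
         || UNISTR('\D86E\DC20') || '-' || UNISTR('\D873\DEAF')  -- Extension E
         || UNISTR('\F900')      || '-' || UNISTR('\FAFF')       -- Compatibility Ideographs
         || UNISTR('\D87E\DC00') || '-' || UNISTR('\D87E\DE1F')  -- Compatibility Ideographs Supplement
         || ']' AS cjk_pattern
  FROM dual
),
-- Hypothetical inline table with the question's sample rows.
samples AS (
  SELECT '测试demo' AS txt FROM dual UNION ALL
  SELECT 'demo1'           FROM dual UNION ALL
  SELECT '中文2'           FROM dual
)
SELECT REGEXP_REPLACE(s.txt, c.cjk_pattern, '') AS cleaned
FROM samples s
CROSS JOIN cjk c;
```

Because the lowest range here starts at U+3400, Greek and Cyrillic text is left alone, unlike with the U+0391 range. Only the layout differs from the one-line query; the blocks are the same.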
