Source data:

    测试demo
    demo1
    中文2

Output:

    demo
    demo1
    2

I tried select regexp_replace('测试中文demo','[\u0391-\uFFE5]','') from dual, but it had no effect. Also, \w matches Chinese characters, so [^\w] cannot be used.
The only thing I have come up with so far is select regexp_replace('测试中文demo','[^a-zA-Z0-9\s]','') from dual.
Is there a better way?
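

In most regular expression flavors you can use \x or \u followed by a hexadecimal code to match any character. For example, \x20 matches a space. REGEXP_LIKE in Oracle, however, does not support \x. Instead, use the unistr function to convert the code into the equivalent character, then concatenate that character into the pattern passed to REGEXP_LIKE. For example: REGEXP_LIKE(source, '[' || unistr('\0020') || ']');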
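
As a quick illustration of what unistr produces, the sketch below builds characters from their hexadecimal code units and then uses one inside a bracket expression. The literal 'hello world' and the column aliases are placeholders chosen for this example and are not part of the original question; how the characters display also depends on the client character set.

```sql
-- unistr takes backslash-escaped UTF-16 code units, four hex digits each.
-- U+4E2D and U+6587 are the two characters of the word "中文".
select unistr('\4E2D\6587') as chars from dual;

-- A space (U+0020) built with unistr and placed inside a bracket expression.
-- 'hello world' is a placeholder string that contains one space.
select 'matched' as result
from dual
where regexp_like('hello world', '[' || unistr('\0020') || ']');
```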
So you need something like:

    select regexp_replace('测试中文demo', '[' || unistr('\0391') || '-' || unistr('\9FA5') || ']', '') from dual

Note: a more thorough regular expression for Chinese should cover all of the following ranges:

    -------------------------------------------------------------------------------------
    | Block                                   | ES6 Range   | ES5 Range                 |
    |-----------------------------------------------------------------------------------|
    | CJK Unified Ideographs                  | 4E00-9FFF   | \u4E00-\u9FFF             |
    | CJK Unified Ideographs Extension A      | 3400-4DFF   | \u3400-\u4DFF             |
    | CJK Unified Ideographs Extension B      | 20000-2A6DF | \uD840\uDC00-\uD869\uDEDF |
    | CJK Unified Ideographs Extension C      | 2A700-2B73F | \uD869\uDF00-\uD86D\uDF3F |
    | CJK Unified Ideographs Extension D      | 2B740-2B81F | \uD86D\uDF40-\uD86E\uDC1F |
    | CJK Unified Ideographs Extension E      | 2B820-2CEAF | \uD86E\uDC20-\uD873\uDEAF |
    | CJK Compatibility Ideographs            | F900-FAFF   | \uF900-\uFAFF             |
    | CJK Compatibility Ideographs Supplement | 2F800-2FA1F | \uD87E\uDC00-\uD87E\uDE1F |
    -------------------------------------------------------------------------------------

So, try:

    select regexp_replace('测试中文demo',
             '[' || unistr('\4E00') || '-' || unistr('\9FFF')
                 || unistr('\3400') || '-' || unistr('\4DFF')
                 || unistr('\D840\DC00') || '-' || unistr('\D869\DEDF')
                 || unistr('\D869\DF00') || '-' || unistr('\D86D\DF3F')
                 || unistr('\D86D\DF40') || '-' || unistr('\D86E\DC1F')
                 || unistr('\D86E\DC20') || '-' || unistr('\D873\DEAF')
                 || unistr('\F900') || '-' || unistr('\FAFF')
                 || unistr('\D87E\DC00') || '-' || unistr('\D87E\DE1F')
                 || ']',
             '') from dual
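
To check the character-class approach against the three sample rows from the question, a minimal sketch along these lines may help. It assumes a Unicode database character set (for example AL32UTF8) and NLS_SORT set to BINARY, since the behavior of ranges inside bracket expressions can depend on collation. The CTE name src and the column names s and cleaned are made up for illustration, and only the basic CJK Unified Ideographs range is used to keep the query short.

```sql
-- Hypothetical inline test data mirroring the three rows in the question.
with src as (
  select '测试demo' as s from dual union all
  select 'demo1'    as s from dual union all
  select '中文2'    as s from dual
)
select s,
       -- Strip only characters in U+4E00..U+9FFF (basic CJK block).
       -- Assumption: NLS_SORT = BINARY, so the range follows code point order.
       regexp_replace(
         s,
         '[' || unistr('\4E00') || '-' || unistr('\9FFF') || ']',
         ''
       ) as cleaned
from src;
```

If those assumptions hold, the cleaned column should correspond to the output requested in the question. The extension ranges can be appended inside the same brackets in the same way as in the full query.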