And*_*own 3 ruby regex utf-8 character-encoding ruby-1.9
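I have a regular expression that needs to match a large set of characters. The code works fine in Ruby 1.8.7, but it fails under 1.9. I suspect it has something to do with encoding, and I have done a lot of googling without success, so maybe someone can enlighten me.
Code: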
# encoding: utf-8
non_latin_hashtag_chars = [
(0xA960..0xA97F).to_a, # Hangul Jamo Extended-A
(0xAC00..0xD7AF).to_a, # Hangul Syllables
(0xD7B0..0xD7FF).to_a # Hangul Jamo Extended-B
].flatten.pack('U*').freeze
e = /[a-z_#{non_latin_hashtag_chars}]/io
Error:
~/Desktop: ruby regex_test.rb
regex_test.rb:13:in `<main>': too many multibyte code ranges are specified: /[a-z_??????????????????????????????????????????????????????????????????????????????......
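As twehad pointed out, there is a 10K limit in regular expressions.
In any case, you should use Unicode ranges in the Regexp instead of listing every character: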
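To see why the character class in the question runs into a limit of that size, it may help to count how many code points the three ranges expand to. This is only a sketch: the names `ranges` and `total` are illustrative, and it does not verify the exact threshold Ruby enforces.

```ruby
# The three ranges from the question, kept as Range objects
# rather than expanded into one long string of characters.
ranges = [
  (0xA960..0xA97F), # Hangul Jamo Extended-A
  (0xAC00..0xD7AF), # Hangul Syllables
  (0xD7B0..0xD7FF)  # Hangul Jamo Extended-B
]

# Count the code points in each range and add them up.
total = ranges.map { |r| r.count }.inject(:+)

puts "code points interpolated into the character class: #{total}"
```

Each of those code points ends up as a separate literal character inside the brackets, which is what the ranges in the regexp itself avoid.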
/[a-z_\uA960-\uA97F\uAC00-\uD7AF\uD7B0-\uD7FF]/io
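A quick way to check that the range-based class behaves as intended is to run it against a few sample characters. This is a minimal sketch: the constant `HASHTAG_CHAR` and the `samples` list are made up for illustration, and the `o` flag is dropped because nothing is interpolated any more.

```ruby
# encoding: utf-8

# Character class built from code point ranges instead of
# thousands of literal characters.
HASHTAG_CHAR = /[a-z_\uA960-\uA97F\uAC00-\uD7AF\uD7B0-\uD7FF]/i

samples = [
  "a",      # Latin letter
  "_",      # underscore
  "\uD55C", # Hangul syllable
  "\uA960", # first code point of Hangul Jamo Extended-A
  "\uD7B0", # first code point of Hangul Jamo Extended-B
  "1",      # digit, outside the class
  "\u3042"  # Hiragana, outside the class
]

# Pair each sample with whether the class matches it.
results = samples.map { |s| [s, !!(s =~ HASHTAG_CHAR)] }
results.each { |s, matched| puts "#{s.inspect}: #{matched}" }
```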
I'm not an expert in Korean, so I don't know whether it is equivalent, but if you want to match all Korean characters, you should use this character property instead:
/[a-z_\p{Hangul}]/io
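Since it is unclear above whether the two forms are equivalent, one way to find out is to probe both with a handful of code points. The sketch below assumes the interpreter's Unicode tables follow the standard Hangul script assignments; `RANGES`, `SCRIPT`, and `probes` are hypothetical names chosen for this example.

```ruby
# Explicit block ranges from the question versus the Unicode script property.
RANGES = /[\uA960-\uA97F\uAC00-\uD7AF\uD7B0-\uD7FF]/
SCRIPT = /\p{Hangul}/

probes = {
  "syllable U+D55C"           => "\uD55C",
  "jamo U+1100"               => "\u1100",
  "compatibility jamo U+3131" => "\u3131",
  "unassigned U+D7A4"         => "\uD7A4"
}

# For each probe, record whether each pattern matches it.
comparison = probes.map do |label, ch|
  [label, !!(ch =~ RANGES), !!(ch =~ SCRIPT)]
end

comparison.each do |label, in_ranges, in_script|
  puts "#{label}: ranges=#{in_ranges} script=#{in_script}"
end
```

If the two columns differ for some probes, the patterns are not interchangeable, and the choice depends on whether the goal is "these specific blocks" or "anything in the Hangul script".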