Ruby 1.9: how can I properly upcase and downcase multibyte strings?

kch*_*kch 55 ruby unicode utf-8 internationalization multibyte

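So matz decided to keep upcase and downcase limited to /[A-Z]/i in ruby 1.9.1.

ActiveSupport::Multibyte has long had great i18n case conversion in ruby 1.8.x via String#mb_chars.

However, when tried under ruby 1.9.1, it doesn't seem to work. Here's a simple test script I wrote, along with the output I got: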

$ cat test.rb
# encoding: UTF-8

puts("@ #{RUBY_VERSION} " + (__ENCODING__ rescue $KCODE).to_s)
sd, su = "Iñtërnâtiônàlizætiøn", "IÑTËRNÂTIÔNÀLIZÆTIØN"
def ps(u, d, k); puts "%-30s:  %24s / %-24s" % [k, u, d] end
ps sd.upcase, su.downcase, "Plain ruby"

require 'rubygems'; require 'active_support'
ps sd.upcase, su.downcase, "With active_support"
ps sd.mb_chars.upcase.to_s, su.mb_chars.downcase.to_s, "With active_support mb_chars"

$ ruby -KU test.rb
@ 1.8.7 UTF8
Plain ruby                    :  IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn
With active_support           :  IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn
With active_support mb_chars  :  IÑTËRNÂTIÔNÀLIZÆTIØN / iñtërnâtiônàlizætiøn

$ ruby1.9 test.rb
@ 1.9.1 UTF-8
Plain ruby                    :      IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn
With active_support           :      IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn
With active_support mb_chars  :      IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn

So, how do I get internationalized upcase and downcase in ruby 1.9.1?

Update

I should add that I also tested with ActiveSupport from the current master, 2-3-* and 3-0-unstable Rails branches on GitHub. Same results.

des*_*tan 57

For anybody coming from Google searching for ruby upcase utf8:

> "your problem chars here çöğıü Iñtërnâtiônàlizætiøn".mb_chars.upcase.to_s
=> "YOUR PROBLEM CHARS HERE ÇÖĞIÜ IÑTËRNÂTIÔNÀLIZÆTIØN"

The solution is to use mb_chars.

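  • Well, sorry for the Germans, but it works for Turkish letters :D (11 upvotes)
  • No, this is not a solution, because case conversion is locale-dependent. For example, in a German locale `'ß'` would upcase to `'SS'`, but not in a US locale. (3 upvotes)
  • Just a link to the ActiveSupport::Multibyte::Chars class: http://api.rubyonrails.org/classes/ActiveSupport/Multibyte/Chars.html (2 upvotes)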

Mar*_*une 38

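Case conversion is locale-dependent and doesn't always round-trip, which is why Ruby 1.9 doesn't cover it (see here and here).

The unicode-util gem should address your needs.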

  • Cool, they even use the infamous Turkish capital I with a dot in the README, which exemplifies the locale dependence you mentioned. (2 upvotes)
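
The round-trip and locale points raised in this answer can be illustrated with a short sketch. It assumes Ruby 2.4 or later, where String#upcase and String#downcase perform full Unicode case mapping; it will not behave this way on 1.9. The sample word "straße" is only an illustration and does not come from the original answer.

```ruby
# encoding: UTF-8
# Assumption: Ruby 2.4+ (built-in full Unicode case mapping).

word       = "straße"
upper      = word.upcase       # "ß" expands to two letters when upcased
round_trip = upper.downcase    # the expansion cannot be undone

puts upper
puts round_trip
puts round_trip == word        # false: the conversion does not round-trip

# Locale dependence: the same letter maps differently under Turkic rules.
puts "i".upcase                # default mapping
puts "i".upcase(:turkic)       # dotted capital I
```

Because the mapping can change the string's length and lose information, storing only the upcased form of user input is risky whenever the original spelling matters.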

J-_*_*_-L 12

Case conversion is complicated and locale-dependent. Fortunately, Martin Dürst added full Unicode case mapping in Ruby 2.4:

puts RUBY_DESCRIPTION

sd, su = "Iñtërnâtiônàlizætiøn", "IÑTËRNÂTIÔNÀLIZÆTIØN"
def ps(u, d, k); puts "%-30s:  %24s / %-24s" % [k, u, d] end 
ps sd.upcase,              su.downcase,              "Ruby 2.4 (default)"
ps sd.upcase(:ascii),      su.downcase(:ascii),      "Ruby 2.4 (ascii)"
ps sd.upcase(:turkic),     su.downcase(:turkic),     "Ruby 2.4 (turkic)"
ps sd.upcase(:lithuanian), su.downcase(:lithuanian), "Ruby 2.4 (lithuanian)"
ps "-",                    su.downcase(:fold),       "Ruby 2.4 (fold)"

Output:

ruby 2.4.0dev (2016-06-24 trunk 55499) [x86_64-linux]
Ruby 2.4 (default)            :      IÑTËRNÂTIÔNÀLIZÆTIØN / iñtërnâtiônàlizætiøn
Ruby 2.4 (ascii)              :      IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn
Ruby 2.4 (turkic)             :      IÑTËRNÂTİÔNÀLİZÆTİØN / ıñtërnâtıônàlızætıøn
Ruby 2.4 (lithuanian)         :      IÑTËRNÂTIÔNÀLIZÆTIØN / iñtërnâtiônàlizætiøn
Ruby 2.4 (fold)               :                         - / iñtërnâtiônàlizætiøn
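
The :fold option shown above is mainly useful for caseless comparison rather than display. Here is a minimal sketch, again assuming Ruby 2.4+; `caseless_equal?` is a hypothetical helper name introduced for illustration, not part of Ruby.

```ruby
# encoding: UTF-8
# Assumption: Ruby 2.4+.

# Hypothetical helper: compares two strings after Unicode case folding.
def caseless_equal?(a, b)
  a.downcase(:fold) == b.downcase(:fold)
end

puts caseless_equal?("Iñtërnâtiônàlizætiøn", "IÑTËRNÂTIÔNÀLIZÆTIØN")
puts caseless_equal?("straße", "STRASSE")

# A plain downcase is not enough here, since "ß" is already lowercase
# and stays as it is, while "SS" becomes "ss".
puts "straße".downcase == "STRASSE".downcase
```

Folding is meant for matching only; the folded string is not necessarily what a reader would expect to see as the lowercase form.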