What is the motivation of Rust's ToLowercase?

Ron*_*ith 8 rust

Rust's char has a to_lowercase function, which seems to return the struct ToLowercase, which appears to be an iterator that always has one element.

Wouldn't returning a char directly be more natural and simple?

DK.*_*DK. 23

Wouldn't returning a char directly be more natural and simple?

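Natural, simple, wrong. Unicode is too complicated for that to work. The most fundamental problem is that a char is not sufficient to always represent a single, logically complete "character", for some definition of "character".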
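To make that concrete, here is a minimal sketch (the helper name `scalar_count` is made up for illustration, not anything from std). A Rust char is one Unicode scalar value, and the same visible letter can take one or two of them depending on how it is encoded:

```rust
/// Hypothetical helper: counts the Unicode scalar values (`char`s) in a string.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // "é" as a single precomposed code point, U+00E9.
    let precomposed = "\u{00E9}";
    // "é" as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    let decomposed = "e\u{0301}";

    println!("{} -> {} char(s)", precomposed, scalar_count(precomposed));
    println!("{} -> {} char(s)", decomposed, scalar_count(decomposed));
}
```

Both strings would normally render as the same letter, yet only the first fits in a single char.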

It appears to be an iterator that always has one element.

This can be shown to be false by running a simple program that case-converts every valid Unicode code point. The program:

/*!
Add the following to a `Cargo.toml` file:

```cargo
[dependencies]
arrayvec = "0.3.15"
```
*/
extern crate arrayvec;
use arrayvec::ArrayVec;

fn main() {
    let mut expanded_lcs = 0;
    let mut expanded_ucs = 0;

    // Range ends are exclusive, so these bounds cover U+0000..=U+D7FF and U+E000..=U+10FFFF.
    let usvs = (0..0xd800).chain(0xe000..0x110000)
        .flat_map(|v| std::char::from_u32(v).into_iter());

    for c in usvs {
        let lc: ArrayVec<[_; 4]> = c.to_lowercase().collect();
        let uc: ArrayVec<[_; 4]> = c.to_uppercase().collect();

        if lc.len() != 1 {
            expanded_lcs += 1;
            print!("'{}' U+{:04X} L -> ", c, c as u32);
            for c in lc {
                print!("'{}' U+{:04X} ", c, c as u32);
            }
            println!("");
        }

        if uc.len() != 1 {
            expanded_ucs += 1;
            print!("'{}' U+{:04X} U -> ", c, c as u32);
            for c in uc {
                print!("'{}' U+{:04X} ", c, c as u32);
            }
            println!("");
        }
    }

    println!("\n-----\n");

    println!("Found {} chars with expanded lowercase conversions.", expanded_lcs);
    println!("Found {} chars with expanded uppercase conversions.", expanded_ucs);
}

Its output, using rustc 1.8 nightly:

'ß' U+00DF U -> 'S' U+0053 'S' U+0053 
'İ' U+0130 L -> 'i' U+0069 '̇' U+0307
'ŉ' U+0149 U -> 'ʼ' U+02BC 'N' U+004E
'ǰ' U+01F0 U -> 'J' U+004A '̌' U+030C
'ΐ' U+0390 U -> 'Ι' U+0399 '̈' U+0308 '́' U+0301
'ΰ' U+03B0 U -> 'Υ' U+03A5 '̈' U+0308 '́' U+0301
'և' U+0587 U -> 'Ե' U+0535 'Ւ' U+0552
'ẖ' U+1E96 U -> 'H' U+0048 '̱' U+0331
'ẗ' U+1E97 U -> 'T' U+0054 '̈' U+0308
'ẘ' U+1E98 U -> 'W' U+0057 '̊' U+030A
'ẙ' U+1E99 U -> 'Y' U+0059 '̊' U+030A
'ẚ' U+1E9A U -> 'A' U+0041 'ʾ' U+02BE
'ὐ' U+1F50 U -> 'Υ' U+03A5 '̓' U+0313
'ὒ' U+1F52 U -> 'Υ' U+03A5 '̓' U+0313 '̀' U+0300
'ὔ' U+1F54 U -> 'Υ' U+03A5 '̓' U+0313 '́' U+0301
'ὖ' U+1F56 U -> 'Υ' U+03A5 '̓' U+0313 '͂' U+0342
'ᾀ' U+1F80 U -> 'Ἀ' U+1F08 'Ι' U+0399
'ᾁ' U+1F81 U -> 'Ἁ' U+1F09 'Ι' U+0399
'ᾂ' U+1F82 U -> 'Ἂ' U+1F0A 'Ι' U+0399
'ᾃ' U+1F83 U -> 'Ἃ' U+1F0B 'Ι' U+0399
'ᾄ' U+1F84 U -> 'Ἄ' U+1F0C 'Ι' U+0399
'ᾅ' U+1F85 U -> 'Ἅ' U+1F0D 'Ι' U+0399
'ᾆ' U+1F86 U -> 'Ἆ' U+1F0E 'Ι' U+0399
'ᾇ' U+1F87 U -> 'Ἇ' U+1F0F 'Ι' U+0399
'ᾈ' U+1F88 U -> 'Ἀ' U+1F08 'Ι' U+0399
'ᾉ' U+1F89 U -> 'Ἁ' U+1F09 'Ι' U+0399
'ᾊ' U+1F8A U -> 'Ἂ' U+1F0A 'Ι' U+0399
'ᾋ' U+1F8B U -> 'Ἃ' U+1F0B 'Ι' U+0399
'ᾌ' U+1F8C U -> 'Ἄ' U+1F0C 'Ι' U+0399
'ᾍ' U+1F8D U -> 'Ἅ' U+1F0D 'Ι' U+0399
'ᾎ' U+1F8E U -> 'Ἆ' U+1F0E 'Ι' U+0399
'ᾏ' U+1F8F U -> 'Ἇ' U+1F0F 'Ι' U+0399
'ᾐ' U+1F90 U -> 'Ἠ' U+1F28 'Ι' U+0399
'ᾑ' U+1F91 U -> 'Ἡ' U+1F29 'Ι' U+0399
'ᾒ' U+1F92 U -> 'Ἢ' U+1F2A 'Ι' U+0399
'ᾓ' U+1F93 U -> 'Ἣ' U+1F2B 'Ι' U+0399
'ᾔ' U+1F94 U -> 'Ἤ' U+1F2C 'Ι' U+0399
'ᾕ' U+1F95 U -> 'Ἥ' U+1F2D 'Ι' U+0399
'ᾖ' U+1F96 U -> 'Ἦ' U+1F2E 'Ι' U+0399
'ᾗ' U+1F97 U -> 'Ἧ' U+1F2F 'Ι' U+0399
'ᾘ' U+1F98 U -> 'Ἠ' U+1F28 'Ι' U+0399
'ᾙ' U+1F99 U -> 'Ἡ' U+1F29 'Ι' U+0399
'ᾚ' U+1F9A U -> 'Ἢ' U+1F2A 'Ι' U+0399
'ᾛ' U+1F9B U -> 'Ἣ' U+1F2B 'Ι' U+0399
'ᾜ' U+1F9C U -> 'Ἤ' U+1F2C 'Ι' U+0399
'ᾝ' U+1F9D U -> 'Ἥ' U+1F2D 'Ι' U+0399
'ᾞ' U+1F9E U -> 'Ἦ' U+1F2E 'Ι' U+0399
'ᾟ' U+1F9F U -> 'Ἧ' U+1F2F 'Ι' U+0399
'ᾠ' U+1FA0 U -> 'Ὠ' U+1F68 'Ι' U+0399
'ᾡ' U+1FA1 U -> 'Ὡ' U+1F69 'Ι' U+0399
'ᾢ' U+1FA2 U -> 'Ὢ' U+1F6A 'Ι' U+0399
'ᾣ' U+1FA3 U -> 'Ὣ' U+1F6B 'Ι' U+0399
'ᾤ' U+1FA4 U -> 'Ὤ' U+1F6C 'Ι' U+0399
'ᾥ' U+1FA5 U -> 'Ὥ' U+1F6D 'Ι' U+0399
'ᾦ' U+1FA6 U -> 'Ὦ' U+1F6E 'Ι' U+0399
'ᾧ' U+1FA7 U -> 'Ὧ' U+1F6F 'Ι' U+0399
'ᾨ' U+1FA8 U -> 'Ὠ' U+1F68 'Ι' U+0399
'ᾩ' U+1FA9 U -> 'Ὡ' U+1F69 'Ι' U+0399
'ᾪ' U+1FAA U -> 'Ὢ' U+1F6A 'Ι' U+0399
'ᾫ' U+1FAB U -> 'Ὣ' U+1F6B 'Ι' U+0399
'ᾬ' U+1FAC U -> 'Ὤ' U+1F6C 'Ι' U+0399
'ᾭ' U+1FAD U -> 'Ὥ' U+1F6D 'Ι' U+0399
'ᾮ' U+1FAE U -> 'Ὦ' U+1F6E 'Ι' U+0399
'ᾯ' U+1FAF U -> 'Ὧ' U+1F6F 'Ι' U+0399
'ᾲ' U+1FB2 U -> 'Ὰ' U+1FBA 'Ι' U+0399
'ᾳ' U+1FB3 U -> 'Α' U+0391 'Ι' U+0399
'ᾴ' U+1FB4 U -> 'Ά' U+0386 'Ι' U+0399
'ᾶ' U+1FB6 U -> 'Α' U+0391 '͂' U+0342
'ᾷ' U+1FB7 U -> 'Α' U+0391 '͂' U+0342 'Ι' U+0399
'ᾼ' U+1FBC U -> 'Α' U+0391 'Ι' U+0399
'ῂ' U+1FC2 U -> 'Ὴ' U+1FCA 'Ι' U+0399
'ῃ' U+1FC3 U -> 'Η' U+0397 'Ι' U+0399
'ῄ' U+1FC4 U -> 'Ή' U+0389 'Ι' U+0399
'ῆ' U+1FC6 U -> 'Η' U+0397 '͂' U+0342
'ῇ' U+1FC7 U -> 'Η' U+0397 '͂' U+0342 'Ι' U+0399
'ῌ' U+1FCC U -> 'Η' U+0397 'Ι' U+0399
'ῒ' U+1FD2 U -> 'Ι' U+0399 '̈' U+0308 '̀' U+0300
'ΐ' U+1FD3 U -> 'Ι' U+0399 '̈' U+0308 '́' U+0301
'ῖ' U+1FD6 U -> 'Ι' U+0399 '͂' U+0342
'ῗ' U+1FD7 U -> 'Ι' U+0399 '̈' U+0308 '͂' U+0342
'ῢ' U+1FE2 U -> 'Υ' U+03A5 '̈' U+0308 '̀' U+0300
'ΰ' U+1FE3 U -> 'Υ' U+03A5 '̈' U+0308 '́' U+0301
'ῤ' U+1FE4 U -> 'Ρ' U+03A1 '̓' U+0313
'ῦ' U+1FE6 U -> 'Υ' U+03A5 '͂' U+0342
'ῧ' U+1FE7 U -> 'Υ' U+03A5 '̈' U+0308 '͂' U+0342
'ῲ' U+1FF2 U -> 'Ὼ' U+1FFA 'Ι' U+0399
'ῳ' U+1FF3 U -> 'Ω' U+03A9 'Ι' U+0399
'ῴ' U+1FF4 U -> 'Ώ' U+038F 'Ι' U+0399
'ῶ' U+1FF6 U -> 'Ω' U+03A9 '͂' U+0342
'ῷ' U+1FF7 U -> 'Ω' U+03A9 '͂' U+0342 'Ι' U+0399
'ῼ' U+1FFC U -> 'Ω' U+03A9 'Ι' U+0399
'ﬀ' U+FB00 U -> 'F' U+0046 'F' U+0046
'ﬁ' U+FB01 U -> 'F' U+0046 'I' U+0049
'ﬂ' U+FB02 U -> 'F' U+0046 'L' U+004C
'ﬃ' U+FB03 U -> 'F' U+0046 'F' U+0046 'I' U+0049
'ﬄ' U+FB04 U -> 'F' U+0046 'F' U+0046 'L' U+004C
'ﬅ' U+FB05 U -> 'S' U+0053 'T' U+0054
'ﬆ' U+FB06 U -> 'S' U+0053 'T' U+0054
'ﬓ' U+FB13 U -> 'Մ' U+0544 'Ն' U+0546
'ﬔ' U+FB14 U -> 'Մ' U+0544 'Ե' U+0535
'ﬕ' U+FB15 U -> 'Մ' U+0544 'Ի' U+053B
'ﬖ' U+FB16 U -> 'Վ' U+054E 'Ն' U+0546
'ﬗ' U+FB17 U -> 'Մ' U+0544 'Խ' U+053D

-----

Found 1 chars with expanded lowercase conversions.
Found 102 chars with expanded uppercase conversions.

Note that this does not take locale into account, which might change the output.
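As a rough illustration of the locale point (the `lower` helper is a name made up for this sketch): Turkish and Azeri lowercase 'I' to dotless 'ı' (U+0131), but char::to_lowercase applies only the unconditional mappings, so it gives the same answer whatever the language:

```rust
/// Hypothetical helper: collects the lowercase mapping of one `char` into a `String`.
fn lower(c: char) -> String {
    c.to_lowercase().collect()
}

fn main() {
    // Locale-independent: plain 'i', even though Turkish text would want U+0131.
    println!("{:?}", lower('I'));
    // U+0130 (capital I with dot above) expands to 'i' + U+0307 COMBINING DOT ABOVE.
    println!("{:?}", lower('\u{0130}'));
}
```

Locale-aware casing would need something outside char::to_lowercase; the standard library method has no locale parameter.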


awe*_*oon 5

It appears to be an iterator that always has one element.

Not always. There are cases where a single char represents the lowercase symbol, while the uppercase symbol is represented by two chars.

Those cases are covered in the Unicode SpecialCasing document. Quoting the Rust documentation:

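This performs complex unconditional mappings with no tailoring: it maps one Unicode character to its lowercase equivalent according to the Unicode database and the additional complex mappings SpecialCasing.txt.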
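In practice this means the iterator is usually either collected into a String or checked for length. A small sketch of both (the helper names `upper` and `upper_single` are invented here, not part of std):

```rust
/// Hypothetical helper: the full uppercase mapping of one `char`, as a `String`.
fn upper(c: char) -> String {
    c.to_uppercase().collect()
}

/// Hypothetical helper: `Some(u)` only when the mapping is exactly one `char`.
fn upper_single(c: char) -> Option<char> {
    let mut it = c.to_uppercase();
    match (it.next(), it.next()) {
        (Some(u), None) => Some(u),
        _ => None,
    }
}

fn main() {
    // 'a' maps to one char; 'ß' and the 'ffi' ligature (U+FB03) expand to several.
    for &c in &['a', 'ß', '\u{FB03}'] {
        println!("{:?} -> {:?} (single: {:?})", c, upper(c), upper_single(c));
    }
}
```

If the input is known to be ASCII, char::to_ascii_uppercase and char::to_ascii_lowercase do return a plain char.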