NSArray from NSCharacterSet

Sar*_*ran 20 objective-c nscharacterset

Currently I am able to make an array of the alphabet like this:

[[NSArray alloc]initWithObjects:@"A",@"B",@"C",@"D",@"E",@"F",@"G",@"H",@"I",@"J",@"K",@"L",@"M",@"N",@"O",@"P",@"Q",@"R",@"S",@"T",@"U",@"V",@"W",@"X",@"Y",@"Z",nil];

Knowing that the same letters are available through

[NSCharacterSet uppercaseLetterCharacterSet]

how can I make an array out of it?
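For the plain ASCII A to Z list shown in the question, the hardcoded literal can at least be generated from the contiguous scalar range U+0041...U+005A. This Swift sketch is not from the original thread, and the function name is hypothetical; note that it covers only the 26 ASCII capitals, whereas uppercaseLetterCharacterSet contains well over a thousand characters, which is what the answers below enumerate.

```swift
/// Hypothetical helper: builds ["A", "B", ..., "Z"] from the ASCII scalar range
/// instead of listing every letter by hand. ASCII capitals only.
func asciiUppercaseLetters() -> [String] {
    let first: Unicode.Scalar = "A"   // U+0041
    let last: Unicode.Scalar = "Z"    // U+005A
    return (first.value...last.value)
        .compactMap { Unicode.Scalar($0) }   // UInt32 -> Unicode.Scalar? (never nil in this range)
        .map { String(Character($0)) }
}

let letters = asciiUppercaseLetters()
print(letters.count)   // 26
```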

Mar*_*n R 47

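The following code creates an array containing all characters of a given character set. It also works for characters outside the "Basic Multilingual Plane" (characters > U+FFFF, e.g. U+10400 DESERET CAPITAL LETTER LONG I).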

NSCharacterSet *charset = [NSCharacterSet uppercaseLetterCharacterSet];
NSMutableArray *array = [NSMutableArray array];
for (int plane = 0; plane <= 16; plane++) {
    if ([charset hasMemberInPlane:plane]) {
        UTF32Char c;
        for (c = plane << 16; c < (plane+1) << 16; c++) {
            if ([charset longCharacterIsMember:c]) {
                UTF32Char c1 = OSSwapHostToLittleInt32(c); // To make it byte-order safe
                NSString *s = [[NSString alloc] initWithBytes:&c1 length:4 encoding:NSUTF32LittleEndianStringEncoding];
                [array addObject:s];
            }
        }
    }
}

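For uppercaseLetterCharacterSet this gives an array of 1467 elements. But note that characters > U+FFFF are stored in NSString as UTF-16 surrogate pairs, so U+10400, for example, is actually stored in NSString as the 2 characters "\uD801\uDC00".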
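Two encodings are involved in the loop above: the 4 little-endian UTF-32 bytes handed to initWithBytes:length:encoding: (which is why OSSwapHostToLittleInt32 is applied first), and the UTF-16 code units NSString then stores. The following Swift sketch makes both concrete for U+10400 using only the standard library; the helper names are hypothetical, and the surrogate formula is the standard UTF-16 one rather than anything Foundation exposes.

```swift
/// Hypothetical helper: little-endian byte layout of a 32-bit code point,
/// independent of the host's byte order (what OSSwapHostToLittleInt32 guarantees).
func utf32LittleEndianBytes(_ value: UInt32) -> [UInt8] {
    var le = value.littleEndian
    return withUnsafeBytes(of: &le) { Array($0) }
}

/// Hypothetical helper: UTF-16 surrogate pair for a code point in U+10000...U+10FFFF.
func surrogatePair(_ value: UInt32) -> (high: UInt16, low: UInt16) {
    precondition(value >= 0x10000 && value <= 0x10FFFF, "not a supplementary code point")
    let offset = value - 0x10000                  // 20-bit offset above the BMP
    let high = UInt16(0xD800 + (offset >> 10))    // top 10 bits
    let low = UInt16(0xDC00 + (offset & 0x3FF))   // bottom 10 bits
    return (high, low)
}

let deseretLongI: UInt32 = 0x10400   // DESERET CAPITAL LETTER LONG I
print(utf32LittleEndianBytes(deseretLongI))   // [0, 4, 1, 0]
let pair = surrogatePair(deseretLongI)
print(String(pair.high, radix: 16), String(pair.low, radix: 16))   // d801 dc00

// The standard library agrees: one Character, but two UTF-16 code units.
let s = String(Character(Unicode.Scalar(deseretLongI)!))
print(s.count, s.utf16.count)   // 1 2
```

This is also why an NSString built this way reports a length of 2 for a single supplementary character.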

Swift 2 code can be found in other answers to this question. Here is a Swift 3 version, written as an extension method:

extension CharacterSet {
    func allCharacters() -> [Character] {
        var result: [Character] = []
        for plane: UInt8 in 0...16 where self.hasMember(inPlane: plane) {
            for unicode in UInt32(plane) << 16 ..< UInt32(plane + 1) << 16 {
                if let uniChar = UnicodeScalar(unicode), self.contains(uniChar) {
                    result.append(Character(uniChar))
                }
            }
        }
        return result
    }
}

Example:

let charset = CharacterSet.uppercaseLetters
let chars = charset.allCharacters()
print(chars.count) // 1521
print(chars) // ["A", "B", "C", ...]

(Note that some characters may not be present in the font used to display the result.)


Anonymous 10

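Since characters occupy a limited, finite (and not too wide) range, you can simply test which characters are members of the given character set (brute force):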

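// UNICHAR_MAX doesn't seem to be available, so define it: 1 << 16 = 65536
#define UNICHAR_MAX (1ull << (CHAR_BIT * sizeof(unichar)))

NSData *data = [[NSCharacterSet uppercaseLetterCharacterSet] bitmapRepresentation];
const uint8_t *ptr = [data bytes];   // -bytes returns a const pointer
NSMutableArray *allCharsInSet = [NSMutableArray array];
// Following Apple's sample code: character i is in the set if bit (i & 7) of byte (i >> 3) is set.
// The counter is 32-bit on purpose: a unichar counter wraps from 0xFFFF back to 0
// and would never reach UNICHAR_MAX, so the loop would never end.
for (uint32_t i = 0; i < UNICHAR_MAX; i++) {
    if (ptr[i >> 3] & (1u << (i & 7))) {
        unichar c = (unichar)i;
        [allCharsInSet addObject:[NSString stringWithCharacters:&c length:1]];
    }
}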
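The membership test in this loop reads bit i & 7 of byte i >> 3 in the first 8192 bytes of bitmapRepresentation, least significant bit first. To make that layout concrete without Foundation, here is a Swift sketch over a plain [UInt8]; the helper names are hypothetical, and the small bitmap is built by hand rather than taken from a real NSCharacterSet. It enumerates with an Int counter, which avoids the 16-bit wrap-around that keeps a unichar counter from ever reaching 0x10000.

```swift
/// Hypothetical helper: sets the bit for each BMP code point in an 8192-byte bitmap
/// (65536 code points / 8 bits per byte): byte index = cp >> 3, bit index = cp & 7.
func makeBMPBitmap(_ codePoints: [UInt16]) -> [UInt8] {
    let bmpBitmapSize = 8192
    var bitmap = [UInt8](repeating: 0, count: bmpBitmapSize)
    for cp in codePoints {
        bitmap[Int(cp >> 3)] |= UInt8(1) << (cp & 7)
    }
    return bitmap
}

/// Same test as `ptr[i >> 3] & (1u << (i & 7))` in the Objective-C loop.
func bmpBitmapContains(_ bitmap: [UInt8], _ cp: Int) -> Bool {
    return bitmap[cp >> 3] & (UInt8(1) << (cp & 7)) != 0
}

/// Brute-force enumeration of the whole BMP. The Int range 0...0xFFFF terminates;
/// a 16-bit counter would wrap back to 0 instead.
func bmpMembers(_ bitmap: [UInt8]) -> [Int] {
    return (0...0xFFFF).filter { bmpBitmapContains(bitmap, $0) }
}

let sample = makeBMPBitmap([0x41, 0x5A])   // "A" and "Z"
print(sample[8], sample[11])               // 2 4
print(bmpMembers(sample))                  // [65, 90]
```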

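Note: because of the size of unichar and the structure of the additional segments in bitmapRepresentation, this solution only works for characters <= 0xFFFF, not for the higher planes.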

  • oooppppssssss. Understanding this code takes 50K+ reputation. People will be scared off by this code. (5 upvotes)

Cœu*_*œur 7

Inspired by Satachito's answer, here is an efficient way to build an array from a CharacterSet using its bitmapRepresentation:

extension CharacterSet {
    func characters() -> [Character] {
        // A Unicode scalar is any Unicode code point in the range U+0000 to U+D7FF inclusive or U+E000 to U+10FFFF inclusive.
        return codePoints().compactMap { UnicodeScalar($0) }.map { Character($0) }
    }

    func codePoints() -> [Int] {
        var result: [Int] = []
        var plane = 0
        // following documentation at https://developer.apple.com/documentation/foundation/nscharacterset/1417719-bitmaprepresentation
        for (i, w) in bitmapRepresentation.enumerated() {
            let k = i % 8193
            if k == 8192 {
                // plane index byte
                plane = Int(w) << 13
                continue
            }
            let base = (plane + k) << 3
            for j in 0 ..< 8 where w & 1 << j != 0 {
                result.append(base + j)
            }
        }
        return result
    }
}
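The index arithmetic in codePoints() relies on this layout: 8192 bytes for the BMP, then, for each additional plane present, one plane-index byte followed by that plane's 8192-byte bitmap. Since every segment after the first is 8193 bytes long and begins right after its plane byte, i % 8193 == 8192 singles out exactly the plane bytes, and Int(w) << 13 is the plane's byte base (plane << 16 code points divided by 8). The sketch below restates the same arithmetic over a plain [UInt8] so it can be checked without Foundation; makeBitmap and decodeBitmap are hypothetical helpers, and makeBitmap only mimics that layout, it is not Apple's encoder.

```swift
/// Hypothetical helper: builds bytes in the layout codePoints() expects:
/// BMP bitmap first, then (plane index byte + 8192-byte bitmap) per other plane used.
func makeBitmap(_ codePoints: [Int]) -> [UInt8] {
    let planeBitmapSize = 8192   // 65536 code points per plane / 8 bits per byte
    var planes = [Int: [UInt8]]()
    planes[0] = [UInt8](repeating: 0, count: planeBitmapSize)   // BMP segment always present
    for cp in codePoints {
        let plane = cp >> 16
        let offset = cp & 0xFFFF
        var segment = planes[plane] ?? [UInt8](repeating: 0, count: planeBitmapSize)
        segment[offset >> 3] |= UInt8(1) << (offset & 7)
        planes[plane] = segment
    }
    var result = planes[0]!
    for plane in planes.keys.sorted() where plane != 0 {
        result.append(UInt8(plane))   // plane index byte
        result += planes[plane]!
    }
    return result
}

/// Same arithmetic as CharacterSet.codePoints(), over a plain [UInt8].
func decodeBitmap(_ bytes: [UInt8]) -> [Int] {
    var result: [Int] = []
    var plane = 0                        // plane index << 13: byte base of the current segment
    for (i, w) in bytes.enumerated() {
        let k = i % 8193
        if k == 8192 {                   // plane index byte, not bitmap data
            plane = Int(w) << 13
            continue
        }
        let base = (plane + k) << 3      // first code point covered by this byte
        for j in 0..<8 where w & (UInt8(1) << j) != 0 {
            result.append(base + j)
        }
    }
    return result
}

let sample = makeBitmap([0x41, 0x10400])   // "A" and U+10400
print(sample.count)                         // 16385 (8192 + 1 + 8192)
print(decodeBitmap(sample))                 // [65, 66560]
```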

Example with uppercase letters:

let charset = CharacterSet.uppercaseLetters
let chars = charset.characters()
print(chars.count) // 1733
print(chars) // ["A", "B", "C", ...]

Example with non-contiguous planes:

let charset = CharacterSet(charactersIn: "\u{1D6A8}\u{CC791}")
let codePoints = charset.codePoints()
print(codePoints) // [120488, 837521]
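The two code points printed here are 120488 (U+1D6A8) and 837521 (U+CC791), which lie in planes 1 and 12, so planes 2 through 11 have no segment in between: that gap is what "non-contiguous" refers to. The short Swift sketch below (a hypothetical helper, not part of the answer) runs the shifts of codePoints() in reverse, splitting a code point into its plane, the byte within that plane's 8192-byte bitmap, and the bit within that byte.

```swift
/// Hypothetical helper: (plane, byte offset within the plane's bitmap, bit index).
/// plane = cp >> 16, byte = (cp & 0xFFFF) >> 3, bit = cp & 7.
func bitmapPosition(of codePoint: Int) -> (plane: Int, byte: Int, bit: Int) {
    let plane = codePoint >> 16
    let offsetInPlane = codePoint & 0xFFFF
    return (plane, offsetInPlane >> 3, offsetInPlane & 7)
}

for cp in [120488, 837521] {
    let p = bitmapPosition(of: cp)
    print(String(cp, radix: 16, uppercase: true), p.plane, p.byte, p.bit)
}
// 1D6A8 1 6869 0
// CC791 12 6386 1
```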

Performance

Very good: this solution, built on bitmapRepresentation, appears to be 3 to 10 times faster than Martin R's solution using contains or Oliver Atkinson's solution using longCharacterIsMember.

  • @Cœur: I can confirm that in a command-line project compiled in Release mode, your function is much faster than mine, about twice as fast for CharacterSet.letters.characters(). (2 upvotes)