What is the best way to test whether a CharacterSet contains a Character in Swift 4?

g-m*_*ark 13 swift swift4

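I'm looking for a way, in Swift 4, to test whether a Character is a member of an arbitrary CharacterSet. I have this scanner class that will be used for some lightweight parsing. One of the functions in the class skips any characters, at the current position, that belong to a given set of possible characters.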

class MyScanner {
  let str: String
  var idx: String.Index
  init(_ string: String) {
    str = string
    idx = str.startIndex
  }
  var remains: String { return String(str[idx..<str.endIndex])}

  func skip(charactersIn characters: CharacterSet) {
    // Does not compile: contains(_:) expects a Unicode.Scalar, but str[idx] is a Character.
    while idx < str.endIndex && characters.contains(str[idx]) {
      idx = str.index(idx, offsetBy: 1)
    }
  }
}

let scanner = MyScanner("fizz   buzz fizz")
scanner.skip(charactersIn: CharacterSet.alphanumerics)
scanner.skip(charactersIn: CharacterSet.whitespaces)
print("what remains: \"\(scanner.remains)\"")

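I would like to implement the skip(charactersIn:) function so that the code above prints buzz fizz.

The tricky part is characters.contains(str[idx]) in the while condition: .contains() requires a Unicode.Scalar, and I'm at a loss trying to figure out the next step.

I know I could pass a String to the skip function, but I'd like to find a way to make it work with CharacterSet, because of all the convenient static members (alphanumerics, whitespaces, etc.).

How does one test whether a CharacterSet contains a Character?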

nat*_*han 13

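Not sure if it's the most efficient way, but you can create a new CharacterSet from the character and check whether it is a subset of the other set (set comparison is fairly quick):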

let newSet = CharacterSet(charactersIn: "a")
// let newSet = CharacterSet(charactersIn: "\(character)")
print(newSet.isSubset(of: CharacterSet.decimalDigits)) // false
print(newSet.isSubset(of: CharacterSet.alphanumerics)) // true
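The subset check above can be wrapped in a small helper that takes a Character directly. This is only a sketch: the method name containsAllScalars(of:) is made up for illustration and is not part of Foundation. Note that it checks each scalar of the character separately, so a character made of several scalars matches only if all of them are in the set.

```swift
import Foundation

extension CharacterSet {
    /// Hypothetical helper (not a Foundation API): builds a one-character set
    /// and tests whether it is a subset of `self`.
    func containsAllScalars(of character: Character) -> Bool {
        return CharacterSet(charactersIn: String(character)).isSubset(of: self)
    }
}

let digit: Character = "7"
let letter: Character = "a"

print(CharacterSet.decimalDigits.containsAllScalars(of: digit))   // true
print(CharacterSet.decimalDigits.containsAllScalars(of: letter))  // false
print(CharacterSet.alphanumerics.containsAllScalars(of: letter))  // true
```

Building a temporary set for every character is convenient but probably not the cheapest option inside a tight scanning loop.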


Rob*_*Rob 6

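I know that you wanted to use CharacterSet rather than String, but CharacterSet does not (yet, at least) support characters that are composed of more than one Unicode.Scalar. See the "family" character (👨‍👩‍👧‍👦) or the international flag characters (e.g. "🇯🇵" or "🇯🇲") that Apple demonstrated in the string discussion in the WWDC 2017 video What's New in Swift. Emoji with skin-tone modifiers show the same behavior.

So I'd be wary of using CharacterSet (which is a "set of Unicode character values for use in search operations"). Or, if you want to provide this method for the sake of convenience, just be aware that it will not work correctly with characters represented by multiple unicode scalars.

So you might offer a scanner that provides both CharacterSet and String renditions of the skip method: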

class MyScanner {
    let string: String
    var index: String.Index

    init(_ string: String) {
        self.string = string
        index = string.startIndex
    }

    var remains: String { return String(string[index...]) }

    /// Skip characters in a string
    ///
    /// This rendition is safe to use with strings that have characters
    /// represented by more than one unicode scalar.
    ///
    /// - Parameter skipString: A string with all of the characters to skip.

    func skip(charactersIn skipString: String) {
        while index < string.endIndex, skipString.contains(string[index]) {
            index = string.index(index, offsetBy: 1)
        }
    }

    /// Skip characters in character set
    ///
    /// Note, character sets cannot (yet) include characters that are represented by
    /// more than one unicode scalar (e.g. family emoji, flags, or skin-tone variants).
    /// If you want to test for these multi-unicode characters, you have to use the
    /// `String` rendition of this method.
    ///
    /// This will simply stop scanning if it encounters a multi-unicode character in
    /// the string being scanned (because it knows the `CharacterSet` can only represent
    /// single-unicode characters) and you want to avoid false positives (e.g., mistaking
    /// the Jamaican flag, 🇯🇲, for the Japanese flag, 🇯🇵).
    ///
    /// - Parameter characterSet: The character set to check for membership.

    func skip(charactersIn characterSet: CharacterSet) {
        while index < string.endIndex,
            string[index].unicodeScalars.count == 1,
            let character = string[index].unicodeScalars.first,
            characterSet.contains(character) {
                index = string.index(index, offsetBy: 1)
        }
    }

}

So your simple example will still work:

let scanner = MyScanner("fizz   buzz fizz")
scanner.skip(charactersIn: CharacterSet.alphanumerics)
scanner.skip(charactersIn: CharacterSet.whitespaces)
print(scanner.remains)  // "buzz fizz"

But use the String rendition if the characters you want to skip might include multiple unicode scalars:

let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"  // 👨‍👩‍👧‍👦
let boy = "\u{1F466}"  // 👦

let charactersToSkip = family + boy

let string = boy + family + "foobar"  // 👦👨‍👩‍👧‍👦foobar

let scanner = MyScanner(string)
scanner.skip(charactersIn: charactersToSkip)
print(scanner.remains)                // foobar

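As Michael Waterfall pointed out in the comments below, CharacterSet has a bug and does not even handle 32-bit Unicode.Scalar values correctly, which means it cannot reliably handle even single-scalar characters if the value exceeds 0xffff (emoji, among others). The String rendition above handles these correctly, though.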

  • Interestingly, `CharacterSet` doesn't even handle emoji that are represented by a single unicode scalar (😆 = 128518); this returns `false`: `CharacterSet(charactersIn: "😆ABC").contains(UnicodeScalar(128518)!)` (2 upvotes)

Vad*_*rov 6

A Swift 4.2 CharacterSet extension function to check whether it contains a Character:

extension CharacterSet {
    func containsUnicodeScalars(of character: Character) -> Bool {
        return character.unicodeScalars.allSatisfy(contains(_:))
    }
}

Usage example:

CharacterSet.decimalDigits.containsUnicodeScalars(of: "3") // true
CharacterSet.decimalDigits.containsUnicodeScalars(of: "a") // false
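To tie this back to the question, here is a rough sketch of how the extension might be plugged into the scanner's skip(charactersIn:) method. The scanner is a trimmed copy of the one from the question, and the extension is repeated so that the snippet stands on its own. One assumption to keep in mind: a character made of several scalars is skipped whenever every one of its scalars is in the set, which can produce the false positives described in the earlier answer (flags, for instance).

```swift
import Foundation

extension CharacterSet {
    /// True when every unicode scalar of `character` is a member of the set.
    func containsUnicodeScalars(of character: Character) -> Bool {
        return character.unicodeScalars.allSatisfy { contains($0) }
    }
}

final class MyScanner {
    let string: String
    var index: String.Index

    init(_ string: String) {
        self.string = string
        index = string.startIndex
    }

    var remains: String { return String(string[index...]) }

    /// Advances past every leading character whose scalars are all in `characterSet`.
    func skip(charactersIn characterSet: CharacterSet) {
        while index < string.endIndex,
              characterSet.containsUnicodeScalars(of: string[index]) {
            index = string.index(after: index)
        }
    }
}

let scanner = MyScanner("fizz   buzz fizz")
scanner.skip(charactersIn: .alphanumerics)
scanner.skip(charactersIn: .whitespaces)
print(scanner.remains)  // buzz fizz
```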