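I'm looking for a way, in Swift 4, to test whether a Character is a member of an arbitrary CharacterSet. I have this Scanner class that will be used for some lightweight parsing. One of its functions skips any characters, starting at the current position, that belong to a given set of possible characters.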
import Foundation

class MyScanner {
    let str: String
    var idx: String.Index

    init(_ string: String) {
        str = string
        idx = str.startIndex
    }

    var remains: String { return String(str[idx..<str.endIndex]) }

    func skip(charactersIn characters: CharacterSet) {
        // Does not compile: CharacterSet.contains(_:) expects a Unicode.Scalar, not a Character.
        while idx < str.endIndex && characters.contains(str[idx]) {
            idx = str.index(idx, offsetBy: 1)
        }
    }
}
let scanner = MyScanner("fizz buzz fizz")
scanner.skip(charactersIn: CharacterSet.alphanumerics)
scanner.skip(charactersIn: CharacterSet.whitespaces)
print("what remains: \"\(scanner.remains)\"")
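I would like to implement the skip(charactersIn:) function so that the code above prints buzz fizz.
The tricky part is characters.contains(str[idx]) in the while condition: .contains() requires a Unicode.Scalar, and I'm at a loss trying to figure out the next step.
I know I could pass a String to the skip function, but I'd like to find a way to make it work with CharacterSet, because of all the convenient static members (alphanumerics, whitespaces, and so on).
How do you test whether a CharacterSet contains a Character?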
nat*_*han 13
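Not sure whether it's the most efficient way, but you can create a new CharacterSet from the character and check whether it is a subset of the target set (set comparisons are fairly quick):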
let newSet = CharacterSet(charactersIn: "a")
// let newSet = CharacterSet(charactersIn: "\(character)")
print(newSet.isSubset(of: CharacterSet.decimalDigits)) // false
print(newSet.isSubset(of: CharacterSet.alphanumerics)) // true
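To plug the subset check into a scanner loop, it can be wrapped in a small extension. This is only a sketch: the name containsViaSubset(_:) is made up for illustration and is not part of Foundation. It builds a one-character set and tests it against the receiver, so a character made of several scalars only passes if all of its scalars are members.

```swift
import Foundation

extension CharacterSet {
    /// Hypothetical helper (not a Foundation API): true when the set built
    /// from `character` is a subset of this character set.
    func containsViaSubset(_ character: Character) -> Bool {
        return CharacterSet(charactersIn: String(character)).isSubset(of: self)
    }
}

let digitCheck = CharacterSet.decimalDigits.containsViaSubset("7")
let letterCheck = CharacterSet.decimalDigits.containsViaSubset("x")
print(digitCheck, letterCheck) // true false
```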
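I know you wanted to use CharacterSet rather than String, but CharacterSet does not (yet, at least) support characters that are composed of more than one Unicode.Scalar. See the "family" character (👨‍👩‍👧‍👦) or the international flag characters (e.g. "🇯🇵" or "🇯🇲") that Apple demonstrated in the string discussion in the WWDC 2017 video What's New in Swift. Emoji with skin-tone modifiers show the same behavior (e.g. the same emoji with two different skin tones).
So I would be wary of using CharacterSet (which is a set of "Unicode character values for use in search operations"). Or, if you want to provide this method for the sake of convenience, be aware that it will not work correctly for characters represented by multiple Unicode scalars.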
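To make the multi-scalar point concrete, here is a small sketch using only the standard library. The two flags are written as Unicode escapes (regional indicators J+P and J+M); the variable names are just for illustration. It shows that each flag is one Character made of two scalars, and that the two flags share a scalar, which is why a per-scalar membership test can confuse them.

```swift
// Japanese flag: regional indicators J + P. Jamaican flag: J + M.
let japan: Character = "\u{1F1EF}\u{1F1F5}"
let jamaica: Character = "\u{1F1EF}\u{1F1F2}"

// Each flag is a single Character built from two Unicode scalars.
print(japan.unicodeScalars.count)   // 2
print(jamaica.unicodeScalars.count) // 2

// The flags are different characters, yet they share the "J" indicator scalar.
let sharedScalars = Set(japan.unicodeScalars).intersection(jamaica.unicodeScalars)
print(japan == jamaica)    // false
print(sharedScalars.count) // 1
```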
So you might offer a scanner that provides both CharacterSet and String renditions of the skip method:
import Foundation

class MyScanner {
    let string: String
    var index: String.Index

    init(_ string: String) {
        self.string = string
        index = string.startIndex
    }

    var remains: String { return String(string[index...]) }

    /// Skip characters in a string
    ///
    /// This rendition is safe to use with strings that have characters
    /// represented by more than one Unicode scalar.
    ///
    /// - Parameter skipString: A string with all of the characters to skip.
    func skip(charactersIn skipString: String) {
        while index < string.endIndex, skipString.contains(string[index]) {
            index = string.index(after: index)
        }
    }

    /// Skip characters in character set
    ///
    /// Note, character sets cannot (yet) include characters that are represented by
    /// more than one Unicode scalar (e.g. 👨‍👩‍👧‍👦 or 🇯🇵). If you want to test
    /// for these multi-scalar characters, you have to use the `String` rendition of
    /// this method.
    ///
    /// This will simply stop scanning if it encounters a multi-scalar character in
    /// the string being scanned (because it knows the `CharacterSet` can only represent
    /// single-scalar characters) and you want to avoid false positives (e.g., mistaking
    /// the Jamaican flag, 🇯🇲, for the Japanese flag, 🇯🇵).
    ///
    /// - Parameter characterSet: The character set to check for membership.
    func skip(charactersIn characterSet: CharacterSet) {
        while index < string.endIndex,
            string[index].unicodeScalars.count == 1,
            let scalar = string[index].unicodeScalars.first,
            characterSet.contains(scalar) {
            index = string.index(after: index)
        }
    }
}
Thus, your simple example still works:
let scanner = MyScanner("fizz buzz fizz")
scanner.skip(charactersIn: CharacterSet.alphanumerics)
scanner.skip(charactersIn: CharacterSet.whitespaces)
print(scanner.remains) // "buzz fizz"
But use the String rendition if the characters you want to skip might include multiple Unicode scalars:
let family = "👨\u{200D}👩\u{200D}👧\u{200D}👦" // 👨‍👩‍👧‍👦
let boy = "👦"
let charactersToSkip = family + boy
let string = boy + family + "foobar" // 👦👨‍👩‍👧‍👦foobar
let scanner = MyScanner(string)
scanner.skip(charactersIn: charactersToSkip)
print(scanner.remains) // foobar
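The String rendition works because it compares whole Character values (grapheme clusters) rather than individual scalars. If the list of characters to skip gets long, the same comparison can be done against a Set<Character>. The sketch below is a variation of mine, not part of the answer above, and remainder(of:skipping:) is a hypothetical free function used only to keep the example short.

```swift
/// Hypothetical helper: returns what is left of `text` after dropping
/// every leading character that is a member of `skipSet`.
func remainder(of text: String, skipping skipSet: Set<Character>) -> String {
    return String(text.drop(while: { skipSet.contains($0) }))
}

let blanks: Set<Character> = [" ", "\t"]
let trimmedStart = remainder(of: " \t fizz buzz", skipping: blanks)
print(trimmedStart) // fizz buzz
```

Because the set stores Character values, a multi-scalar emoji in the set only matches that exact emoji, just as with the String rendition.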
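As Michael Waterfall points out in the comments below, CharacterSet has a bug and does not even handle 32-bit Unicode.Scalar values correctly, which means it does not properly handle even single-scalar characters whose value exceeds 0xffff (emoji, among others). The String rendition above handles these correctly, though.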
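If you want to know whether a given character falls into that range, you can inspect its scalar values directly. A minimal sketch, assuming you only care about the 0xffff threshold mentioned above; hasScalarBeyondBMP(_:) is a made-up name, and the code does not try to reproduce the CharacterSet bug itself:

```swift
/// Hypothetical helper: true when any scalar of `character` has a value
/// above 0xFFFF, i.e. lies outside the Basic Multilingual Plane.
func hasScalarBeyondBMP(_ character: Character) -> Bool {
    return character.unicodeScalars.contains { $0.value > 0xFFFF }
}

print(hasScalarBeyondBMP("a"))         // false
print(hasScalarBeyondBMP("\u{1F600}")) // true (an emoji, U+1F600)
```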
Swift 4.2
A CharacterSet extension function that checks whether the set contains a Character:
extension CharacterSet {
    func containsUnicodeScalars(of character: Character) -> Bool {
        return character.unicodeScalars.allSatisfy(contains(_:))
    }
}
Usage example:
CharacterSet.decimalDigits.containsUnicodeScalars(of: "3") // true
CharacterSet.decimalDigits.containsUnicodeScalars(of: "a") // false
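For completeness, here is how this extension could be dropped into the scanner from the question. It is a sketch rather than a definitive implementation: SketchScanner is a hypothetical name, and the extension is repeated only so the snippet is self-contained. Note that it accepts a character when all of its scalars are in the set, so multi-scalar characters such as flags can still produce false positives.

```swift
import Foundation

extension CharacterSet {
    // Same idea as the extension above, repeated to keep this snippet standalone.
    func containsUnicodeScalars(of character: Character) -> Bool {
        return character.unicodeScalars.allSatisfy { contains($0) }
    }
}

// Hypothetical scanner, modeled on the one in the question.
final class SketchScanner {
    let string: String
    var index: String.Index

    init(_ string: String) {
        self.string = string
        index = string.startIndex
    }

    var remains: String { return String(string[index...]) }

    // Advance past every character whose scalars all belong to `set`.
    func skip(charactersIn set: CharacterSet) {
        while index < string.endIndex, set.containsUnicodeScalars(of: string[index]) {
            index = string.index(after: index)
        }
    }
}

let sketch = SketchScanner("fizz buzz fizz")
sketch.skip(charactersIn: .alphanumerics)
sketch.skip(charactersIn: .whitespaces)
print(sketch.remains) // buzz fizz
```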