Split text into an array while keeping the punctuation in Swift

5 arrays string split ios swift

I want to split text into an array, keeping the punctuation marks as separate elements alongside the words, so a string like this:

Hello, I am Albert Einstein.

should become an array like this:

["Hello", ",", "I", "am", "Albert", "Einstein", "."]

I have already tried string.components(separatedBy: CharacterSet.init(charactersIn: " ,;;:")), but this method removes all of the punctuation and returns an array like this:

["Hello", "I", "am", "Albert", "Einstein"]

So how can I get an array like the one in my first example?

Duy*_*Hoa 2

It is not a pretty solution, but you can try this:

import Foundation

let str = "Hello, I am Albert Einstein."
var list = [String]()
var currentSubString = ""
// Enumerate every character, including ".", ",", ";" and " ".
str.enumerateSubstrings(in: str.startIndex..<str.endIndex, options: .byComposedCharacterSequences) { (substring, _, _, _) in
    if let _subString = substring {
        if !currentSubString.isEmpty &&
            (_subString.compare(" ") == .orderedSame
                || _subString.compare(",") == .orderedSame
                || _subString.compare(".") == .orderedSame
                || _subString.compare(";") == .orderedSame) {
            // A terminator was seen while a token is in progress: finish that token.
            list.append(currentSubString)
            // A punctuation mark starts the next token; a space is trimmed to "".
            currentSubString = _subString.trimmingCharacters(in: .whitespaces)
        } else {
            // Append to the current token unless the character is a space.
            if _subString.compare(" ") != .orderedSame {
                currentSubString += _subString
            }
        }
    }
}

// Last token
if !currentSubString.isEmpty {
    list.append(currentSubString)
}

In Swift 3, the code is identical to the version above.

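The idea is to loop over all the characters and build the words at the same time. A word is a run of consecutive characters that are not a space, `,`, `.` or `;`. So while building a word inside the loop, if we see one of those characters and the word under construction is not empty, we finish the current word. To break the steps down using your input: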

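  1. get H (not a space or other terminator) -> currentSubString = "H"
  2. get e (not a space or other terminator) -> currentSubString = "He"
  3. get l (not a space or other terminator) -> currentSubString = "Hel"
  4. get l (not a space or other terminator) -> currentSubString = "Hell"
  5. get o (not a space or other terminator) -> currentSubString = "Hello"
  6. get , (a terminator character)
    • -> since currentSubString is not empty, append it to list and start building the next word, so list = ["Hello"]
    • -> currentSubString = "," (the trimming is only there to drop the character when it is a space; any other terminator has to be kept as the start of the next word)
  7. get a space (a space character)
    • -> since currentSubString is not empty, append it to list and start over -> list = ["Hello", ","]
    • -> currentSubString = "" (trimmed), and so on.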
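
The same walk-through can also be written as a small reusable function. This is only a sketch of a variant, not the code above: the helper name `tokenize` is made up for illustration, and it assumes Swift 5 or later, where `Character` has the `isWhitespace` and `isPunctuation` properties. Unlike the code above, it treats every punctuation character as its own token, so input such as "a,b" is split into three elements.

```swift
// Hypothetical helper (the name `tokenize` is an assumption, not from the answer).
// Assumes Swift 5+, which provides Character.isWhitespace and Character.isPunctuation.
func tokenize(_ text: String) -> [String] {
    var tokens = [String]()
    var currentWord = ""

    // Close the word in progress, if there is one.
    func flushWord() {
        if !currentWord.isEmpty {
            tokens.append(currentWord)
            currentWord = ""
        }
    }

    for character in text {
        if character.isWhitespace {
            // Whitespace ends the current word and is discarded.
            flushWord()
        } else if character.isPunctuation {
            // Punctuation ends the current word and becomes its own token.
            flushWord()
            tokens.append(String(character))
        } else {
            // Any other character extends the current word.
            currentWord.append(character)
        }
    }
    // Do not forget the last word.
    flushWord()
    return tokens
}

let tokens = tokenize("Hello, I am Albert Einstein.")
print(tokens)
```

Because apostrophes and hyphens also count as punctuation here, a word such as "don't" would be split into three tokens; if that matters, the punctuation check could be narrowed to an explicit set of characters.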