Swift: extract regex matches

mit*_*man 157 regex string ios swift

I want to extract substrings from a string that match a regex pattern.

So I am looking for something like this:

func matchesForRegexInText(regex: String!, text: String!) -> [String] {
   ???
}

So this is what I have:

func matchesForRegexInText(regex: String!, text: String!) -> [String] {

    var regex = NSRegularExpression(pattern: regex, 
        options: nil, error: nil)

    var results = regex.matchesInString(text, 
        options: nil, range: NSMakeRange(0, countElements(text))) 
            as Array<NSTextCheckingResult>

    /// ???

    return ...
}

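The problem is that matchesInString gives me an array of NSTextCheckingResult, where NSTextCheckingResult.range is of type NSRange.

NSRange is incompatible with Range<String.Index>, so it prevents me from using text.substringWithRange(...)

Any idea how to achieve this simple thing in Swift without too many lines of code?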

Mar*_*n R 293

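Even though the matchesInString() method takes a String as its first argument, it works internally with NSString, and the range argument must be given using the NSString length, not the Swift string length. Otherwise it will fail for "extended grapheme clusters" such as flags.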
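To make the length mismatch concrete, here is a minimal sketch comparing the two counts for a string that starts with a flag emoji. The sample string and the constant names are illustrative assumptions, not part of the original answer; the flag is written with Unicode escapes so that it survives copy and paste.

```swift
import Foundation

// A flag emoji is a single Character made of two Unicode scalars,
// each of which takes two UTF-16 code units.
let text = "\u{1F1E9}\u{1F1EA}€4€9"

let characterCount = text.count                        // number of Characters
let utf16Count = text.utf16.count                      // number of UTF-16 code units
let nsRange = NSRange(text.startIndex..., in: text)    // range measured in UTF-16 code units

print(characterCount, utf16Count, nsRange.length)
// prints "5 8 8"
```

A range built from the Character count (5) would stop before the digits here, which is why the NSString (UTF-16) length has to be used.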

As of Swift 4 (Xcode 9), the Swift standard library provides functions to convert between Range<String.Index> and NSRange.

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: text,
                                    range: NSRange(text.startIndex..., in: text))
        return results.map {
            String(text[Range($0.range, in: text)!])
        }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "€4€9"
let matched = matches(for: "[0-9]", in: string)
print(matched)
// ["4", "9"]

Note: The forced unwrap Range($0.range, in: text)! is safe because the NSRange refers to a substring of the given string text. However, if you want to avoid it, then use

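        // compactMap (Swift 4.1+) replaces the deprecated Optional-returning flatMap;
        // on Swift 4.0 use flatMap here instead.
        return results.compactMap {
            Range($0.range, in: text).map { String(text[$0]) }
        }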

instead.


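(Older answer for Swift 3 and earlier:)

So you should convert the given Swift string to an NSString and then extract the ranges. The result will be converted to a Swift string array automatically.

(The code for Swift 1.2 can be found in the edit history.)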

Swift 2(Xcode 7.3.1):

func matchesForRegexInText(regex: String, text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text,
                                            options: [], range: NSMakeRange(0, nsString.length))
        return results.map { nsString.substringWithRange($0.range)}
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "€4€9"
let matches = matchesForRegexInText("[0-9]", text: string)
print(matches)
// ["4", "9"]

Swift 3 (Xcode 8):

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let nsString = text as NSString
        let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range)}
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "€4€9"
let matched = matches(for: "[0-9]", in: string)
print(matched)
// ["4", "9"]

  • You saved me from going insane. Not kidding. Thank you so much! (9 upvotes)

Lar*_*erg 56

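My answer builds on the given answers but makes regex matching more robust by adding the following:

  • Returns not only the matches but also all capture groups for each match (see the examples below)
  • Supports optional matches instead of returning an empty array
  • Avoids do/catch by not printing to the console, and uses the guard construct instead
  • Adds matchingStrings as an extension to String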

Swift 4.2

//: Playground - noun: a place where people can play

import Foundation

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.range(at: $0).location != NSNotFound
                    ? nsString.substring(with: result.range(at: $0))
                    : ""
            }
        }
    }
}

"prefix12 aaa3 prefix45".matchingStrings(regex: "fix([0-9])([0-9])")
// Prints: [["fix12", "1", "2"], ["fix45", "4", "5"]]

"prefix12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["prefix12", "12"]]

"12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["12", "12"]], other answers return an empty array here

// Safely accessing the capture of the first match (if any):
let number = "prefix12suffix".matchingStrings(regex: "fix([0-9]+)su").first?[1]
// Prints: Optional("12")

Swift 3

//: Playground - noun: a place where people can play

import Foundation

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.rangeAt($0).location != NSNotFound
                    ? nsString.substring(with: result.rangeAt($0))
                    : ""
            }
        }
    }
}

"prefix12 aaa3 prefix45".matchingStrings(regex: "fix([0-9])([0-9])")
// Prints: [["fix12", "1", "2"], ["fix45", "4", "5"]]

"prefix12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["prefix12", "12"]]

"12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["12", "12"]], other answers return an empty array here

// Safely accessing the capture of the first match (if any):
let number = "prefix12suffix".matchingStrings(regex: "fix([0-9]+)su").first?[1]
// Prints: Optional("12")

Swift 2

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matchesInString(self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.rangeAtIndex($0).location != NSNotFound
                    ? nsString.substringWithRange(result.rangeAtIndex($0))
                    : ""
            }
        }
    }
}

  • I've added unit tests to your nice snippet: https://gist.github.com/neoneye/03cbb26778539ba5eb609d16200e4522 (3 upvotes)
  • I was about to write my own, based on @MartinR's answer, until I saw this. Thanks! (2 upvotes)

Ken*_*ler 25

The fastest way to return all matches and capture groups in Swift 5

extension String {
    func match(_ regex: String) -> [[String]] {
        let nsString = self as NSString
        return (try? NSRegularExpression(pattern: regex, options: []))?.matches(in: self, options: [], range: NSMakeRange(0, nsString.length)).map { match in
            (0..<match.numberOfRanges).map { match.range(at: $0).location == NSNotFound ? "" : nsString.substring(with: match.range(at: $0)) }
        } ?? []
    }
}

Returns a two-dimensional array of strings:

"prefix12suffix fix1su".match("fix([0-9]+)su")

returns...

[["fix12su", "12"], ["fix1su", "1"]]

// First element of sub-array is the match
// All subsequent elements are the capture groups


Pra*_*tti 13

Update for iOS 16: Regex, RegexBuilder

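Xcode previously supported regular expressions from Apple through NSRegularExpression. That Swift API was quite verbose and hard to use correctly, so in 2022 Apple released Regex literal support and RegexBuilder. The Regex type uses the same regex flavor as NSRegularExpression, i.e. the ICU Unicode specification.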

In iOS 16 / macOS 13 the API has been simplified to tidy up the complex range-based String parsing logic, and performance has improved.

Another advantage of using literals is that invalid regex syntax produces a compile-time error: Cannot parse regular expression... with a clear description of the regex error. Enjoy!
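By contrast, a pattern that only exists as a String at run time cannot be checked by the compiler; it has to be parsed with the throwing Regex initializer. The sketch below illustrates that difference. The helper name isValidPattern is a hypothetical one chosen for this example, and the code assumes Swift 5.7 or later (iOS 16 / macOS 13).

```swift
// Hypothetical helper: reports whether a pattern string parses as a regex.
// A literal would be rejected at compile time; a String pattern can only
// fail here, at run time.
func isValidPattern(_ pattern: String) -> Bool {
    do {
        let _: Regex<AnyRegexOutput> = try Regex(pattern)
        return true
    } catch {
        print("invalid regex: \(error)")
        return false
    }
}

print(isValidPattern("[0-9]+"))   // a well-formed pattern
print(isValidPattern("[0-9"))     // an unterminated character class
```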

Regex literals in Swift 5.7

func parseLine(_ line: Substring) throws -> MailmapEntry {

    let regex = /\h*([^<#]+?)??\h*<([^>#]+)>\h*(?:#|\Z)/

    guard let match = line.prefixMatch(of: regex) else {
        throw MailmapError.badLine
    }

    return MailmapEntry(name: match.1, email: match.2)
}

We can match using the following methods:

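  1. firstMatch(of:): returns the first match of the regex within this collection, where the regex is created by the given closure (regex literal).

  2. prefixMatch(of:): returns a match if this string is matched by the given regex at its start.

  3. wholeMatch(of:): matches the regex against the string in its entirety, where the regex is created by the given closure (regex literal).

  4. matches(of:): returns a collection containing all non-overlapping matches of the regex, where the regex is created by the given closure (regex literal).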
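The four methods above can be compared side by side in a short sketch. The input string and constant names are assumptions made for illustration. The extended #/.../# delimiter is used so that the literal parses without the bare-slash compiler flag; Swift 5.7 or later is assumed.

```swift
// One regex, four ways of applying it to the same input.
let digits = #/[0-9]+/#
let input = "12 apples and 7 pears"

// First match anywhere in the string.
let first = input.firstMatch(of: digits).map { String($0.output) }
// Match only if the string starts with the pattern.
let leading = input.prefixMatch(of: digits).map { String($0.output) }
// Match only if the whole string is covered by the pattern.
let whole = input.wholeMatch(of: digits).map { String($0.output) }
// All non-overlapping matches.
let all = input.matches(of: digits).map { String($0.output) }

print(first as Any, leading as Any, whole as Any, all)
// prints: Optional("12") Optional("12") nil ["12", "7"]
```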

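I linked to the documentation for each of the above. The new regex literal syntax comes with several new APIs, such as trimmingPrefix() and contains(), so I encourage exploring the documentation further for more nuanced use cases.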

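There is an equivalent syntax for the methods above, where we call prefixMatch(in:) on the regex itself and pass in the string to search. I prefer the syntax above, but choose whichever you like.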

Sample code:

let aOrB = /[ab]+/

if let stringMatch = try aOrB.firstMatch(in: "The year is 2022; last year was 2021.") {
    print(stringMatch.0)
} else {
    print("No match.")
}
// prints "a"

RegexBuilder in Swift 5.7

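RegexBuilder is a new API released by Apple that aims to make regex code easier to write in Swift. If we want more readability, we can use RegexBuilder to translate the regex literal /\h*([^<#]+?)??\h*<([^>#]+)>\h*(?:#|\Z)/ from above into a more declarative form.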

Note that we can use raw strings in a RegexBuilder, and can also interleave regex literals within the builder, if we want to balance readability with conciseness.

import RegexBuilder

let regex = Regex {
    ZeroOrMore(.horizontalWhitespace)
    Optionally {
        Capture(OneOrMore(.noneOf("<#")))
    }
        .repetitionBehavior(.reluctant)
    ZeroOrMore(.horizontalWhitespace)
    "<"
    Capture(OneOrMore(.noneOf(">#")))
    ">"
    ZeroOrMore(.horizontalWhitespace)
    /#|\Z/
}

The regex literal /#|\Z/ is equivalent to:

ChoiceOf {
   "#"
   Anchor.endOfSubjectBeforeNewline
}

Composable RegexComponent

The RegexBuilder syntax is also similar to SwiftUI in terms of composability, because we can reuse RegexComponents inside other RegexComponents:

struct MailmapLine: RegexComponent {
    @RegexComponentBuilder
    var regex: Regex<(Substring, Substring?, Substring)> {
        ZeroOrMore(.horizontalWhitespace)
        Optionally {
            Capture(OneOrMore(.noneOf("<#")))
        }
            .repetitionBehavior(.reluctant)
        ZeroOrMore(.horizontalWhitespace)
        "<"
        Capture(OneOrMore(.noneOf(">#")))
        ">"
        ZeroOrMore(.horizontalWhitespace)
        ChoiceOf {
            "#"
            Anchor.endOfSubjectBeforeNewline
        }
    }
}

Source: some of the code is taken from the WWDC 2022 video "What's new in Swift".


Mik*_*ico 12

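If you want to extract the substrings themselves from a String, not just their positions (and the actual String may include emojis), then the following may be a simpler solution.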

extension String {
  func regex (pattern: String) -> [String] {
    do {
      let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions(rawValue: 0))
      let nsstr = self as NSString
      let all = NSRange(location: 0, length: nsstr.length)
      var matches : [String] = [String]()
      regex.enumerateMatchesInString(self, options: NSMatchingOptions(rawValue: 0), range: all) {
        (result : NSTextCheckingResult?, _, _) in
        if let r = result {
          let result = nsstr.substringWithRange(r.range) as String
          matches.append(result)
        }
      }
      return matches
    } catch {
      return [String]()
    }
  }
} 

Example usage:

"someText ?? pig".regex("??")

Will return the following:

["??"]

Note that using "\w+" may produce an unexpected ""

"someText ?? pig".regex("\\w+")

Will return this String array

["someText", "?", "pig"]


Rob*_*ham 9

I found that the accepted answer's solution unfortunately does not compile on Swift 3 for Linux. Here is a modified version that does:

import Foundation

func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try RegularExpression(pattern: regex, options: [])
        let nsString = NSString(string: text)
        let results = regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

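The main differences are:

  1. Swift on Linux seems to require dropping the NS prefix on Foundation objects for which there is no Swift-native equivalent. (See Swift evolution proposal #86.)

  2. Swift on Linux also requires specifying the options argument for both the RegularExpression initializer and the matches method.

  3. For some reason, coercing a String into an NSString does not work in Swift on Linux, but initializing a new NSString with a String as the source does work.

This version also works with Swift 3 on macOS / Xcode, with the sole exception that you must use the name NSRegularExpression instead of RegularExpression.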


shi*_*ami 6

Swift 4 without NSString.

extension String {
    func matches(regex: String) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: [.caseInsensitive]) else { return [] }
        let matches  = regex.matches(in: self, options: [], range: NSMakeRange(0, self.count))
        return matches.map { match in
            return String(self[Range(match.range, in: self)!])
        }
    }
}

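  • A note on the solution above: "NSMakeRange(0, self.count)" is not correct, because "self" is a "String" (=UTF8) and not an "NSString" (=UTF16). So "self.count" is not necessarily the same as "nsString.length" (as used in the other solutions). You can replace the range calculation with "NSRange(self.startIndex..., in: self)" (5 upvotes)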
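The correction suggested in that comment can be sketched as follows. This is an adaptation, not the answerer's own code: the method name matchesFixed is hypothetical, and compactMap is swapped in to avoid the forced unwrap. String.count counts Characters, whereas NSRegularExpression expects a range measured in UTF-16 code units, so the range is derived from the string's indices instead.

```swift
import Foundation

extension String {
    // Hypothetical variant of the extension above with the range computed
    // in UTF-16 code units via NSRange(_:in:).
    func matchesFixed(regex pattern: String) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: pattern,
                                                   options: [.caseInsensitive]) else { return [] }
        let fullRange = NSRange(startIndex..., in: self)
        return regex.matches(in: self, options: [], range: fullRange).compactMap { match in
            // Convert each NSRange back to a Range<String.Index> before slicing.
            Range(match.range, in: self).map { String(self[$0]) }
        }
    }
}

// The flag emoji is one Character but four UTF-16 code units.
print("\u{1F1E9}\u{1F1EA}€4€9".matchesFixed(regex: "[0-9]"))
// prints ["4", "9"]
```

With NSMakeRange(0, self.count) the search range for this input would end before the first digit, so no matches would be returned.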

Oli*_*erD 5

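@p4bloch if you want to capture the results from a series of capture parentheses, then you need to use the rangeAtIndex(index) method of NSTextCheckingResult instead of range. Here is @MartinR's Swift 2 method from above, adapted for capture parentheses. In the returned array, the first result [0] is the entire match, and the individual capture groups start at [1]. I commented out the map operation (so it is easier to see what I changed) and replaced it with nested loops.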

func matches(for regex: String!, in text: String!) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text, options: [], range: NSMakeRange(0, nsString.length))
        var match = [String]()
        for result in results {
            for i in 0..<result.numberOfRanges {
                match.append(nsString.substringWithRange( result.rangeAtIndex(i) ))
            }
        }
        return match
        //return results.map { nsString.substringWithRange( $0.range )} //rangeAtIndex(0)
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

An example use case: say you want to split a "title year" string such as "Finding Dory 2016". You could do this:

print ( matches(for: "^(.+)\\s(\\d{4})" , in: "Finding Dory 2016"))
// ["Finding Dory 2016", "Finding Dory", "2016"]