Why doesn't NSRegularExpression honor capture groups in all cases?

JD.*_*JD. 3 regex objective-c nsregularexpression

Main problem: when my pattern is @"\\b(\\S+)\\b", ObjC tells me there are six matches, but when my pattern is @"A b (c) or (d)", it reports only one match, "c".

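Here is a function that returns the capture groups as an NSArray. I'm an Objective-C newbie, so I suspect there is a better way to do the clunky work than creating a mutable array and assigning it to an NSArray at the end.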

- (NSArray *)regexWithResults:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = nil;
    // NSRegularExpressionSearch is an NSString compare option, not an
    // NSRegularExpressionOptions value; pass 0 for the default behavior.
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:strPattern
        options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSArray *arTextCheckingResults = [regex matchesInString:haystack
        options:0
        range:NSMakeRange(0, [haystack length])];

    NSMutableArray *arMutable = [NSMutableArray array];
    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        // Index 0 is the whole match; capture groups start at index 1.
        for (NSUInteger captureIndex = 1; captureIndex < ntcr.numberOfRanges; captureIndex++) {
            NSRange captureRange = [ntcr rangeAtIndex:captureIndex];
            // A group that did not participate in the match has no range; skip it.
            if (captureRange.location == NSNotFound) continue;
            [arMutable addObject:[haystack substringWithRange:captureRange]];
        }
    }

    // Return an immutable snapshot of the collected captures.
    return [arMutable copy];
}

The problem

I'm used to using parentheses to match capture groups in Perl, like this:

#!/usr/bin/perl -w
use strict;

my $str = "This sentence has words in it.";
if(my ($what, $inner) = ($str =~ /This (\S+) has (\S+) in it/)) {
    print "That $what had '$inner' in it.\n";
}

That code produces:

    That sentence had 'words' in it.

But in Objective-C, using NSRegularExpression, we get different results. An example function:

- (void)regexTest:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = NULL;
    NSArray *arTextCheckingResults;

    NSRegularExpression *regex = [NSRegularExpression
                                  regularExpressionWithPattern:strPattern
                                  options:0 // NSRegularExpressionSearch is an NSString compare option, not a regex option
                                  error:&error];

    NSUInteger numberOfMatches = [regex numberOfMatchesInString:haystack options:0 range:NSMakeRange(0, [haystack length])];

    NSLog(@"Pattern: '%@'", strPattern);
    NSLog(@"Search text: '%@'", haystack);
    NSLog(@"Number of matches: %lu", (unsigned long)numberOfMatches);

    arTextCheckingResults = [regex matchesInString:haystack options:0 range:NSMakeRange(0, [haystack length])];

    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        NSString *match = [haystack substringWithRange:[ntcr rangeAtIndex:1]];
        NSLog(@"Found string '%@'", match);
    }
}

Calling that test function shows that it can count the words in the string:

NSString *searchText = @"This sentence has words in it.";
[myClass regexTest:searchText pattern:@"\\b(\\S+)\\b"];

Result:

    Pattern: '\b(\S+)\b'
    Search text: 'This sentence has words in it.'
    Number of matches: 6
    Found string 'This'
    Found string 'sentence'
    Found string 'has'
    Found string 'words'
    Found string 'in'
    Found string 'it'

But what if the capture groups are explicit?

[myClass regexTest:searchText pattern:@".*This (sentence) has (words) in it.*"];

Result:

    Pattern: '.*This (sentence) has (words) in it.*'
    Search text: 'This sentence has words in it.'
    Number of matches: 1
    Found string 'sentence'

Same as above, but using \S+ instead of the actual words:

[myClass regexTest:searchText pattern:@".*This (\\S+) has (\\S+) in it.*"];

Result:

    Pattern: '.*This (\S+) has (\S+) in it.*'
    Search text: 'This sentence has words in it.'
    Number of matches: 1
    Found string 'sentence'

How about a wildcard in the middle?

[myClass regexTest:searchText pattern:@"^This (\\S+) .* (\\S+) in it.$"];

Result:

    Pattern: '^This (\S+) .* (\S+) in it.$'
    Search text: 'This sentence has words in it.'
    Number of matches: 1
    Found string 'sentence'

References: NSRegularExpression, NSTextCheckingResult, NSRegularExpression matching options

Dan*_*man 7

I think if you change

// returns the range which matched the pattern
NSString *match = [haystack substringWithRange:ntcr.range];

to

// returns the range of the first capture
NSString *match = [haystack substringWithRange:[ntcr rangeAtIndex:1]];

you will get the expected result for patterns containing a single capture.

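See the documentation page for NSTextCheckingResult's rangeAtIndex:

A result must have at least one range, but may optionally have more (for example, to represent regular expression capture groups).

Passing rangeAtIndex: the value 0 always returns the value of the range property. Additional ranges, if any, will have indexes from 1 to numberOfRanges-1.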
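
For patterns with more than one capture group, the indexing rule quoted above suggests walking every index from 1 to numberOfRanges-1 within each match, rather than reading only index 1. The following is a minimal sketch of that idea, not code from the question or the answer: the helper name CaptureGroups and its choice to return one inner array per match are assumptions made for illustration, and a group that did not take part in a match is represented here by an empty string.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (name and return shape are illustrative assumptions):
// returns one inner array per match, holding the text of capture groups
// 1..numberOfRanges-1 for that match.
static NSArray<NSArray<NSString *> *> *CaptureGroups(NSString *haystack, NSString *pattern)
{
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray<NSArray<NSString *> *> *allMatches = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, haystack.length);
    for (NSTextCheckingResult *match in [regex matchesInString:haystack options:0 range:whole]) {
        NSMutableArray<NSString *> *groups = [NSMutableArray array];
        // Index 0 is the whole match, so capture groups start at index 1.
        for (NSUInteger i = 1; i < match.numberOfRanges; i++) {
            NSRange r = [match rangeAtIndex:i];
            // A group that did not participate in the match reports NSNotFound.
            if (r.location == NSNotFound) {
                [groups addObject:@""];
            } else {
                [groups addObject:[haystack substringWithRange:r]];
            }
        }
        [allMatches addObject:[groups copy]];
    }
    return [allMatches copy];
}
```

Called with the question's search text and the pattern @"This (\\S+) has (\\S+) in it", this sketch should yield a single match whose inner array holds both "sentence" and "words". That also explains the counts in the question: numberOfMatches counts whole-pattern matches, not capture groups, so a pattern that spans the entire sentence reports one match however many groups it contains.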