Capturing strings in a regex replacement

lur*_*ker 10 regex smalltalk pharo

From what I've gathered from the Pharo documentation on regular expressions, I can define a regex object such as:

re := '(foo|re)bar' asRegex

And I can replace every match of that regex with a string like this:

re copy: 'foobar blah rebar' replacingMatchesWith: 'meh'

This results in: 'meh blah meh'.

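So far, so good. But I want to replace only the 'bar' and leave the prefix alone. For that, I need a variable in the replacement string that refers to the captured parenthesized group: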

re copy: 'foobar blah rebar' replacingMatchesWith: '%1meh'

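I want the result: 'foomeh blah remeh'. However, this just gives me: '%1meh blah %1meh'. I also tried \1, \\1, $1, and {1}; each one is inserted as literal text, giving, for example, '\1meh blah \1meh' as the result.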

I can do this quite easily in GNU Smalltalk:

'foobar blah rebar' replacingAllRegex: '(foo|re)bar' with: '%1meh'

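But I can't find anything in the Pharo regex documentation that tells me how to do this in Pharo. I've also done a fair amount of googling for Pharo regex and turned up nothing. Is this capability part of the RxMatcher class, or of some other Pharo regex class?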

lur*_*ker 1

After some experimentation with the RxMatcher class, I made the following modification to the selector RxMatcher#copyStream:to:replacingMatchesWith:

copyStream: aStream to: writeStream replacingMatchesWith: aString
    "Copy the contents of <aStream> on the <writeStream>,
     except for the matches. Replace each match with <aString>."

    | searchStart matchStart matchEnd |
    stream := aStream.
    markerPositions := nil.
    [searchStart := aStream position.
    self proceedSearchingStream: aStream] whileTrue: [
        matchStart := (self subBeginning: 1) first.
        matchEnd := (self subEnd: 1) first.
        aStream position: searchStart.
        searchStart to: matchStart - 1 do:
            [:ignoredPos | writeStream nextPut: aStream next].

        "------- The following lines replaced: writeStream nextPutAll: aString ------"
        "Do the regex replacement including lookback substitutions"
        writeStream nextPutAll: (aString format: self subexpressionStrings).
        "-------"

        aStream position: matchEnd.
        "Be extra careful about successful matches which consume no input.
        After those, make sure to advance or finish if already at end."
        matchEnd = searchStart ifTrue: 
            [aStream atEnd
                ifTrue: [^self "rest after end of whileTrue: block is a no-op if atEnd"]
                ifFalse:    [writeStream nextPut: aStream next]]].
    aStream position: searchStart.
    [aStream atEnd] whileFalse: [writeStream nextPut: aStream next]
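The changed line relies on String>>format:, which replaces each {n} in the receiver with the nth element of the collection it is given. As a minimal sketch of just that step, the array below is a hypothetical stand-in for what subexpressionStrings would answer for a single match:

```smalltalk
| captures replacement |
"Hypothetical capture list for one match of '(foo|re)bar' against 'foobar'."
captures := #('foo').
"{1} is replaced by the first element of the collection."
replacement := '{1}meh' format: captures.
replacement   "'foomeh'"
```

Because every replacement string now passes through format:, braces in a replacement string are presumably no longer treated as literal text, so callers that need a literal { may have to escape it.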

And then, in the "accessing" category:

subexpressionStrings
   "Create an array of lookback strings"
   | ws |
   ws := Array new writeStream.
   2 to: (self subexpressionCount) do: [ :n | | se |
      ws nextPut: ((se := self subexpression: n) ifNil: [ '' ] ifNotNil: [ se ]) ].
   ^ws contents.
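The loop starts at 2 because RxMatcher numbers the whole match as subexpression 1, so the parenthesized groups begin at index 2. A short Playground sketch of that numbering, using the regex from the example further down:

```smalltalk
| re found |
re := '((foo|re)ba(r|m))' asRegex.
found := re search: 'foobar'.   "true; the matcher now holds the captures"
re subexpressionCount.          "4: the whole match plus three groups"
re subexpression: 1.            "'foobar' (whole match)"
re subexpression: 2.            "'foobar' (outer group)"
re subexpression: 3.            "'foo'"
re subexpression: 4.            "'r'"
```

This is why {1} in the replacement string maps to the first parenthesized group rather than to the whole match: the array answered by subexpressionStrings is shifted down by one relative to the matcher's own indices.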

With this modification, I can refer back to captured groups in the replacement string, using the Smalltalk String#format: pattern for the arguments:

re := '((foo|re)ba(r|m))' asRegex
re copy: 'foobar meh rebam' replacingMatchesWith: '{2}bu{3} (was {1})'

The result is:

'foobur (was foobar) meh rebum (was rebam)'
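For comparison, a sketch of an alternative that leaves RxMatcher unmodified: the existing copy:translatingMatchesUsing: passes each matched string to a block and writes the block's answer in its place. The first expression only rewrites the matched text; the second assumes that the matcher's subexpression: can be queried from inside the block while the replacement is in progress, which has not been verified here.

```smalltalk
| re result |
re := '(foo|re)bar' asRegex.

"Rewrite the matched text itself; no capture access is needed."
result := re
    copy: 'foobar blah rebar'
    translatingMatchesUsing: [:match | match copyReplaceAll: 'bar' with: 'meh'].
result.   "'foomeh blah remeh'"

"Assumption: subexpression: is usable inside the block; index 2 is the first group."
re
    copy: 'foobar blah rebar'
    translatingMatchesUsing: [:match | (re subexpression: 2), 'meh'].
```

The block form trades the compact '{1}meh' template for ordinary Smalltalk code, which avoids changing the behavior of copy:replacingMatchesWith: for other callers.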