Golang: replace any and all newline characters

Bri*_*ica 2 regex string go

Normally, when I replace newline characters, I reach for a regexp, as in this PHP:

preg_replace('/\R/u', "\n", $String);

because I know it's a very robust way to replace any kind of Unicode line break (whether \n, \r, \r\n, etc.).

I'm trying to do the same thing in Go, but I get

error parsing regexp: invalid escape sequence: `\R`

on this line:

msg = regexp.MustCompilePOSIX("\\R").ReplaceAllString(html.EscapeString(msg), "<br>\n")

I tried using (?:(?>\r\n)|\v) from /sf/answers/307242001/, but it looks like Go's regexp implementation doesn't support that either; it panics with invalid or unsupported Perl syntax: '(?>'
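Go's regexp package accepts RE2 syntax, which has neither PCRE's \R escape nor atomic groups, so both patterns fail when they are compiled. A minimal sketch (the helper name compileErr is hypothetical) that uses regexp.Compile, which returns an error instead of panicking, to check each pattern, including a plain alternation that RE2 does accept:

```go
package main

import (
	"fmt"
	"regexp"
)

// compileErr returns the error (if any) from compiling pattern.
// regexp.Compile reports problems as an error; regexp.MustCompile would panic.
func compileErr(pattern string) error {
	_, err := regexp.Compile(pattern)
	return err
}

func main() {
	patterns := []string{
		`\R`,              // PCRE "any newline" escape: not RE2 syntax
		`(?:(?>\r\n)|\v)`, // atomic group (?>...): not supported by RE2
		`\r\n|[\r\n\v\f\x{0085}\x{2028}\x{2029}]`, // alternation + class: accepted
	}
	for _, p := range patterns {
		if err := compileErr(p); err != nil {
			fmt.Printf("%s\n  rejected: %v\n", p, err)
		} else {
			fmt.Printf("%s\n  compiles\n", p)
		}
	}
}
```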

What is a good, safe way to replace newlines in Go with a regex?


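I've seen this answer, Golang: Issues replace newlines in a string from a text file, which says to use \r?\n, but I doubt it catches all Unicode newlines, mainly because the answer to this question lists many more newline code points than the 3 that \r?\n covers.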
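To make the gap concrete, here is a small sketch (the names crlfOrLF and normalizeCRLF are hypothetical) that applies the \r?\n pattern to text mixing several kinds of line break. Only LF and CRLF are replaced; a lone CR, NEL (U+0085), LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029) pass through unchanged:

```go
package main

import (
	"fmt"
	"regexp"
)

// crlfOrLF is the narrower pattern suggested in the linked answer.
var crlfOrLF = regexp.MustCompile(`\r?\n`)

// normalizeCRLF replaces only "\r\n" and "\n" with "\n".
func normalizeCRLF(s string) string {
	return crlfOrLF.ReplaceAllString(s, "\n")
}

func main() {
	// LF, CRLF, a lone CR, NEL (U+0085), LS (U+2028) and PS (U+2029).
	in := "a\nb\r\nc\rd\u0085e\u2028f\u2029g"
	// %q makes the surviving \r, \u0085, \u2028 and \u2029 visible.
	fmt.Printf("%q\n", normalizeCRLF(in))
}
```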

icz*_*cza 5

While using regexp usually yields an elegant and compact solution, often it's not the fastest.

For tasks where you have to replace certain substrings with others, the standard library provides a really efficient solution in the form of strings.Replacer:

Replacer replaces a list of strings with replacements. It is safe for concurrent use by multiple goroutines.

You can create a reusable replacer with strings.NewReplacer(), passing it pairs of strings to replace and their replacements. To perform the replacement, you simply call Replacer.Replace().

Here's what it would look like:

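import "strings"

const replacement = "<br>\n"

// "\r\n" is listed before "\r" and "\n" so that a CRLF pair
// produces a single replacement rather than two.
var replacer = strings.NewReplacer(
    "\r\n", replacement,
    "\r", replacement,
    "\n", replacement,
    "\v", replacement,     // vertical tab
    "\f", replacement,     // form feed
    "\u0085", replacement, // NEL (next line)
    "\u2028", replacement, // LINE SEPARATOR
    "\u2029", replacement, // PARAGRAPH SEPARATOR
)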

func replaceReplacer(s string) string {
    return replacer.Replace(s)
}
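The order of the pairs passed to strings.NewReplacer matters: its documentation states that comparisons are done in argument order, so an earlier pair wins at a given position even if a later one is longer. This sketch (the variable names are hypothetical) contrasts the ordering above with one where "\r\n" comes last, which turns a single Windows line ending into two breaks:

```go
package main

import (
	"fmt"
	"strings"
)

// crlfFirst lists "\r\n" before "\r" and "\n", as in the answer.
var crlfFirst = strings.NewReplacer("\r\n", "<br>\n", "\r", "<br>\n", "\n", "<br>\n")

// crlfLast is deliberately mis-ordered: "\r" is compared before "\r\n",
// so the CR and the LF of a CRLF pair are replaced separately.
var crlfLast = strings.NewReplacer("\r", "<br>\n", "\n", "<br>\n", "\r\n", "<br>\n")

func main() {
	in := "one\r\ntwo"
	fmt.Printf("%q\n", crlfFirst.Replace(in)) // "one<br>\ntwo"
	fmt.Printf("%q\n", crlfLast.Replace(in))  // "one<br>\n<br>\ntwo"
}
```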

Here's what the regexp solution from Wiktor's answer looks like:

import "regexp"

// The \r\n alternative is tried first, so a CRLF pair is matched as one break.
var re = regexp.MustCompile(`\r\n|[\r\n\v\f\x{0085}\x{2028}\x{2029}]`)

func replaceRegexp(s string) string {
    return re.ReplaceAllString(s, "<br>\n")
}
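Before comparing speed, it is worth confirming that the two approaches produce the same output. A minimal sketch, assuming both are placed in a single package main (the helper sameResult and the sample inputs are illustrative, not from the answer):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const br = "<br>\n"

var replacer = strings.NewReplacer(
	"\r\n", br, "\r", br, "\n", br, "\v", br, "\f", br,
	"\u0085", br, "\u2028", br, "\u2029", br,
)

var re = regexp.MustCompile(`\r\n|[\r\n\v\f\x{0085}\x{2028}\x{2029}]`)

// sameResult reports whether both approaches produce identical output for s.
func sameResult(s string) bool {
	return replacer.Replace(s) == re.ReplaceAllString(s, br)
}

func main() {
	for _, s := range []string{
		"1st\nsecond\r\nthird\r4th\u0085fifth\u2028sixth",
		"\r\n\r\n", "\n\r", "a\v\fb\u2029", "no breaks",
	} {
		fmt.Printf("%q: %v\n", s, sameResult(s))
	}
}
```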

The strings.Replacer implementation is actually quite fast. Here's a simple benchmark comparing it to the pre-compiled regexp solution above:

import "testing"

// Mixes LF, CRLF, a lone CR, NEL (U+0085) and LINE SEPARATOR (U+2028).
const input = "1st\nsecond\r\nthird\r4th\u0085fifth\u2028sixth"

func BenchmarkReplacer(b *testing.B) {
    for i := 0; i < b.N; i++ {
        replaceReplacer(input)
    }
}

func BenchmarkRegexp(b *testing.B) {
    for i := 0; i < b.N; i++ {
        replaceRegexp(input)
    }
}

And the benchmark results:

BenchmarkReplacer-4      3000000               495 ns/op
BenchmarkRegexp-4         500000              2787 ns/op

For our test input, strings.Replacer was more than 5 times faster.

There's another advantage. In the examples above, both solutions return the result as a new string value, which requires allocating that string. If the result is going to an io.Writer anyway (e.g. we're creating an HTTP response or writing to a file), strings.Replacer lets us skip building the string: its Replacer.WriteString() method takes an io.Writer and writes the replaced output directly into it instead of returning it as a string. This widens the performance gap with the regexp solution even further.