Usually, when I need to replace newlines, I reach for a regexp, as in this PHP:
preg_replace('/\R/u', "\n", $String);
because I know this is a very robust way to replace any kind of Unicode newline (whether \n, \r, \r\n, etc.).
I tried to do something similar in Go, but I got
error parsing regexp: invalid escape sequence:
\R
on this line:
msg = regexp.MustCompilePOSIX("\\R").ReplaceAllString(html.EscapeString(msg), "<br>\n")
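I tried using (?:(?>\r\n)|\v) from /sf/answers/307242001/, but it looks like Go's regex implementation does not support that either; it panics with invalid or unsupported Perl syntax: '(?>'
What is a good, safe way to replace newlines in Go, with a regex or otherwise?
I saw the answer at Golang: Issues replace newlines in a string from a text file, which says to use \r?\n, but I can't believe it would catch all Unicode newlines, mainly because the answers to this question list many more newline code points than the 3 that \r?\n covers.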
While using regexp usually yields an elegant and compact solution, often it's not the fastest.
For tasks where you have to replace certain substrings with others, the standard library provides a really efficient solution in the form of strings.Replacer:
Replacer replaces a list of strings with replacements. It is safe for concurrent use by multiple goroutines.
You can create a reusable replacer with strings.NewReplacer(), passing pairs of strings: each substring to be replaced, followed by its replacement. To perform a replacement, simply call Replacer.Replace().
Here's what it would look like:
const replacement = "<br>\n"

var replacer = strings.NewReplacer(
	"\r\n", replacement,
	"\r", replacement,
	"\n", replacement,
	"\v", replacement,
	"\f", replacement,
	"\u0085", replacement,
	"\u2028", replacement,
	"\u2029", replacement,
)

func replaceReplacer(s string) string {
	return replacer.Replace(s)
}
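One detail worth noting about the pair list: the strings.NewReplacer documentation says old-string comparisons are done in argument order, so "\r\n" should be listed before "\r" if a CRLF pair is meant to count as a single line break. The sketch below illustrates this with two shortened replacers; the names br, crlfFirst and crFirst are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// br is the replacement text (hypothetical name, used only in this sketch).
const br = "<br>\n"

// crlfFirst lists "\r\n" before "\r", so a CRLF pair is replaced as one unit.
var crlfFirst = strings.NewReplacer("\r\n", br, "\r", br, "\n", br)

// crFirst lists "\r" before "\r\n"; "\r" is compared first and wins,
// and the following "\n" is then replaced separately.
var crFirst = strings.NewReplacer("\r", br, "\r\n", br, "\n", br)

func main() {
	in := "a\r\nb"
	fmt.Printf("%q\n", crlfFirst.Replace(in)) // prints "a<br>\nb"
	fmt.Printf("%q\n", crFirst.Replace(in))   // prints "a<br>\n<br>\nb"
}
```

With the second ordering, every Windows-style line ending would produce two <br> tags instead of one.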
Here's what the regexp solution from Wiktor's answer looks like:
var re = regexp.MustCompile(`\r\n|[\r\n\v\f\x{0085}\x{2028}\x{2029}]`)

func replaceRegexp(s string) string {
	return re.ReplaceAllString(s, "<br>\n")
}
The strings.Replacer implementation is actually quite fast. Here's a simple benchmark comparing it to the pre-compiled regexp solution above:
const input = "1st\nsecond\r\nthird\r4th\u0085fifth\u2028sixth"

func BenchmarkReplacer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		replaceReplacer(input)
	}
}

func BenchmarkRegexp(b *testing.B) {
	for i := 0; i < b.N; i++ {
		replaceRegexp(input)
	}
}
And the benchmark results:
BenchmarkReplacer-4    3000000     495 ns/op
BenchmarkRegexp-4       500000    2787 ns/op
For our test input, strings.Replacer was more than 5 times faster.
There's another advantage, too. In the example above, both solutions return the result as a new string value, which requires a new string allocation. If we need to write the result to an io.Writer (e.g. we're creating an HTTP response or writing the result to a file), strings.Replacer lets us avoid creating that string: its handy Replacer.WriteString() method takes an io.Writer and writes the result into it directly, without allocating and returning it as a string. This further increases the performance gain over the regexp solution.