Zna*_*kus 3 html regex anonymize
I'm trying to anonymize an HTML string with a regular expression, for use in an SQL query.
https://regex101.com/r/QWt1E1/1
(?<!\<)[^<>\s](?!\>)
Sample input:
<p><em>Hi [User</em></p>
<p><em>Tack för visat intresse.</em></p>
<p><em>Good luck!</em><em> </em></p>
<p><em>Sincerely</em></p>
Current output:
<p><em>nn nnnnn</nm></p>
<p><em>nnnn nnnnnnnn nnnnn nnnnnnnnn</nm></p>
<p><em>nnnn nnnnn</nm><em>nnnnnn</nm></p>
<p><em>nnnnnnnnn</nm></p>
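The plan is to replace every character that is not inside < and > with n. It almost works, but in my example it also replaces the e in </em>. I'm not sure why, or how to fix it.
How can I adjust the regex so that it does not replace the e in the example?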
Use a negative lookahead for [^<>]*> instead of just >, to ensure that the current position is not followed by a > before any other angle bracket (which would indicate that you are currently inside a tag).
This also means that you can drop the lookbehind:
[^<>\s](?![^<>]*>)
          ^^^^^^
https://regex101.com/r/QWt1E1/3
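As for why the original pattern touches the e in </em>: that e is neither directly preceded by < (the / sits in between) nor directly followed by > (the m sits in between), so both single-character lookarounds pass. Here is a minimal Python sketch comparing the two patterns on one of the sample lines. The function names are made up for illustration, and Python's re module is only an assumed environment; the question does not say which regex engine is in use.

```python
import re

# Pattern from the question: only checks the characters immediately
# before and after the current one.
ORIGINAL = re.compile(r"(?<!\<)[^<>\s](?!\>)")

# Pattern from the answer: rejects any character that is followed by a ">"
# before another angle bracket, i.e. any character inside a tag.
FIXED = re.compile(r"[^<>\s](?![^<>]*>)")


def anonymize_original(html: str) -> str:
    """Apply the question's pattern (hypothetical helper name)."""
    return ORIGINAL.sub("n", html)


def anonymize_fixed(html: str) -> str:
    """Apply the answer's pattern (hypothetical helper name)."""
    return FIXED.sub("n", html)


sample = "<p><em>Hi [User</em></p>"
print(anonymize_original(sample))  # <p><em>nn nnnnn</nm></p>
print(anonymize_fixed(sample))     # <p><em>nn nnnnn</em></p>
```

The fixed pattern still assumes that > never appears inside an attribute value, and that every < in the input opens a tag.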
Still, it would be better to parse the HTML with an HTML parser, if at all possible.
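A rough sketch of the parser-based route, assuming Python and its standard-library html.parser module (the question does not name a language, so this is only one possible setup). The class and function names are hypothetical. Unlike the regex, this sketch replaces each character reference such as &nbsp; with a single n, and it drops comments and doctype declarations, since it does not handle them.

```python
import re
from html.parser import HTMLParser


class TextAnonymizer(HTMLParser):
    """Rebuilds the markup while replacing text content with 'n' (illustrative)."""

    def __init__(self):
        # convert_charrefs=False so entities arrive via handle_entityref /
        # handle_charref instead of being decoded into the text data.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Keep the start tag exactly as written, attributes included.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are also copied verbatim.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Replace every non-whitespace character of the text with "n".
        self.parts.append(re.sub(r"\S", "n", data))

    def handle_entityref(self, name):
        # Named reference such as &nbsp; becomes one "n".
        self.parts.append("n")

    def handle_charref(self, name):
        # Numeric reference such as &#246; becomes one "n".
        self.parts.append("n")


def anonymize_html(html: str) -> str:
    parser = TextAnonymizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(anonymize_html("<p><em>Hi [User</em></p>"))  # <p><em>nn nnnnn</em></p>
```

Because the tags are copied back from the parser rather than matched by a pattern, attribute values are left untouched even when they contain characters that would confuse the regex.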