Windows newlines and the $ anchor in Python bytes regular expressions

Question

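$ matches at the end of a line, where the end of a line is defined as either the end of the string or any position immediately followed by a newline character.

However, the Windows line ending consists of two characters, '\r\n'. How can I make '$' treat '\r\n' as the line ending in a bytes pattern?

Here is what I have: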

# Python 3.4.2
import re

input = b'''
//today is a good day \r\n
//this is Windows newline style \r\n
//unix line style \n
...other binary data... 
'''

L = re.findall(rb'//.*?$', input, flags = re.DOTALL | re.MULTILINE)
for item in L : print(item)

The current output is:

b'//today is a good day \r'
b'//this is Windows newline style \r'
b'//unix line style '

But the expected output is:

b'//today is a good day '
b'//this is Windows newline style '
b'//unix line style '

Answer 1

Wik*_*żew 3

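You cannot redefine the behavior of an anchor.

To match // followed by any number of characters other than CR and LF, use the negated character class [^\r\n] with the * quantifier: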

L = re.findall(rb'//[^\r\n]*', input)

Note that this approach does not need the re.M and re.S flags.
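As a quick check of the negated character class, here is a minimal sketch. The sample bytes are an assumption modeled on the question's input (two CRLF-terminated lines and one LF-terminated line), and the names `data` and `comments` are only illustrative.

```python
import re

# Assumed sample data: two Windows-style lines followed by one Unix-style line.
data = (
    b'//today is a good day \r\n'
    b'//this is Windows newline style \r\n'
    b'//unix line style \n'
)

# [^\r\n]* stops at the first CR or LF, so no flags are required.
comments = re.findall(rb'//[^\r\n]*', data)

for item in comments:
    print(item)
```

Because the class excludes both CR and LF, the trailing '\r' never ends up in a match, regardless of which line-ending style a given line uses.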

Alternatively, you can add \r? before the $ and wrap that part in a positive lookahead (you will also need the lazy quantifier *? on the .):

rb'//.*?(?=\r?$)'

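The point of using a lookahead is that $ is itself a kind of lookahead: it asserts a position and does not actually consume the \n character. So we can safely place it inside a lookahead together with the optional \r.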
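The lookahead variant can be tried out with a short sketch like the one below. The sample bytes and the names `data`, `pattern`, and `matches` are assumptions made for illustration; only re.MULTILINE is passed here, although the question's combination of re.DOTALL and re.MULTILINE should behave the same way for this pattern because the lazy quantifier stops at the first position where the lookahead succeeds.

```python
import re

# Assumed sample data mixing CRLF, LF, and a final line with no terminator.
data = b'//windows line \r\n//unix line \n//last line'

# re.MULTILINE lets $ match before each \n as well as at the very end.
# The lookahead checks for an optional CR followed by end-of-line
# without making either part of the match.
pattern = re.compile(rb'//.*?(?=\r?$)', re.MULTILINE)

matches = pattern.findall(data)
for item in matches:
    print(item)
```

Since the \r is only inspected inside the lookahead, it stays out of the matched bytes.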

This may not be entirely relevant since it comes from MSDN, but I believe the same holds for Python:

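"Note that $ matches \n but does not match \r\n (the combination of carriage return and newline characters, or CR/LF). To match the CR/LF character combination, include \r?$ in the regular expression pattern."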
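The quoted behavior can be checked against Python's re module with a small experiment. This is only a sketch: the sample bytes and the names `plain` and `with_cr` are illustrative assumptions.

```python
import re

# Assumed sample data: one CRLF-terminated word and one LF-terminated word.
data = b'abc\r\ndef\n'

# Plain $ only matches directly before \n, so 'abc' (followed by \r) is skipped.
plain = re.findall(rb'[a-z]+$', data, re.MULTILINE)

# \r?$ also accepts the CRLF line, but the CR becomes part of the match.
with_cr = re.findall(rb'[a-z]+\r?$', data, re.MULTILINE)

print(plain)
print(with_cr)
```

Note that \r?$ placed directly in the pattern consumes the carriage return, which is why wrapping it in a lookahead is useful when the CR should be excluded from the result.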

In PCRE, you can use (*ANYCRLF), (*CR) and (*ANY) to override the default behavior of the $ anchor, but this is not possible in Python.
