The source string is:
# Python 3.4.3
import re
s = r'abc123d, hello 3.1415926, this is my book'
Here is my pattern:
pattern = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'
However, re.search gives me the correct result:
m = re.search(pattern, s)
print(m) # output: <_sre.SRE_Match object; span=(3, 6), match='123'>
But re.findall only returns a list of empty strings:
L = re.findall(pattern, s)
print(L) # output: ['', '', '']
Why can't re.findall give me the expected list:
['123', '3.1415926']
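Two things in the pattern seem to explain the difference. First, in a raw string `\\.` is a regex for a literal backslash followed by any character, so the decimal part never matches this input. Second, when a pattern contains a capturing group, re.findall returns the group's text rather than the whole match, and an unmatched optional group comes back as ''. The following is a minimal sketch of one possible fix, assuming the intent was to match a literal dot; the names `original`, `fixed` and `numbers` are only illustrative.

```python
import re

s = r'abc123d, hello 3.1415926, this is my book'

# Pattern as written in the question: in a raw string, \\. means
# "a literal backslash, then any character", and the parentheses
# form a capturing group.
original = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'
print(re.findall(original, s))  # ['', '', '']

# Assumed fix: a single backslash so \. matches a literal dot, and a
# non-capturing group (?:...) so findall returns the whole match.
fixed = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'
numbers = re.findall(fixed, s)
print(numbers)  # ['123', '3.1415926']
```

If the capturing group has to stay, iterating with re.finditer and reading `m.group(0)` from each match is another way to get the full matched text.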
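
$ matches at the end of a line, which is defined as either the end of the string or any position followed by a newline character.
However, the Windows newline marker consists of two characters, '\r\n'. How can I make '$' recognize '\r\n' as a newline in bytes?
Here is what I have: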
# Python 3.4.2
import re
input = b'''
//today is a good day \r\n
//this is Windows newline style \r\n
//unix line style \n
...other binary data...
'''
L = re.findall(rb'//.*?$', input, flags=re.DOTALL | re.MULTILINE)
for item in L:
    print(item)
The current output is:
b'//today is a good day \r'
b'//this is Windows newline style \r'
b'//unix line style '
But the expected output is as follows:
b'//today is a good …
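As far as I know, Python's re module has no flag that changes what '$' treats as a line ending: with re.MULTILINE it matches just before '\n' (and at the end of the input), so a preceding '\r' is consumed by '.'. One workaround is to stop the match with a lookahead for an optional '\r' before the end of the line. This is only a sketch, assuming the goal is to drop the trailing '\r' from each match; `data`, `comment_re` and `matches` are illustrative names, and the sample bytes are a shortened stand-in for the input in the question.

```python
import re

# Shortened stand-in for the question's input: two CRLF lines, one LF line.
data = (
    b'//today is a good day \r\n'
    b'//this is Windows newline style \r\n'
    b'//unix line style \n'
)

# (?=\r?$) is a lookahead: the lazy .*? stops right before an optional
# \r that is immediately followed by the end of a line. re.DOTALL is
# left out so that '.' cannot run across '\n' into the next line.
comment_re = re.compile(rb'//.*?(?=\r?$)', flags=re.MULTILINE)

matches = comment_re.findall(data)
for item in matches:
    print(item)
# b'//today is a good day '
# b'//this is Windows newline style '
# b'//unix line style '
```

The lookahead keeps '$' in the pattern while leaving the '\r' out of the match; a character class such as `[^\r\n]*` would be a simpler alternative if a lone '\r' in the middle of a line never needs to be kept.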