Python Regex sub() with multiple patterns

Dav*_*lfe 4 python regex

I'm wondering whether there's any way to combine patterns in re.sub() instead of calling it multiple times, as below:

import re
s1 = "Please check with the store to confirm holiday hours."
s2 = ''' Hours:
            Monday: 9:30am - 6:00pm
Tuesday: 9:30am - 6:00pm
Wednesday: 9:30am - 6:00pm
Thursday: 9:30am - 6:00pm
Friday: 9:30am - 9:00pm
Saturday: 9:30am - 6:00pm
Sunday: 11:00am - 6:00pm

Please check with the store to confirm holiday hours.'''

strip1 = re.sub(s1, '', s2)
strip2 = re.sub('\t', '', strip1)
print(strip2)

Desired output:

Hours:
Monday: 9:30am - 6:00pm
Tuesday: 9:30am - 6:00pm
Wednesday: 9:30am - 6:00pm
Thursday: 9:30am - 6:00pm
Friday: 9:30am - 9:00pm
Saturday: 9:30am - 6:00pm
Sunday: 11:00am - 6:00pm

Sha*_*ger 7

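If you're just trying to remove specific substrings, you can combine the patterns with alternation and do the removal in a single pass: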

pat1 = r"Please check with the store to confirm holiday hours."
pat2 = r'\t'
combined_pat = r'|'.join((pat1, pat2))
stripped = re.sub(combined_pat, '', s2)

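It gets more complicated if the "patterns" use actual regex special characters (because then you have to worry about wrapping them so the alternation breaks at the right places), but for simple fixed patterns it's straightforward.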
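
Here is a self-contained sketch of the single-pass version. It assumes every item in the illustrative `to_remove` list is literal text, and uses `re.escape` to make that safe: the `.` at the end of the sentence is otherwise a metacharacter that matches any character. The `sample` string is a shortened stand-in for `s2`.

```python
import re

# Literal strings to delete (assumption: plain text, not regexes).
to_remove = ["Please check with the store to confirm holiday hours.", "\t"]

# re.escape neutralizes metacharacters such as '.', so each item matches only itself.
combined_pat = "|".join(map(re.escape, to_remove))

sample = " Hours:\n\tMonday: 9:30am - 6:00pm\n\nPlease check with the store to confirm holiday hours."

# One re.sub call removes both the sentence and the tab; strip() trims the leftover edges.
stripped = re.sub(combined_pat, "", sample).strip()
print(stripped)

# Why escaping matters: unescaped '.' matches any character.
print(repr(re.sub("a.c", "", "abc a.c")))             # ' '
print(repr(re.sub(re.escape("a.c"), "", "abc a.c")))  # 'abc '
```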

If you have real regular expressions rather than fixed patterns, you might do something like:

all_pats = [...]
combined_pat = r'|'.join(map(r'(?:{})'.format, all_pats))

so that any regex special characters stay grouped and don't "bleed" across the alternation.
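
A hedged example of the `all_pats` idea with genuine regexes: the sketch below pairs the escaped footer sentence with an illustrative `^[ \t]+` pattern that strips leading whitespace from each line. It assumes the indentation in `s2` is tabs and/or spaces, and that using `re.MULTILINE` (so `^` matches at every line start) is acceptable.

```python
import re

s1 = "Please check with the store to confirm holiday hours."
s2 = (
    " Hours:\n"
    "\t\t\tMonday: 9:30am - 6:00pm\n"
    "Tuesday: 9:30am - 6:00pm\n"
    "Wednesday: 9:30am - 6:00pm\n"
    "Thursday: 9:30am - 6:00pm\n"
    "Friday: 9:30am - 9:00pm\n"
    "Saturday: 9:30am - 6:00pm\n"
    "Sunday: 11:00am - 6:00pm\n"
    "\n"
    "Please check with the store to confirm holiday hours."
)

all_pats = [
    re.escape(s1),  # the footer sentence, matched literally
    r"^[ \t]+",     # leading spaces/tabs at the start of any line (needs re.MULTILINE)
]

# Wrap each pattern in a non-capturing group before joining, as in the answer.
combined_pat = r"|".join(map(r"(?:{})".format, all_pats))

# Single substitution pass; strip() removes the blank lines left behind by the footer.
result = re.sub(combined_pat, "", s2, flags=re.MULTILINE).strip()
print(result)
```

The groups also make it safe to add context to each alternative later, for example by swapping the template for something like `r"\b(?:{})\b"`.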

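  • `r'\t'` and `'\t'` happen to work the same. The latter is a string containing a literal tab character; the former is the regex escape `\t`, which the regex engine happens to interpret as a tab. Same end result. I'm just obsessive about using raw strings: `r'\n'` and `r'\t'` work fine in raw or non-raw form, but if you searched for `'\b'` instead of `r'\b'` (for example), you'd be looking for the ASCII backspace character rather than a word boundary, and you almost never want the former. (2 upvotes)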
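
To make the raw-string point concrete, here is a small check; the sample text `"the cat scattered"` is made up for illustration.

```python
import re

text = "the cat scattered"

# Raw string: the regex engine sees \b and treats it as a word boundary.
with_raw = re.findall(r"\bcat\b", text)

# Plain string: Python turns '\b' into the backspace character (\x08) before re sees it.
without_raw = re.findall("\bcat\b", text)

# For \t both spellings end up matching a tab character.
tab_plain = re.findall("\t", "a\tb")
tab_raw = re.findall(r"\t", "a\tb")

print(with_raw)              # ['cat']
print(without_raw)           # []
print(tab_plain == tab_raw)  # True
```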

Jac*_*ack 5

You aren't even using regular expressions, so you may as well just chain replace calls directly:

s1 = "Please check with the store to confirm holiday hours."
s2 = ''' Hours:
            Monday: 9:30am - 6:00pm
Tuesday: 9:30am - 6:00pm
Wednesday: 9:30am - 6:00pm
Thursday: 9:30am - 6:00pm
Friday: 9:30am - 9:00pm
Saturday: 9:30am - 6:00pm
Sunday: 11:00am - 6:00pm

Please check with the store to confirm holiday hours.'''

# Remove the footer sentence and any tab characters, then trim surrounding whitespace
strip2 = s2.replace(s1, "").replace("\t", "").strip()

print(strip2)
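
If the list of substrings grows, the chain can be folded over a list. This is a sketch under the assumption that every item is literal text; `remove_all` is a hypothetical helper name, not part of any library.

```python
from functools import reduce


def remove_all(text, substrings):
    """Hypothetical helper: delete every literal occurrence of each substring, left to right."""
    return reduce(lambda acc, sub: acc.replace(sub, ""), substrings, text)


s1 = "Please check with the store to confirm holiday hours."
sample = " Hours:\n\tMonday: 9:30am - 6:00pm\n\n" + s1

cleaned = remove_all(sample, [s1, "\t"]).strip()
print(cleaned)
```

Plain str.replace needs no escaping at all, which is the main appeal here; the trade-off is one pass over the string per substring, and the order matters if one substring can contain another.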