I need to extract the text between <b> and <strong> tags in Python using regular expressions.
Example: Customizable:<strong>Features Windows 10 Pro</strong> and legacy ports <b>including VGA,</b> HDMI, RJ-45, USB Type A connections.
To do this, I am doing:
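import re

# labels is a list of HTML strings, defined elsewhere
pattern = re.compile("(<b>(.*?)</b>)|(<strong>(.*?)</strong>)")
for label in labels:
    print(label)
    flag = 0
    # only look at labels that mention Windows and contain a <b> or <strong> tag
    if ('Window' in label or 'Windows' in label) and ('<b>' in label or '<strong>' in label):
        text = re.findall(pattern, label)
        print(text)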
Here labels is a list of HTML elements that contain tags. The expected output is ['Features Windows 10','including VGA,']
but instead the output I get is: [('', 'Features Windows 10 Pro'), ('including VGA,', '')]
Please help. Thanks in advance.
Would you consider BeautifulSoup?
from bs4 import BeautifulSoup

# Name the parser explicitly to avoid bs4's "no parser specified" warning.
data = BeautifulSoup("""Customizable:<strong>Features Windows 10 Pro</strong> and legacy ports <b>including VGA,</b> HDMI, RJ-45, USB Type A connections""", "html.parser")
print(data.find_all('strong')[0].text)
print(data.find_all('b')[0].text)
Output:
Features Windows 10 Pro
including VGA,
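If a regex is still required, as the question asks, note that re.findall returns one tuple per match whenever the pattern has more than one capturing group, which is where the empty strings come from. A minimal sketch of one way to get a flat list, assuming the tags carry no attributes and are not nested (the names TAG_TEXT and extract_bold_text are made up for illustration):

```python
import re

# Assumption: tags have no attributes and are not nested.
# Group 1 captures the tag name, \1 requires the same name in the closing tag,
# and group 2 captures the text in between.
TAG_TEXT = re.compile(r"<(b|strong)>(.*?)</\1>", re.DOTALL)

def extract_bold_text(html):
    """Return the text inside each <b> or <strong> element, in document order."""
    return [m.group(2) for m in TAG_TEXT.finditer(html)]

label = ("Customizable:<strong>Features Windows 10 Pro</strong> and legacy ports "
         "<b>including VGA,</b> HDMI, RJ-45, USB Type A connections.")
print(extract_bold_text(label))
# ['Features Windows 10 Pro', 'including VGA,']
```

Using finditer and picking group 2 keeps only the inner text, so no post-filtering of empty strings is needed.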
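When adding a third-party dependency is not an option, the standard library's html.parser can do a similar job and also copes with tag attributes. This is only a sketch of that alternative, not part of the approach above; BoldTextCollector and bold_texts are hypothetical names:

```python
from html.parser import HTMLParser

class BoldTextCollector(HTMLParser):
    """Collect the text found inside <b> or <strong> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of <b>/<strong> elements currently open
        self.texts = []   # one entry per outermost <b>/<strong> element

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            if self.depth == 0:
                self.texts.append("")  # start a new entry
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("b", "strong") and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current entry.
        if self.depth > 0:
            self.texts[-1] += data

def bold_texts(html):
    parser = BoldTextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts

html = ("Customizable:<strong>Features Windows 10 Pro</strong> and legacy ports "
        "<b>including VGA,</b> HDMI, RJ-45, USB Type A connections")
print(bold_texts(html))
```

The result is a flat list in document order, so both tag types are handled in a single pass.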