Regular expression to extract text between <b> and <strong> tags

Kir*_*wal 0 python regex

I need to use a regular expression in Python to extract the text between <b> and <strong> tags.

Example: Customizable:<strong>Features Windows 10 Pro</strong> and legacy ports <b>including VGA,</b> HDMI, RJ-45, USB Type A connections.

This is what I am doing:

import re

pattern = re.compile("(<b>(.*?)</b>)|(<strong>(.*?)</strong>)")
for label in labels:
    print(label)
    # Only search labels that mention Windows and contain one of the two tags
    if ('Window' in label or 'Windows' in label) and ('<b>' in label or '<strong>' in label):
        text = re.findall(pattern, label)
        print(text)

Here labels is a list of HTML elements that contain tags. The expected output is ['Features Windows 10','including VGA,']

Instead, I get this output: [('', 'Features Windows 10 Pro'), ('including VGA,', '')]

Please help. Thanks in advance.

iam*_*aus 6

Would you consider using BeautifulSoup?

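from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed
data = BeautifulSoup("""Customizable:<strong>Features Windows 10 Pro</strong> and legacy ports <b>including VGA,</b> HDMI, RJ-45, USB Type A connections""", "html.parser")

# .text returns the string content of the first matching tag
print(data.find_all('strong')[0].text)
print(data.find_all('b')[0].text)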

Output:

Features Windows 10 Pro
including VGA,
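If a regex-only solution is still wanted, the tuples in the question come from how re.findall works: when a pattern has more than one capturing group, it returns one tuple per match, with an empty string for each group that did not take part. A minimal sketch of one way around this, assuming the tags carry no attributes and are not nested: capture the tag name once, use a backreference for the closing tag, and keep only the text group. The names label and texts are only illustrative.

```python
import re

label = ("Customizable:<strong>Features Windows 10 Pro</strong> and legacy ports "
         "<b>including VGA,</b> HDMI, RJ-45, USB Type A connections.")

# Group 1 captures the tag name; \1 requires the closing tag to use the same name.
# Group 2 captures the text between the tags (non-greedy, so it stops at the first close).
pattern = re.compile(r"<(b|strong)>(.*?)</\1>")

# findall still returns (tag, text) tuples here, so keep only the text part.
texts = [text for _tag, text in pattern.findall(label)]
print(texts)  # ['Features Windows 10 Pro', 'including VGA,']
```

For anything beyond simple, attribute-free tags like these, an HTML parser such as BeautifulSoup is likely the more robust choice.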