Regex to extract text between <b> and <strong> tags

Question


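I need to extract the text between <b> and <strong> tags in Python using a regular expression.

Example: Customizable:<strong>Features Windows 10 Pro</strong> and legacy ports <b>including VGA,</b> HDMI, RJ-45, USB Type A connections.

This is what I am doing: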

import re

# labels: a list of HTML strings that may contain <b> or <strong> tags
pattern = re.compile("(<b>(.*?)</b>)|(<strong>(.*?)</strong>)")
for label in labels:
    print(label)
    flag = 0
    if ('Window' in label or 'Windows' in label) and ('<b>' in label or '<strong>' in label):
        text = re.findall(pattern, label)
        print(text)


Here labels is a list of HTML elements that contain the tags. The expected output is ['Features Windows 10','including VGA,']

Instead, the output I get is: [('', 'Features Windows 10 Pro'), ('including VGA,', '')]

Please help. Thanks in advance.

Answer 1

iam*_*aus 6

Would you consider BeautifulSoup?

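from bs4 import BeautifulSoup

# Name the parser explicitly so bs4 does not warn about guessing one
data = BeautifulSoup("""Customizable:<strong>Features Windows 10 Pro</strong> and legacy ports <b>including VGA,</b> HDMI, RJ-45, USB Type A connections""", "html.parser")

# Text of the first <strong> element, then of the first <b> element
data.find_all('strong')[0].text
data.find_all('b')[0].text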


Yields

'Features Windows 10 Pro'
'including VGA,'
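If a single flat list in document order is wanted, as the question asks, one possible extension of this answer is to pass both tag names to find_all at once. This is only a sketch: the helper name bold_texts is made up for illustration, and the sample string is the one used in the answer.

```python
from bs4 import BeautifulSoup

html = ("Customizable:<strong>Features Windows 10 Pro</strong> and legacy ports "
        "<b>including VGA,</b> HDMI, RJ-45, USB Type A connections")

def bold_texts(fragment):
    """Hypothetical helper: inner text of every <b>/<strong>, in document order."""
    soup = BeautifulSoup(fragment, "html.parser")
    # find_all accepts a list of tag names and matches any of them
    return [tag.get_text() for tag in soup.find_all(["b", "strong"])]

print(bold_texts(html))  # ['Features Windows 10 Pro', 'including VGA,']
```

The same helper could be called inside the loop over labels from the question, after the 'Windows' check.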

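For completeness, a regex-only sketch is also possible. re.findall returns one tuple per match whenever the pattern contains more than one capturing group, which is why the question's output is a list of tuples with empty strings. One way around that, assuming the tags carry no attributes and are not nested, is to capture the tag name once, reuse it with a backreference, and keep only the text group. The names TAG_TEXT and extract_bold_text are illustrative, not from the original post.

```python
import re

html = ("Customizable:<strong>Features Windows 10 Pro</strong> and legacy ports "
        "<b>including VGA,</b> HDMI, RJ-45, USB Type A connections")

# Group 1 captures the tag name; \1 requires the same name in the closing tag.
# Group 2 captures the text in between (non-greedy, may span newlines).
TAG_TEXT = re.compile(r"<(b|strong)>(.*?)</\1>", re.DOTALL)

def extract_bold_text(fragment):
    """Hypothetical helper: inner text of every <b>/<strong>, in document order."""
    return [m.group(2) for m in TAG_TEXT.finditer(fragment)]

print(extract_bold_text(html))  # ['Features Windows 10 Pro', 'including VGA,']
```

This pattern will miss tags with attributes such as <b class="x">, so for real-world HTML the parser-based approach above is usually the safer choice.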
