How do I get the HTML tags?

Bra*_*ble 0 python

Say I have a text file like this:

<html><head>Headline<html><head>more words
</script>even more words</script>
<html><head>Headline<html><head>more words
</script>even more words</script>

How can I put the tags into a list like the following?

<html>
<head>
<html>
<head>
</script>
</script>
<html>
<head>
<html>
<head>
</script>
</script>

ins*_*get 6

I think this is what you want:

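import re

with open('input.txt') as input_file:  # placeholder file name
    html_string = input_file.read()
# Non-greedy: match from each '<' to the nearest '>'
matches = re.findall(r'<.*?>', html_string)
for m in matches:
    print(m)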
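To check the pattern without a file on disk, here is a minimal sketch that runs the same non-greedy regex on the first two lines of the sample from the question. The names `sample` and `find_tags` are made up for illustration and are not part of the original answer.

```python
import re

# First two lines of the sample text from the question.
sample = (
    "<html><head>Headline<html><head>more words\n"
    "</script>even more words</script>\n"
)


def find_tags(text):
    """Return every '<...>' span in text, in order of appearance."""
    # '.*?' is non-greedy, so each match stops at the nearest '>'.
    return re.findall(r"<.*?>", text)


tags = find_tags(sample)
print(tags)
# ['<html>', '<head>', '<html>', '<head>', '</script>', '</script>']
```

Note that `.` does not match a newline by default, so a tag split across two lines would be skipped unless `re.DOTALL` is passed as a flag. For real-world HTML, a parser such as the standard library's `html.parser` is usually more robust than a regex.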

Hope this helps.