Say I have a text file like this:
<html><head>Headline<html><head>more words
</script>even more words</script>
<html><head>Headline<html><head>more words
</script>even more words</script>
How can I extract the tags into a list like the following:
<html>
<head>
<html>
<head>
</script>
</script>
<html>
<head>
<html>
<head>
</script>
</script>
I think this is what you want:
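import re
# Read the whole file into one string ('input.txt' is a placeholder path)
with open('input.txt') as input_file:
    html_string = input_file.read()
# Non-greedy match: each '<' up to the nearest '>'
matches = re.findall(r'<.*?>', html_string)
for m in matches:
    print(m)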
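If you want the tags collected in an actual list rather than printed one by one, here is a minimal self-contained sketch of the same idea. The function name `extract_tags` and the inline `sample` string are made up for illustration, and the pattern `<[^>]*>` is a small variation on the one above that also matches a tag split across lines. Keep in mind that a regex is not an HTML parser, so input such as an attribute value containing `>` would not be handled correctly.

```python
import re

def extract_tags(text):
    """Return every '<...>' span in text, in order of appearance."""
    # [^>]* stops at the first '>' and, unlike '.', also matches newlines
    return re.findall(r'<[^>]*>', text)

# Hypothetical sample mirroring the first two lines of the question's file
sample = (
    "<html><head>Headline<html><head>more words\n"
    "</script>even more words</script>\n"
)

tags = extract_tags(sample)   # a plain Python list of strings
print("\n".join(tags))        # one tag per line, as in the question
```

Because the result is an ordinary list, you can count, filter, or deduplicate the tags before printing them.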
Hope this helps.