Python HTMLParser

Question

I am using HTMLParser to parse an HTML document, and I want to print the contents between the start and end of a p tag.

Here is my code snippet:

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            print("TODO: print the contents")

Answer 1

Dar*_*mas 6

Based on what @tauran posted, you might want to do something like this:

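from html.parser import HTMLParser  # Python 3; in Python 2: from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def print_p_contents(self, html):
        self.tag_stack = []  # open tags, innermost last
        self.feed(html)

    def handle_starttag(self, tag, attrs):
        self.tag_stack.append(tag.lower())

    def handle_endtag(self, tag):
        if self.tag_stack:  # avoid IndexError on a stray end tag
            self.tag_stack.pop()

    def handle_data(self, data):
        # print only text whose innermost enclosing tag is <p>;
        # the emptiness check skips text that appears before any tag
        if self.tag_stack and self.tag_stack[-1] == 'p':
            print(data)

p = MyHTMLParser()
p.print_p_contents('<p>test</p>')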

From there, you might want to push all the <p> contents into a list and return that, or something along those lines.
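A minimal sketch of that list-based variant, assuming Python 3's `html.parser` module (in Python 2 the module was named `HTMLParser`). The class name `PContentsParser`, the method `get_p_contents`, and the `paragraphs` attribute are hypothetical names chosen for illustration, not part of the answer above. It keeps the same rule as the answer (collect text whose innermost open tag is `<p>`) but returns the text instead of printing it.

```python
from html.parser import HTMLParser  # Python 3 (Python 2: from HTMLParser import HTMLParser)


class PContentsParser(HTMLParser):
    """Hypothetical helper: collect text found directly inside <p> tags."""

    def __init__(self):
        super().__init__()
        self.tag_stack = []   # currently open tags, innermost last
        self.paragraphs = []  # text collected from <p> elements

    def get_p_contents(self, html):
        self.reset()          # clear parser state so the instance can be reused
        self.tag_stack = []
        self.paragraphs = []
        self.feed(html)
        self.close()          # process any text still buffered at end of input
        return self.paragraphs

    def handle_starttag(self, tag, attrs):
        self.tag_stack.append(tag)

    def handle_endtag(self, tag):
        if self.tag_stack:    # ignore stray end tags instead of raising IndexError
            self.tag_stack.pop()

    def handle_data(self, data):
        # Same rule as the answer: keep text whose innermost open tag is <p>.
        if self.tag_stack and self.tag_stack[-1] == 'p':
            self.paragraphs.append(data)


parser = PContentsParser()
print(parser.get_p_contents('<div><p>first</p><p>second</p></div>'))
# ['first', 'second']
```

Returning a fresh list per call means earlier results are not overwritten when the same parser instance is reused.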

TIL: with a library like this, you need to think in terms of a stack!
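The stack idea has two practical gaps when only `tag_stack[-1] == 'p'` is checked: text inside nested inline tags (for example the `world` in `<p>Hello <b>world</b></p>`) is skipped, and a void element written as `<br>` (no closing slash) is pushed but never popped, so the stack stays out of sync. The following is a hedged alternative sketch, not the answerer's method: it tracks only whether a `<p>` is open and joins all text until the matching `</p>`. `PTextParser`, `_finish_paragraph`, and `paragraphs` are hypothetical names, and the only implicit close modeled is a new `<p>` ending an unclosed one (HTML also closes `<p>` on tags such as `<div>`, which this sketch ignores).

```python
from html.parser import HTMLParser


class PTextParser(HTMLParser):
    """Hypothetical helper: full text of each <p>, including nested tags."""

    def __init__(self):
        super().__init__()
        self.in_p = False     # True while a <p> element is open
        self.current = []     # text fragments of the paragraph being read
        self.paragraphs = []  # finished paragraphs, one string each

    def _finish_paragraph(self):
        # Store the open paragraph's text, if any, and mark it closed.
        if self.in_p:
            self.paragraphs.append(''.join(self.current))
            self.current = []
            self.in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            # A new <p> implicitly ends an unclosed one, as in HTML.
            self._finish_paragraph()
            self.in_p = True
        # Other tags (<b>, <br>, ...) are ignored, so nothing can get out of sync.

    def handle_endtag(self, tag):
        if tag == 'p':
            self._finish_paragraph()

    def handle_data(self, data):
        if self.in_p:
            self.current.append(data)

    def close(self):
        super().close()           # flush any buffered text first
        self._finish_paragraph()  # then keep a <p> left open at end of input


parser = PTextParser()
parser.feed('<p>Hello <b>world</b>!<br>Bye</p><div>skip me</div>')
parser.close()
print(parser.paragraphs)  # ['Hello world!Bye']
```

Whether to join nested text like this or keep only direct children (as the stack version does) depends on what "contents" should mean for your documents.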

Answer 2

tau*_*ran 5

I extended the example from the documentation:

from html.parser import HTMLParser  # Python 3; in Python 2: from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print("Encountered the beginning of a %s tag" % tag)

    def handle_endtag(self, tag):
        print("Encountered the end of a %s tag" % tag)

    def handle_data(self, data):
        print("Encountered data %s" % data)

p = MyHTMLParser()
p.feed('<p>test</p>')

Output:

Encountered the beginning of a p tag
Encountered data test
Encountered the end of a p tag
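To see the callback order on slightly richer markup, the same pattern can record events in a list instead of printing them. This is a Python 3 sketch; `EventRecorder` and `events` are hypothetical names. It also shows that `html.parser` hands over tag and attribute names already lowercased, with attributes as a list of `(name, value)` tuples.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: record every callback so the order can be inspected."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        self.events.append(('data', data))


recorder = EventRecorder()
recorder.feed('<P class="intro">test <b>bold</b></P>')
recorder.close()
for event in recorder.events:
    print(event)
# ('start', 'p', [('class', 'intro')])
# ('data', 'test ')
# ('start', 'b', [])
# ('data', 'bold')
# ('end', 'b')
# ('end', 'p')
```

Because the parser already lowercases tag names, a `tag.lower()` call in a handler is harmless but redundant.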

Archived:	14 years, 5 months ago
Views:	8,175
Last activity:	7 years, 5 months ago