Ham*_*Spb 1 python parsing lxml
I have an HTML file:
<html>
<body>
<span class="text">One</span>some text1</br>
<span class="cyrillic">???</span>some text2</br>
</body>
</html>
How can I get "some text1" and "some text2" using lxml with Python?
import lxml.html
doc = lxml.html.document_fromstring("""<html>
<body>
<span class="text">One</span>some text1</br>
<span class="cyrillic">???</span>some text2</br>
</body>
</html>
""")
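# following-sibling::text()[1] selects the first text node after each span
txt1 = doc.xpath('/html/body/span[@class="text"]/following-sibling::text()[1]')
txt2 = doc.xpath('/html/body/span[@class="cyrillic"]/following-sibling::text()[1]')
# xpath() returns lists; the strings may carry trailing whitespace, so use e.g. txt1[0].strip()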
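As an alternative to the XPath axis, lxml also exposes the text that follows an element through its `.tail` attribute. The following is only a minimal sketch of that approach, not part of the original answer: the helper name `tail_texts` and the `HTML` constant are made up for illustration, and it assumes every span of interest is a direct child of `<body>` and has a `class` attribute.

```python
import lxml.html

# Same markup as in the question (the "???" placeholder is kept as-is).
HTML = """<html>
<body>
<span class="text">One</span>some text1</br>
<span class="cyrillic">???</span>some text2</br>
</body>
</html>
"""


def tail_texts(html):
    """Map each span's class attribute to the text that follows the span.

    Hypothetical helper for illustration only.
    """
    doc = lxml.html.document_fromstring(html)
    result = {}
    for span in doc.xpath('//body/span[@class]'):
        # .tail is the text between this element's end tag and the next
        # element; it is None when there is no such text.
        result[span.get('class')] = (span.tail or '').strip()
    return result


print(tail_texts(HTML))
```

This avoids writing one XPath expression per class, at the cost of returning only the text immediately after each span.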