I am using Beautiful Soup to parse the HTML from an RSS page. How can I keep the link tags intact?
The most promising code I have so far is:
```python
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup

url = 'https://advisories.ncsc.nl/rss/advisories'
uh = urllib.request.urlopen(url)  # fetch the RSS feed
html_doc = uh.read()              # raw bytes of the response
soup = BeautifulSoup(html_doc, 'html.parser')
```
I tried to `import lxml` and switch the code to

```python
soup = BeautifulSoup(html_doc, 'xml')
```

but this gives me an error:
```
ModuleNotFoundError: No module named 'lxml'
```
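The error means that `lxml` is not installed in the environment this interpreter runs in, and Beautiful Soup's `'xml'` parser depends on it (it is usually installed with `pip install lxml`). As a minimal sketch, the availability check could look like the following; `pick_parser` is a hypothetical helper name, not part of Beautiful Soup, and the fallback to `'html.parser'` will still show the `<link/>` problem.

```python
import importlib.util


def pick_parser() -> str:
    """Return 'xml' when lxml is importable, otherwise 'html.parser'.

    Hypothetical helper for illustration only.
    """
    # find_spec returns None when the top-level module cannot be located
    if importlib.util.find_spec("lxml") is not None:
        return "xml"
    return "html.parser"


parser_name = pick_parser()
print(parser_name)
```

The returned name could then be passed as the second argument to `BeautifulSoup(html_doc, parser_name)`.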
I expected the result to be
`<link>https://someurl.org</link>`, but the output is `<link/>someurl.org`.
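The `<link/>` output is consistent with `'html.parser'` treating `<link>` as an HTML void element, so the URL ends up as loose text after an empty tag. If installing `lxml` is not an option, one possible workaround is to parse the feed as XML with the standard library instead of Beautiful Soup. The sketch below swaps in `xml.etree.ElementTree`; the sample feed and the `extract_links` helper are made up for illustration, and it assumes plain, un-namespaced RSS 2.0 `<link>` elements, which may not match the real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for the real RSS response
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example advisories</title>
    <link>https://someurl.org</link>
    <item>
      <title>First advisory</title>
      <link>https://someurl.org/advisory/1</link>
    </item>
  </channel>
</rss>"""


def extract_links(rss_text):
    """Return the text of every <link> element, in document order."""
    root = ET.fromstring(rss_text)  # accepts str or bytes
    # Skip <link> elements that have no text content
    return [el.text.strip() for el in root.iter("link") if el.text]


for link in extract_links(SAMPLE_RSS):
    print(link)
```

Since `ET.fromstring` also accepts bytes, the `html_doc` read from `urlopen` in the question could presumably be passed to `extract_links` directly.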