How can I extract the meta description from a URL using Python?

Tec*_*c27 7 python url extract meta-tags goose

I want to extract the title and description from the following page:

view-source:http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/

The page source contains the following snippet:

<title>Book a Virgin Australia Flight | Virgin Australia
</title>
    <meta name="keywords" content="" />
        <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />

I want the title and the meta description content.

I tried Goose, but it did not extract them properly. Here is my code:

website_title = [g.extract(url).title for url in clean_url_data]

website_meta_description = [g.extract(urlw).meta_description for urlw in clean_url_data]

The result is empty.

lin*_*gta 11

Take a look at BeautifulSoup as a solution.

For the question above, you can use the following code to extract the "description" information:

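import requests
from bs4 import BeautifulSoup

url = 'http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/'
response = requests.get(url)
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(response.text, 'html.parser')

# Collect every <meta> tag, then keep the content of the one named "description"
metas = soup.find_all('meta')

print([meta.attrs['content'] for meta in metas if 'name' in meta.attrs and meta.attrs['name'] == 'description'])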

Output:

['Search for and book Virgin Australia and partner flights to Australian and international destinations.']
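
Since the question asks for the title as well as the description, here is a rough sketch that pulls out both. It is not part of the answer above: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages, and it parses the snippet quoted in the question instead of fetching the live page. The names HeadMetaParser, extract_title_and_description and SAMPLE_HTML are made up for this illustration.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collects the <title> text and the content of <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; names are lowercased
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


def extract_title_and_description(html):
    """Return (title, description); description is None if the tag is absent."""
    parser = HeadMetaParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description


# The snippet from the question, used here in place of a network request
SAMPLE_HTML = """<title>Book a Virgin Australia Flight | Virgin Australia
</title>
    <meta name="keywords" content="" />
        <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />
"""

title, description = extract_title_and_description(SAMPLE_HTML)
print(title)
print(description)
```

To run this against a real page, pass the text of an HTTP response (for example response.text from requests) to extract_title_and_description instead of SAMPLE_HTML.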