BeautifulSoup meta content tag

Question

Lui*_*uez 4 html python beautifulsoup html-parsing

<meta itemprop="streetAddress" content="4103 Beach Bluff Rd">

I need to get the content '4103 Beach Bluff Rd'. I am trying to do this with BeautifulSoup, and this is what I tried:

soup = BeautifulSoup('<meta itemprop="streetAddress" content="4103 Beach Bluff Rd"> ')

soup.find(itemprop="streetAddress").get_text()

But I get an empty string as the result, which may make sense, because when I print the soup object

print soup

I get:

<html><head><meta content="4103 Beach Bluff Rd" itemprop="streetAddress"/> </head></html>

Clearly, the data I want is in the "content" attribute of the meta tag. How can I get it?

Answer 1

ale*_*cxe 11

soup.find(itemprop="streetAddress").get_text()

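This returns the text of the matched element, which is empty because a <meta> tag has no text content. Instead, get the value of the "content" attribute: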

soup.find(itemprop="streetAddress").get("content")

This works because BeautifulSoup provides a dictionary-like interface to tag attributes:

You can access a tag's attributes by treating the tag like a dictionary.
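As a rough illustration of that dictionary-like interface, the sketch below reads the same attribute in three ways. It assumes Python 3 with the bs4 package installed and passes "html.parser" explicitly; the variable names are only illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

html = '<meta itemprop="streetAddress" content="4103 Beach Bluff Rd">'
# "html.parser" is the parser bundled with Python (an assumption; any parser works here)
soup = BeautifulSoup(html, "html.parser")
tag = soup.find(itemprop="streetAddress")

via_index = tag["content"]        # dict-style lookup, raises KeyError if the attribute is absent
via_get = tag.get("content")      # returns None if the attribute is absent
via_attrs = tag.attrs["content"]  # tag.attrs is the underlying attribute dictionary

print(via_index)
print(tag.get("missing"))         # an absent attribute yields None with .get()
```

Using .get() is usually the safer choice when an attribute may be missing, since indexing raises an exception instead.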

Demo:

>>> from bs4 import BeautifulSoup
>>>
>>> soup = BeautifulSoup('<meta itemprop="streetAddress" content="4103 Beach Bluff Rd"> ')
>>> soup.find(itemprop="streetAddress").get_text()
u''
>>> soup.find(itemprop="streetAddress").get("content")
'4103 Beach Bluff Rd'
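On a real page the tag may not be present, in which case find() returns None and calling .get() on it fails. A minimal sketch of a guard for that case follows, assuming Python 3 and bs4; the helper name meta_content is hypothetical and not from the original answer.

```python
from bs4 import BeautifulSoup


def meta_content(html, itemprop):
    """Return the content attribute of the first <meta> tag with the given
    itemprop, or None if no such tag (or no content attribute) exists.
    Hypothetical helper, for illustration only."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"itemprop": itemprop})
    if tag is None:
        return None
    return tag.get("content")


page = '<meta itemprop="streetAddress" content="4103 Beach Bluff Rd">'
street = meta_content(page, "streetAddress")
missing = meta_content(page, "postalCode")  # no such tag in this snippet
print(street)
print(missing)
```

Restricting the search to "meta" tags avoids matching other elements that happen to carry the same itemprop.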

Archived:	9 years, 11 months ago
Viewed:	3609 times
Last active:	9 years, 10 months ago