How to find specific video HTML tags using Beautiful Soup?

Question

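Does anyone know how to do this with BeautifulSoup in Python?

I have a search engine that returns a list of different URLs.

I only want to get the HTML tags that contain a video embed URL, and extract the link from them.

Example: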

import BeautifulSoup

html = '''https://archive.org/details/20070519_detroit2'''
# or this: html = '''http://www.kumby.com/avatar-the-last-airbender-book-3-chapter-5/'''
# or this: html = '''https://www.youtube.com/watch?v=fI3zBtE_S_k'''

soup = BeautifulSoup.BeautifulSoup(html)


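What should I do next to get the HTML tag for the video or object, or the exact link to the video?

I need to put it in my iframe. I am integrating Python into my PHP, so the plan is to get the video link and print it from Python, then echo it into my iframe.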

Answer 1

Ser*_*ial 5

You need to fetch the page's HTML, not just pass in the URL.

Use the built-in urllib library like this:

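# Python 3: urlopen lives in urllib.request (Python 2 used urllib.urlopen)
from urllib.request import urlopen
from bs4 import BeautifulSoup as BS

url = '''https://archive.org/details/20070519_detroit2'''
# open and read the page
page = urlopen(url)
html = page.read()
# create a BeautifulSoup parse-able "soup"; naming the parser avoids a warning
soup = BS(html, "html.parser")
# get the src attribute from the video tag
# (find() returns None if the page has no <video> tag)
video = soup.find("video").get("src")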

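Note that `soup.find("video")` returns `None` when a page has no `<video>` tag, and many pages put the URL on a nested `<source>` tag rather than on `<video>` itself. Here is a minimal sketch that handles both cases. It runs against an inline HTML string so it needs no network access; the `find_video_src` helper and the sample markup are illustrative assumptions, not taken from any of the sites above.

```python
from bs4 import BeautifulSoup

def find_video_src(html):
    """Return the first video URL found in html, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    video = soup.find("video")
    if video is None:
        return None
    # The URL may be on the <video> tag itself ...
    if video.get("src"):
        return video["src"]
    # ... or on a nested <source> tag.
    source = video.find("source", src=True)
    return source["src"] if source else None

# Hypothetical markup, for illustration only
sample = '<video controls><source src="/media/clip.mp4" type="video/mp4"></video>'
print(find_video_src(sample))
print(find_video_src("<p>no video here</p>"))
```

Returning `None` instead of raising lets the caller skip pages without a video and move on to the next URL in the list.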

Also, for the site you are using, I noticed that to get the embed link you only need to change details in the URL to embed, so it looks like this:

https://archive.org/embed/20070519_detroit2


So if you want to do this for multiple URLs from that site without having to parse each page, just do the following:

url = '''https://archive.org/details/20070519_detroit2'''
spl = url.split('/')
spl[3] = 'embed'
embed = "/".join(spl)
print(embed)

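The same rewrite can be wrapped in a small function for a whole list of URLs. The sketch below is one possible way to do it with `urllib.parse` from the Python 3 standard library; `to_embed_url` is a hypothetical helper name, and it assumes the path always has the form `/details/<identifier>` as in the link above.

```python
from urllib.parse import urlsplit, urlunsplit

def to_embed_url(details_url):
    """Turn an archive.org /details/ URL into the matching /embed/ URL."""
    parts = urlsplit(details_url)
    segments = parts.path.split("/")
    # The path looks like "/details/<identifier>", so segments[1] is "details"
    if len(segments) < 3 or segments[1] != "details":
        raise ValueError("not a details URL: " + details_url)
    segments[1] = "embed"
    return urlunsplit(parts._replace(path="/".join(segments)))

urls = ["https://archive.org/details/20070519_detroit2"]
for url in urls:
    print(to_embed_url(url))
```

Checking the path segment before rewriting means URLs from other sites raise an error instead of being silently mangled.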

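Edit

To get the embed links for the other URLs you added in your edit, look through the HTML of the page you are parsing until you find the link, then note which tag it is in and which attribute of that tag holds it.

For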

'''http://www.kumby.com/avatar-the-last-airbender-book-3-chapter-5/'''


just do

soup.find("iframe").get("src")


Use iframe because the link is inside the iframe tag, and .get("src") because the link is in the src attribute.
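As a rough sketch of the same idea applied more generally, the function below collects the `src` of every `<iframe>`, `<embed>` and `<video>` tag on a page and resolves relative links against the page URL. The `find_embed_links` name, the sample markup and the `example.com` URLs are assumptions made for illustration; real pages may load their players with JavaScript, in which case the tags will not appear in the fetched HTML.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def find_embed_links(html, base_url):
    """Collect absolute src URLs from <iframe>, <embed> and <video> tags."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # src=True skips tags that have no src attribute at all
    for tag in soup.find_all(["iframe", "embed", "video"], src=True):
        # urljoin leaves absolute URLs alone and resolves relative ones
        links.append(urljoin(base_url, tag["src"]))
    return links

# Hypothetical markup, for illustration only
sample = """
<div><iframe src="//player.example.com/embed/123"></iframe></div>
<iframe src="/local/player.html"></iframe>
<iframe></iframe>
"""
print(find_embed_links(sample, "http://www.example.com/page/"))
```

Each returned link can then be printed by the Python script and echoed into the iframe on the PHP side.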

You can try the next one yourself, since you should learn how to do it if you want to be able to do this in the future :)

Good luck!
