Get the value of an attribute using BeautifulSoup

adi*_*pta 8 python beautifulsoup python-2.7

I'm writing a Python script that extracts script locations after parsing a web page. Say there are two scenarios:

<script type="text/javascript" src="http://example.com/something.js"></script>

<script>some JS</script>

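I can get the JS in the second scenario, that is, when the JS is written inside the tag.

But is there a way to get the value of src in the first scenario (that is, extract the src attribute value of every script tag, such as http://example.com/something.js)?

Here's my code: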

#!/usr/bin/python

import requests 
from bs4 import BeautifulSoup

r  = requests.get("http://rediff.com/")
data = r.text
soup = BeautifulSoup(data)
for n in soup.find_all('script'):
    print n 

Output: some JS

Required output: http://example.com/something.js

Ven*_*raj 22

This gets every src value, but only from tags where the attribute is present; any <script> tag without a src is skipped.

from bs4 import BeautifulSoup
import urllib2
url = "http://rediff.com/"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
# {"src": True} matches only <script> tags that have a src attribute
sources = soup.findAll('script', {"src": True})
for source in sources:
    print source['src']

I got two src values as the result:

http://imworld.rediff.com/worldrediff/js_2_5/ws-global_hm_1.js
http://im.rediff.com/uim/common/realmedia_banner_1_5.js

I think this is what you wanted. Hope this helps.
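
For anyone who wants to try the same filter without a network request, here is a minimal sketch that runs it on an inline HTML string. It is written in Python 3 syntax (the thread itself uses Python 2.7), and the `html` sample and the `script_sources` helper are made up for illustration, built from the two cases in the question. The parser is named explicitly as `"html.parser"`, which is an assumption about what you have installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup covering both cases from the question.
html = """
<html><head>
<script type="text/javascript" src="http://example.com/something.js"></script>
<script>some JS</script>
</head><body></body></html>
"""


def script_sources(markup):
    """Return the src value of every <script> tag that has a src attribute."""
    soup = BeautifulSoup(markup, "html.parser")
    # src=True matches only tags where the attribute is present,
    # so the inline <script> is skipped.
    return [tag["src"] for tag in soup.find_all("script", src=True)]


print(script_sources(html))
```

Only the external script's URL should come back here; the inline script has no src, so the filter never returns it.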


raj*_*jpy 5

Get 'src' from the script node.

import requests 
from bs4 import BeautifulSoup

r  = requests.get("http://rediff.com/")
data = r.text
soup = BeautifulSoup(data)
for n in soup.find_all('script'):
    print "src:", n.get('src')  # <==== returns None when the tag has no src
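
Because `.get('src')` returns `None` for a tag with no src, the same loop can tell external scripts from inline ones. The following is only a rough sketch in Python 3 syntax (not the Python 2.7 used above); the `html` sample and the `describe_scripts` helper are hypothetical names chosen for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup covering both cases from the question.
html = """
<script type="text/javascript" src="http://example.com/something.js"></script>
<script>some JS</script>
"""


def describe_scripts(markup):
    """Return one (src, inline_code) pair per <script> tag."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for tag in soup.find_all("script"):
        src = tag.get("src")  # None when the attribute is absent
        text = tag.string     # inline code, or None for an empty tag
        inline = str(text) if src is None and text is not None else None
        results.append((src, inline))
    return results


for src, inline in describe_scripts(html):
    print("src:", src, "| inline:", inline)
```

Compared with filtering on `src=True`, this visits every script tag, which is useful when both the URLs and the inline code are needed in one pass.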