jpe*_*796 3 html python beautifulsoup python-3.x
I am using the following code to extract the HTML I need from an Amazon listing:
import requests
from bs4 import BeautifulSoup
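r = requests.get("http://www.amazon.com/dp/B0007RXSB4")
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(r.content, "html.parser")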
soup.find_all("div", {"id":"imgTagWrapperId"})
This gives me:
[<div class="imgTagWrapper" id="imgTagWrapperId">\n<img alt="Johnston
& Murphy Men's Greenwich Oxford,Black,6 D" class="a-dynamic-image
a-stretch-vertical" data-a-dynamic-image='{"http://ecx.images-
amazon.com/images/I/81zwayZox-S._UY695_.jpg":
[695,695],"http://ecx.images-amazon.com/images/I/81zwayZox-
S._UY535_.jpg":[535,535],"http://ecx.images-
amazon.com/images/I/81zwayZox-S._UY500_.jpg":
[500,500],"http://ecx.images-amazon.com/images/I/81zwayZox-
S._UY575_.jpg":[575,575],"http://ecx.images-
amazon.com/images/I/81zwayZox-S._UY395_.jpg":
[395,395],"http://ecx.images-amazon.com/images/I/81zwayZox-
S._UY585_.jpg":[585,585]}' data-old-hires="http://ecx.images-
amazon.com/images/I/81zwayZox-S._UL1500_.jpg" id="landingImage"
onload="this.onload='';setCSMReq('af');if(typeof addlongPoleTag ===
'function'){ addlongPoleTag('af','desktop-image-atf-
marker');};setCSMReq('cf')" src="http://ecx.images-
amazon.com/images/I/41KixMIlPNL._SY395_QL70_.jpg" style="max-
width:695px;max-height:695px;">\n</img></div>]
I just need to know how to extract
http://ecx.images-amazon.com/images/I/81zwayZox-S._UY695_.jpg
from the output above.
First, you need to find the img tag inside the div you have already located. One way is to chain find() calls:
img = soup.find("div", {"id": "imgTagWrapperId"}).find("img")
Alternatively, use a CSS selector:
img = soup.select_one("div#imgTagWrapperId > img")
Then, if you need the image URL from the src attribute:
img["src"]
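Both lookups above assume the div and the img are actually in the response. If they are not (for example, the server returns a different page layout), find() and select_one() return None, and indexing or chaining on None raises an error. A minimal sketch of a guarded version follows; the helper name image_src and the inline HTML are hypothetical stand-ins for illustration, with the HTML trimmed from the output in the question.

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for r.content; trimmed from the output in the question.
html = """
<div class="imgTagWrapper" id="imgTagWrapperId">
  <img id="landingImage"
       src="http://ecx.images-amazon.com/images/I/41KixMIlPNL._SY395_QL70_.jpg">
</div>
"""

def image_src(markup):
    """Hypothetical helper: return the wrapped image's src, or None if absent."""
    soup = BeautifulSoup(markup, "html.parser")
    img = soup.select_one("div#imgTagWrapperId > img")
    # select_one() returns None when nothing matches, and .get() returns None
    # when the attribute is missing, so neither case raises.
    return img.get("src") if img is not None else None

print(image_src(html))
print(image_src("<p>no image here</p>"))
```

The first call prints the src URL and the second prints None, so the caller can decide how to handle a page without the image.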
If you need the image URLs inside the data-a-dynamic-image attribute, I suggest loading the value into a Python dictionary with the json module and getting its keys():
import json
img = soup.find("div", {"id": "imgTagWrapperId"}).find("img")
data = json.loads(img["data-a-dynamic-image"])
print(list(data.keys()))
Prints:
[
u'http://ecx.images-amazon.com/images/I/81zwayZox-S._UY695_.jpg',
u'http://ecx.images-amazon.com/images/I/81zwayZox-S._UY575_.jpg',
u'http://ecx.images-amazon.com/images/I/81zwayZox-S._UY500_.jpg',
u'http://ecx.images-amazon.com/images/I/81zwayZox-S._UY395_.jpg',
u'http://ecx.images-amazon.com/images/I/81zwayZox-S._UY535_.jpg',
u'http://ecx.images-amazon.com/images/I/81zwayZox-S._UY585_.jpg'
]
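The question asks for one specific URL, the 695-pixel one, while keys() returns all of them in no guaranteed order on older Python versions. Since each key maps to a pair of numbers (presumably the image dimensions in pixels), one option is to pick the key with the largest pair. The sketch below assumes that reading of the values; the helper name largest_image_url is hypothetical, and the raw string is a shortened stand-in for img["data-a-dynamic-image"].

```python
import json

# Stand-in for img["data-a-dynamic-image"], shortened from the output in the question.
raw = (
    '{"http://ecx.images-amazon.com/images/I/81zwayZox-S._UY695_.jpg": [695, 695],'
    ' "http://ecx.images-amazon.com/images/I/81zwayZox-S._UY535_.jpg": [535, 535],'
    ' "http://ecx.images-amazon.com/images/I/81zwayZox-S._UY395_.jpg": [395, 395]}'
)

def largest_image_url(attr_value):
    """Hypothetical helper: return the URL whose number pair has the largest product."""
    sizes = json.loads(attr_value)  # maps URL -> two-element list of numbers
    # Assumption: the two numbers are pixel dimensions, so their product is the area.
    return max(sizes, key=lambda url: sizes[url][0] * sizes[url][1])

print(largest_image_url(raw))
```

With the sample data this prints http://ecx.images-amazon.com/images/I/81zwayZox-S._UY695_.jpg, the URL requested in the question.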