Cou*_*ney · 4 · Tags: beautifulsoup, web-scraping, python-3.x, python-requests
I recently learned about web scraping and want to write a program that scrapes daily product prices. I'm using requests and bs4 in Python to scrape target.com. Here is my code so far:
from random import choice
from time import sleep

import requests
from bs4 import BeautifulSoup

TIMES = [2, 3, 4, 5, 6, 7]  # candidate delays in seconds, picked at random between steps
url = 'https://www.target.com/p/dyson-ball-animal-2-upright-vacuum-iron-purple/-/A-52190951'
sleep(choice(TIMES))
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
sleep(choice(TIMES))
name = soup.find('h1').get_text().strip().replace(',', ';')
print('Product name: ', name)
sleep(choice(TIMES))
current_price = soup.find('span', {'data-test': 'product-savings'})  # None if no such <span> is in the downloaded HTML
print('Current price: ', current_price)
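When I run the code, the product name is correct, but the current price is always None. Is there a different way I should be searching for the product price?
Thanks in advance!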
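A quick way to tell whether the selector is wrong or the price simply isn't in the page that requests downloads is to scan that raw HTML for the attribute the code searches for. The sketch below is an illustrative helper (the names `has_data_test` and `_DataTestFinder` are hypothetical) built on the standard library's `html.parser`, so it runs without bs4; the sample HTML strings are made up. If it returns False for `r.text`, no `soup.find(...)` call on that HTML can match, which usually means the value is filled in later by JavaScript.

```python
from html.parser import HTMLParser


class _DataTestFinder(HTMLParser):
    """Records whether any start tag carries data-test="<wanted>"."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; attribute names are lowercased
        if ("data-test", self.wanted) in attrs:
            self.found = True


def has_data_test(html, wanted="product-savings"):
    """Return True if the raw HTML contains an element with data-test=wanted."""
    finder = _DataTestFinder(wanted)
    finder.feed(html)
    finder.close()
    return finder.found


# Usage with the question's response object (needs network, not run here):
#     r = requests.get(url)
#     print(has_data_test(r.text))  # False suggests the price is rendered client-side

static_page = '<h1>Dyson Ball Animal 2</h1><div id="price"></div>'
rendered_page = '<span data-test="product-savings">$499.99</span>'
print(has_data_test(static_page))    # False
print(has_data_test(rendered_page))  # True
```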
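The price probably isn't in the HTML that requests downloads: the page seems to load it with JavaScript from Target's API, which would explain why soup.find returns None. As long as you have the item/product ID (the TCIN, e.g. 52190951 at the end of the product URL), you can create a session to get the local store ID and an API key, and then fetch the price from that API: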
import pandas as pd
import requests

# Visit the home page once so Target sets its cookies on the session
s = requests.session()
s.get('https://www.target.com')
key = s.cookies['visitorId']  # used as the API key below
location = s.cookies['GuestLocation'].split('|')[0]  # first '|'-separated field: the guest's location

# Look up the nearest store to that location
store_id = requests.get('https://redsky.target.com/v3/stores/nearby/%s?key=%s&limit=1&within=100&unit=mile' % (location, key)).json()
store_id = store_id[0]['locations'][0]['location_id']

# Request price data for the product (TCIN) at that store
product_id = '52190951'
url = 'https://redsky.target.com/web/pdp_location/v1/tcin/%s' % product_id
payload = {
    'pricing_store_id': store_id,
    'key': key}
jsonData = requests.get(url, params=payload).json()
df = pd.DataFrame(jsonData['price'], index=[0])  # one-row table of the price fields
Output:
print(df.to_string())
       tcin  location_id  reg_retail  current_retail  current_retail_start_timestamp  current_retail_end_timestamp  default_price  formatted_current_price  formatted_current_price_type  is_current_price_range
0  52190951         3991      499.99          499.99            2019-10-19T07:00:00Z          9999-12-31T00:00:00Z          False                  $499.99                           reg                   False
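Since the endpoint returns JSON, the price can also be read straight from the `price` object without pandas, which is handy if you only want to log one number per day. The sketch below uses a hypothetical helper, `summarize_price`; the sample dictionary copies the values from the output above, but the value types (e.g. `tcin` as a string) are assumptions. These redsky endpoints are internal to Target's site, so the URLs, cookie names, and response layout may have changed since this answer was written.

```python
def summarize_price(json_data):
    """Pick a few fields out of the 'price' object of the pdp_location response.

    Assumes the layout shown in the output above; missing fields come back
    as None instead of raising KeyError.
    """
    price = json_data.get("price", {})
    regular = price.get("reg_retail")
    current = price.get("current_retail")
    return {
        "tcin": price.get("tcin"),
        "regular": regular,
        "current": current,
        "display": price.get("formatted_current_price"),
        # True only when both prices are present and the current one is lower
        "discounted": regular is not None and current is not None and current < regular,
    }


# Sample response: values copied from the output above, value types assumed
sample = {
    "price": {
        "tcin": "52190951",
        "location_id": 3991,
        "reg_retail": 499.99,
        "current_retail": 499.99,
        "formatted_current_price": "$499.99",
        "formatted_current_price_type": "reg",
    }
}

print(summarize_price(sample))
# → {'tcin': '52190951', 'regular': 499.99, 'current': 499.99, 'display': '$499.99', 'discounted': False}
```

If you do want the one-row DataFrame, the answer passes index=[0] because every value in the dict is a scalar, and pandas refuses to build a frame from all-scalar values without an index; pd.DataFrame([jsonData['price']]) builds the same single row.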
Views: 5150