Pau*_*aul 1 python web-scraping
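I am trying to scrape all available sizes from a Nike product page, for example this one:
https://www.nike.com/t/air-force-1-07-mens-shoe-JkTGzADv/315122-111
I tried loading the site and writing it to a text file, like this: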
import requests
from bs4 import BeautifulSoup

url = "https://www.nike.com/t/air-force-1-07-mens-shoe-JkTGzADv/315122-111"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
soupstring = str(soup)
# The with block closes the file automatically, so no explicit close() is needed
with open("website.txt", "w") as f:
    f.write(soupstring)
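But my problem is that the resulting text file does not contain the elements that hold the shoe sizes, so I cannot extract the available sizes from it. I can't think of a way to retrieve the sizes. Does anyone know how to retrieve the available sizes in Python?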
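The sizes are populated after the page loads, which is one reason you don't see them. The second reason is that, when using requests, you need to pass the headers parameter to get better results.
Let's fix this: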
import json
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Headers are highly recommended
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:81.0) Gecko/20100101 Firefox/81.0',
    'Accept': 'image/webp,*/*',
    'Accept-Language': 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3',
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
}
url = "https://www.nike.com/t/air-force-1-07-mens-shoe-JkTGzADv/315122-111"
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

# The web page is populated with data contained in a script tag, which we look for.
# It is JSON data assigned to window.INITIAL_REDUX_STATE, so strip the assignment
# prefix and the trailing ';' before parsing.
script = soup.find('script', string=re.compile('INITIAL_REDUX_STATE'))
data = json.loads(script.string.replace('window.INITIAL_REDUX_STATE=', '')[0:-1])

# The product we are searching for
product_id = "315122-111"
# In the JSON data, the following gives us the list of possible SKUs
skus = data['Threads']['products'][product_id]['skus']
# And the following gives their availability
available_skus = data['Threads']['products'][product_id]['availableSkus']

# Let's use pandas to cross both tables
df_skus = pd.DataFrame(skus)
df_available_skus = pd.DataFrame(available_skus)
# Here, finally, is the table with the available SKUs and their sizes
df_sizes = df_skus.merge(df_available_skus[['skuId', 'available']], on='skuId')
print(df_sizes)
# It can be saved in any format you want (xlsx, txt, csv, json...)
Output
| id | nikeSize | skuId | localizedSize | localizedSizePrefix | available |
|---------:|-----------:|:-------------------------------------|----------------:|:----------------------|:------------|
| 10042654 | 12.5 | 118cf6d0-e1c0-50ac-a620-7f3a7f9c0b64 | 47 | EU | True |
| 10042656 | 14 | 0fb2d87f-a7f8-5e36-8961-99c35b0360c1 | 48.5 | EU | True |
| 10042657 | 15 | f80a30b2-8a7c-5834-82c4-9bea2c0c9995 | 49.5 | EU | True |
| 10042658 | 16 | 3e323cdc-1c35-5663-895e-f3f809edff1e | 50.5 | EU | True |
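
The extraction step above can also be tried offline. The following is a minimal sketch using only the standard library, assuming the page embeds its state as `window.INITIAL_REDUX_STATE=<json>;` inside a script tag, as the answer describes. `SAMPLE_HTML`, its SKU ids and sizes, and the helper names `extract_state` and `available_sizes` are hypothetical stand-ins for illustration, not real Nike data or part of the original answer. It uses `json.JSONDecoder.raw_decode` in place of the string slicing, so the trailing `;` does not have to be removed by hand.

```python
import json
import re

# Hypothetical stand-in for the downloaded page. The real page embeds a much
# larger JSON object; this sample only mirrors the keys used in the answer.
SAMPLE_HTML = """
<html><body>
<script>window.INITIAL_REDUX_STATE={"Threads": {"products": {"315122-111": {
  "skus": [
    {"skuId": "a1", "nikeSize": "12.5", "localizedSize": "47"},
    {"skuId": "b2", "nikeSize": "14", "localizedSize": "48.5"}
  ],
  "availableSkus": [
    {"skuId": "a1", "available": true},
    {"skuId": "b2", "available": false}
  ]
}}}};</script>
</body></html>
"""


def extract_state(html):
    """Return the JSON object assigned to window.INITIAL_REDUX_STATE."""
    match = re.search(r"window\.INITIAL_REDUX_STATE\s*=\s*", html)
    if match is None:
        raise ValueError("INITIAL_REDUX_STATE not found in the HTML")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here the ';' and the closing tags).
    state, _ = json.JSONDecoder().raw_decode(html, match.end())
    return state


def available_sizes(state, product_id):
    """List the nikeSize of every SKU flagged as available."""
    product = state["Threads"]["products"][product_id]
    in_stock = {s["skuId"] for s in product["availableSkus"] if s["available"]}
    return [s["nikeSize"] for s in product["skus"] if s["skuId"] in in_stock]


state = extract_state(SAMPLE_HTML)
print(available_sizes(state, "315122-111"))  # prints ['12.5']
```

With a real response, `page.text` from the requests call would take the place of `SAMPLE_HTML`, provided the page still uses the same variable name and key layout.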