Ser*_*gey 1 python beautifulsoup python-3.x
Please help me fix this script.
import pprint
import requests
import bs4

def get_catalog(url):
    req = requests.get(url)
    if req.status_code != requests.codes.ok:
        print('Error: ', req.status_code)
    else:
        soup = bs4.BeautifulSoup(req.text)
        #print(soup)
        catalogMenu = soup.find('section', {'class': 'catalog'})
        catalogMenuList = catalogMenu.find('ul', {'class': 'topnav'})
        #print(catalogMenuList)
        return catalogMenuList

def parse_catalog_categories(catalogMenuList):
    catalogNames = []
    #li = catalogMenuList.findNext('li', limit=1) #?????????????????
    pprint.pprint(li)

if __name__ == "__main__":
    url = 'http://first-store.ru/'
    catalogMenuList = get_catalog(url)
    if not catalogMenuList:
        print('Get catalog error')
    else:
        parse_catalog_categories(catalogMenuList)
The problem is that I cannot select only the li elements at the first level of nesting. That is, I want:
iphone, ipad, ipod, imac, etc...
but not:
iphone, iphone 5s, iphone 5s VIP, iphone 5c, .....
Try setting recursive=False so that the search covers only the tag's direct children:
items = catalogMenuList.find_all('li', recursive=False)
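To see the effect without hitting the live site, here is a minimal sketch run against an inline HTML snippet. The markup in SAMPLE_HTML and the helper name top_level_names are assumptions made up for illustration; the real page at first-store.ru may be structured differently.

```python
import bs4

# Hypothetical markup standing in for the store's menu; the real page may differ.
SAMPLE_HTML = """
<section class="catalog">
  <ul class="topnav">
    <li><a href="#">iphone</a>
      <ul>
        <li><a href="#">iphone 5s</a></li>
        <li><a href="#">iphone 5c</a></li>
      </ul>
    </li>
    <li><a href="#">ipad</a></li>
    <li><a href="#">ipod</a></li>
  </ul>
</section>
"""

def top_level_names(menu):
    """Return the link text of each <li> that is a direct child of `menu`."""
    names = []
    # recursive=False restricts the search to direct children of `menu`
    for li in menu.find_all('li', recursive=False):
        link = li.find('a')  # first <a> in document order, i.e. the item's own link
        if link is not None:
            names.append(link.get_text(strip=True))
    return names

# An explicit parser avoids depending on whichever parser happens to be installed.
soup = bs4.BeautifulSoup(SAMPLE_HTML, 'html.parser')
menu = soup.find('section', {'class': 'catalog'}).find('ul', {'class': 'topnav'})

print(top_level_names(menu))      # top-level categories only
print(len(menu.find_all('li')))   # default recursive search also counts nested items
```

On this sample the recursive search matches five li tags, while recursive=False keeps only the three top-level ones.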