per*_*alt 4 python beautifulsoup python-3.x
from bs4 import BeautifulSoup
import codecs
import sys
import urllib.request
site_response= urllib.request.urlopen("http://site/")
html=site_response.read()
file = open ("cars.html","wb") #open file in binary mode
file.write(html)
file.close()
soup = BeautifulSoup(open("cars.html"))
output = (soup.prettify('latin'))
#print(output) #prints whole file for testing
file_output = open ("cars_out.txt","wb")
file_output.write(output)
file_output.close()
fulllist=soup.find_all("div", class_="row vehicle")
#print(fulllist) #prints each row vehicle class for debug
for item in fulllist:
    item_print=item.find("span", class_="modelYearSort").string
    item_print=item_print + "|" + item.find("span", class_="mmtSort").string
    seller_phone=item.find("span", class_="seller-phone")
    print(seller_phone)
    # item_print=item_print + "|" + item.find("span", class_="seller-phone").string
    item_print=item_print + "|" + item.find("span", class_="priceSort").string
    item_print=item_print + "|" + item.find("span", class_="milesSort").string
    print(item_print)
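I have the code above, which parses some HTML and produces a pipe-delimited file. It works fine, except that in some entries one element (the seller phone) is missing from the HTML. Not every entry has a seller phone number.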
item.find("span", class_="seller-phone").string
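This is the line that fails. That does not surprise me when the seller phone is missing: I get AttributeError: 'NoneType' object has no attribute 'string'.
I can call 'item.find' without '.string' and get the whole HTML block back, but I can't figure out how to extract the text in these cases.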
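You are right: find returns None when no matching element is found, so accessing .string on the result raises the AttributeError.
You can add an if/else clause to avoid this: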
for item in fulllist:
    span = item.find("span", class_="modelYearSort")
    if span:
        item_print = span.string
        item_print=item_print + "|" + item.find("span", class_="mmtSort").string
        seller_phone=item.find("span", class_="seller-phone")
        print(seller_phone)
        # item_print=item_print + "|" + item.find("span", class_="seller-phone").string
        item_print=item_print + "|" + item.find("span", class_="priceSort").string
        item_print=item_print + "|" + item.find("span", class_="milesSort").string
        print(item_print)
    else:
        continue #It's empty, go on to the next loop.
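The same None check can be applied to every field, including the optional seller phone, so that a missing span leaves an empty column instead of skipping the row. The following is only a sketch: `span_text` and `build_row` are hypothetical helper names that do not appear in the question, and `item` is assumed to be one of the `row vehicle` tags from `fulllist`. It uses `get_text(strip=True)` rather than `.string`, since `.string` is None when a span contains nested tags.

```python
def span_text(item, css_class, default=""):
    """Return the text of the first <span class=css_class> inside item.

    Falls back to `default` when the span is absent, because
    item.find() returns None in that case.
    """
    span = item.find("span", class_=css_class)
    if span is None:
        return default
    return span.get_text(strip=True)


def build_row(item):
    """Join the five fields with "|"; a missing field becomes an empty string."""
    classes = ["modelYearSort", "mmtSort", "seller-phone", "priceSort", "milesSort"]
    return "|".join(span_text(item, c) for c in classes)
```

Inside the loop this would be used as `print(build_row(item))`, which keeps the column count the same for every row.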
Or, if you prefer, use a try/except block:
for item in fulllist:
    try:
        item_print=item.find("span", class_="modelYearSort").string
    except AttributeError:
        continue #skip to the next loop.
    else:
        item_print=item_print + "|" + item.find("span", class_="mmtSort").string
        seller_phone=item.find("span", class_="seller-phone")
        print(seller_phone)
        # item_print=item_print + "|" + item.find("span", class_="seller-phone").string
        item_print=item_print + "|" + item.find("span", class_="priceSort").string
        item_print=item_print + "|" + item.find("span", class_="milesSort").string
        print(item_print)
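The try/except approach can also be narrowed to just the optional field, so that an entry without a phone number still produces a row. This is a sketch under the same assumptions as the loop above: `phone_or_blank` is a hypothetical name, and `item` is assumed to be one of the tags from `fulllist`.

```python
def phone_or_blank(item, default=""):
    """Return the seller phone text, or `default` if the span is missing."""
    try:
        # find() returns None when nothing matches, so .string raises AttributeError
        phone = item.find("span", class_="seller-phone").string
    except AttributeError:
        return default
    # .string is None when the span holds nested tags instead of plain text
    return phone if phone is not None else default
```

In the loop, the commented-out line could then become `item_print = item_print + "|" + phone_or_blank(item)`.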
Hope this helps!