BeautifulSoup - handling the case where variable.find().string returns nothing

per*_*alt 4 python beautifulsoup python-3.x

from bs4 import BeautifulSoup
import codecs
import sys

import urllib.request
site_response= urllib.request.urlopen("http://site/")
html=site_response.read()
file = open ("cars.html","wb") #open file in binary mode
file.write(html)
file.close()


soup = BeautifulSoup(open("cars.html"))
output = (soup.prettify('latin'))
#print(output) #prints whole file for testing

file_output = open ("cars_out.txt","wb")
file_output.write(output)
file_output.close()

fulllist=soup.find_all("div", class_="row vehicle")
#print(fulllist) #prints each row vehicle class for debug

for item in fulllist:
    item_print=item.find("span", class_="modelYearSort").string
    item_print=item_print + "|" + item.find("span", class_="mmtSort").string
    seller_phone=item.find("span", class_="seller-phone")
    print(seller_phone)
    # item_print=item_print + "|" + item.find("span", class_="seller-phone").string
    item_print=item_print + "|" + item.find("span", class_="priceSort").string
    item_print=item_print + "|" + item.find("span", class_="milesSort").string
    print(item_print)

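I have the code above, which parses some HTML and produces a pipe-delimited file. It works fine, except that in some entries one element (the seller phone) is missing from the HTML. Not every entry has a seller phone number.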

item.find("span", class_="seller-phone").string

This is the line that fails. I'm not surprised it fails when the seller phone is missing. I get an AttributeError: 'NoneType' object has no attribute 'string'.

I can call 'item.find' without '.string' and get the full HTML block back. But I can't figure out how to extract the text in these cases.

aIK*_*Kid 6

You're right: soup.find returns None if the element is not found.

You can add an if/else clause to avoid this:

for item in fulllist:
    span = item.find("span", class_="modelYearSort")
    if span is not None:
        item_print = span.string
        item_print=item_print + "|" + item.find("span", class_="mmtSort").string
        seller_phone=item.find("span", class_="seller-phone")
        print(seller_phone)
        # item_print=item_print + "|" + item.find("span", class_="seller-phone").string
        item_print=item_print + "|" + item.find("span", class_="priceSort").string
        item_print=item_print + "|" + item.find("span", class_="milesSort").string
        print(item_print)
    else:
        continue  # span not found; skip to the next item
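
The same None check can also be applied to the optional seller-phone span itself, so that a row without a phone number is still written out with an empty field instead of being skipped. Below is a minimal sketch of that idea. The sample HTML, the span_text helper, and the FIELDS list are made up for illustration; only the class names come from the question. It uses get_text() rather than .string, because .string is itself None when a tag has more than one child.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the second row has no seller-phone span.
SAMPLE_HTML = """
<div class="row vehicle">
  <span class="modelYearSort">2012</span>
  <span class="mmtSort">Honda Civic</span>
  <span class="seller-phone">555-0100</span>
  <span class="priceSort">$9,500</span>
  <span class="milesSort">60,000</span>
</div>
<div class="row vehicle">
  <span class="modelYearSort">2010</span>
  <span class="mmtSort">Ford Focus</span>
  <span class="priceSort">$6,200</span>
  <span class="milesSort">85,000</span>
</div>
"""


def span_text(item, css_class, default=""):
    """Return the text of the first matching span, or `default` if absent."""
    span = item.find("span", class_=css_class)
    if span is None:  # find() returns None when nothing matches
        return default
    return span.get_text(strip=True)


# Column order taken from the question's output line.
FIELDS = ["modelYearSort", "mmtSort", "seller-phone", "priceSort", "milesSort"]

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [
    "|".join(span_text(item, name) for name in FIELDS)
    for item in soup.find_all("div", class_="row vehicle")
]
for row in rows:
    print(row)
```

With this sample, the second row comes out with two adjacent pipes where the phone number would be, which keeps the column count the same for every line.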

Or, if you prefer, use a try/except block:

for item in fulllist:
    try:
        item_print=item.find("span", class_="modelYearSort").string
    except AttributeError:
        continue #skip to the next loop.
    else:
        item_print=item_print + "|" + item.find("span", class_="mmtSort").string
        seller_phone=item.find("span", class_="seller-phone")
        print(seller_phone)
        # item_print=item_print + "|" + item.find("span", class_="seller-phone").string
        item_print=item_print + "|" + item.find("span", class_="priceSort").string
        item_print=item_print + "|" + item.find("span", class_="milesSort").string
        print(item_print)
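
If you would rather keep the row and only blank out the missing field, the try/except can be wrapped around just the seller-phone lookup. This is a rough sketch under that assumption; the one-row sample HTML and the empty-string placeholder are my own choices, not something from your page.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row sample with no seller-phone span.
SAMPLE_HTML = """
<div class="row vehicle">
  <span class="modelYearSort">2010</span>
  <span class="priceSort">$6,200</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
item = soup.find("div", class_="row vehicle")

year = item.find("span", class_="modelYearSort").string

try:
    # find() returns None when the span is absent, so .string raises
    # AttributeError; only this optional lookup is guarded.
    phone = item.find("span", class_="seller-phone").string
except AttributeError:
    phone = ""  # placeholder chosen for this sketch

price = item.find("span", class_="priceSort").string

line = "|".join([year, phone, price])
print(line)
```

Keeping the try block this narrow means an AttributeError from one of the required fields is not silently swallowed.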

Hope this helps!