I tried the approach from this post, but it doesn't seem to work for me.
I tried this code:
for bresult in response.css(LIST_SELECTOR):
    NAME_SELECTOR = 'h2 a ::attr(href)'
    yield {
        'name': bresult.css(NAME_SELECTOR).extract_first(),
    }
    b_result_list.append(bresult.css(NAME_SELECTOR).extract_first())
# set b_result_list to SET to remove dups, then change back to LIST
set(b_result_list)
list(set(b_result_list))
for brl in b_result_list:
    print("brl: {}".format(brl))
This prints:
brl: https://facebook.site.com/users/login
brl: https://facebook.site.com/users
brl: https://facebook.site.com/users/login
But what I want is:
brl: https://facebook.site.com/users/login
brl: https://facebook.site.com/users
What am I doing wrong here?
Thanks!
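You are throwing away the result when you need to save it. set() and list() each return a new object, so b_result_list never actually changes, and you end up iterating over the original list. Save the result of the set operation instead: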
b_result_list = list(set(b_result_list))
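(Note that sets do not preserve order, so the deduplicated URLs may come back in a different order than they were scraped.)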
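If the original order matters, one common alternative (a swapped-in technique, not part of the answer above) is dict.fromkeys: dict keys are unique and, since Python 3.7, preserve insertion order. A minimal sketch, using the three URLs from the question's output as hypothetical sample data in place of the scraped results:

```python
# Hypothetical sample data standing in for the hrefs scraped in the question.
b_result_list = [
    "https://facebook.site.com/users/login",
    "https://facebook.site.com/users",
    "https://facebook.site.com/users/login",
]

# set() builds a NEW object; calling it without assigning the result
# leaves b_result_list untouched (this is the bug in the question).
set(b_result_list)
assert len(b_result_list) == 3

# dict keys are unique and keep insertion order (Python 3.7+), so this
# drops duplicates while keeping the first occurrence of each URL.
b_result_list = list(dict.fromkeys(b_result_list))

for brl in b_result_list:
    print("brl: {}".format(brl))
```

This prints the two unique URLs in the order they were first seen, which matches the desired output in the question.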