Raises AttributeError: Response content isn't text. I am using scrapy_proxy_pool and scrapy_user_agents. I am trying to find every link on the target website.
import scrapy


class LinksSpider(scrapy.Spider):
    name = 'links'
    allowed_domains = ['www.chotosite.com', 'chotosite.com']
    extracted_links = []

    def start_requests(self):
        start_urls = 'https://www.chotosite.com'
        yield scrapy.Request(url=start_urls, callback=self.extract_link)

    def extract_link(self, response):
        # eliminating image URLs from links
        str_response_content_type = str(response.headers.get('content-type'))
        if str_response_content_type == "b'text/html; charset=UTF-8'":
            links = response.xpath("//a/@href").extract()
            for link in links:
                if "chotosite" in link and link not in self.extracted_links:
                    self.extracted_links.append(link)
                    yield scrapy.Request(url=link, callback=self.extract_link)
                    yield {
                        "links": link
                    }
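The content-type test in extract_link compares against one exact string, so a header such as `text/html` without a charset, or with different capitalisation, would not match. A minimal sketch of a more tolerant check follows; `is_html` is a hypothetical helper name, not part of Scrapy, and this check may not by itself explain the AttributeError.

```python
def is_html(content_type):
    """Return True when a Content-Type header value denotes an HTML document.

    Accepts bytes (as Scrapy header values are) or str, and ignores
    parameters such as charset as well as letter case.
    """
    if content_type is None:
        return False
    if isinstance(content_type, bytes):
        content_type = content_type.decode("latin-1")
    # Keep only the media type, dropping "; charset=..." and similar parameters.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")
```

Inside the spider this could be called as `is_html(response.headers.get('content-type'))` in place of the string comparison.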
This is my settings.py file:
BOT_NAME = 'chotosite'
SPIDER_MODULES …

My goal is to print something from the parse method while iterating over the for loop in the get_membership_no method.
I am using Python 3.8.5 and Scrapy 1.7.3. When I run the code below, I get "Filtered offsite request". This is the console output.

Here is my code.
import scrapy
import json


class BasisMembersSpider(scrapy.Spider):
    name = 'basis'
    allowed_domains = ['www.basis.org.bd']

    def start_requests(self):
        yield scrapy.Request(url="https://basis.org.bd/get-member-list?page=1&team=", callback=self.get_membership_no)

    def get_membership_no(self, response):
        data_array = json.loads(response.body)['data']
        for data in data_array:
            yield scrapy.Request(url='https://basis.org.bd/get-company-profile/{0}'.format(data['membership_no']), callback=self.parse)

    def parse(self, response):
        print("I want to get this line on console. thank you.")
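Note that allowed_domains lists `www.basis.org.bd` while the requests go to `basis.org.bd`. Scrapy's offsite filtering accepts a listed domain and its subdomains, and `basis.org.bd` is not a subdomain of `www.basis.org.bd`. The sketch below is a simplified approximation of that rule using only the standard library, not Scrapy's actual implementation; `is_onsite` is a hypothetical name.

```python
from urllib.parse import urlparse


def is_onsite(url, allowed_domains):
    """Approximate the offsite rule: the host must equal an allowed domain
    or be a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in (d.lower() for d in allowed_domains)
    )


profile_url = "https://basis.org.bd/get-company-profile/1"  # illustrative URL
print(is_onsite(profile_url, ["www.basis.org.bd"]))  # False
print(is_onsite(profile_url, ["basis.org.bd"]))      # True
```

Under this rule, listing `basis.org.bd` in allowed_domains would cover both the bare domain and the `www` host.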

I am trying to write a very basic email-sending script. Here is my code:
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("Test message.")
msg['Subject'] = "Test Subject!!!"
msg['From'] = "myemail@gmail.com"
email_list = ["xyz@gmail.com", "abc@gmail.com"]
for email in email_list:
    msg['To'] = email
    server = smtplib.SMTP(host='smtp.gmail.com', port=587)
    server.starttls()
    server.login("myemail@gmail.com", "mypassword")
    server.send_message(msg)
    server.quit()
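The script is supposed to send the mail to multiple recipients, so I need to change the msg['To'] field on each loop iteration, but I get the error shown in the traceback below.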
Traceback (most recent call last):
  File "exp.py", line 66, in <module>
    msg['To'] = email
  File "/usr/lib/python3.8/email/message.py", line 407, in __setitem__
    raise ValueError("There may be at most {} {} headers "
ValueError: There may be at most 1 To …
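
The error arises because assigning to msg['To'] adds a header rather than replacing the existing one, and EmailMessage permits only one To header. Two possible workarounds are sketched below: build a fresh message per recipient, or delete the old header before setting the new one. The function names `build_messages` and `retarget` are hypothetical, and no mail is sent here.

```python
from email.message import EmailMessage


def build_messages(sender, recipients, subject, body):
    """Create one independent EmailMessage per recipient."""
    messages = []
    for recipient in recipients:
        msg = EmailMessage()
        msg.set_content(body)
        msg["Subject"] = subject
        msg["From"] = sender
        msg["To"] = recipient  # first and only To header on this message
        messages.append(msg)
    return messages


def retarget(msg, recipient):
    """Reuse a single message: drop any existing To header, then set a new one."""
    del msg["To"]  # does nothing if the header is absent
    msg["To"] = recipient
    return msg
```

Each resulting message could then be passed to server.send_message(msg) as in the original script, ideally over a single SMTP connection opened before the loop.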