Posts by Kam*_*san

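AttributeError: Response content isn't text, raised with a Scrapy proxy pool. How do I solve it?

My spider raises AttributeError: Response content isn't text. I am using scrapy_proxy_pool and scrapy_user_agents. I am trying to find every link on the target website.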

import scrapy

class LinksSpider(scrapy.Spider):

    name = 'links'
    allowed_domains = ['www.chotosite.com','chotosite.com']

    extracted_links = []

    def start_requests(self):
        start_urls = 'https://www.chotosite.com'
        yield scrapy.Request(url=start_urls, callback=self.extract_link)

    def extract_link(self, response):
        # skip image URLs: only follow links found in HTML responses
        str_response_content_type = str(response.headers.get('content-type'))
        if str_response_content_type == "b'text/html; charset=UTF-8'" :
            
            links = response.xpath("//a/@href").extract()

            for link in links:
                if "chotosite" in link and link not in self.extracted_links:
                    self.extracted_links.append(link)
                    yield scrapy.Request(url=link, callback=self.extract_link)

                    yield {
                        "links": link
                    }

Here is my settings.py file:

BOT_NAME = 'chotosite'

SPIDER_MODULES …
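The content-type test in extract_link compares the string form of a bytes header against one exact literal, so a response that differs only in case or charset is silently skipped. A minimal sketch of a more tolerant check follows; the helper name is_html_content_type is hypothetical and not part of Scrapy, and it only addresses the header comparison, not where the AttributeError itself is raised.

```python
def is_html_content_type(raw_header):
    """Return True if a raw Content-Type header value denotes text/html."""
    if raw_header is None:
        # No Content-Type header at all: treat as non-HTML.
        return False
    # Scrapy returns header values as bytes, so decode before comparing.
    if isinstance(raw_header, bytes):
        raw_header = raw_header.decode("latin-1")
    # Keep only the media type; drop parameters such as "; charset=UTF-8".
    media_type = raw_header.split(";", 1)[0].strip().lower()
    return media_type == "text/html"
```

Inside the spider this could be called as is_html_content_type(response.headers.get('content-type')). Checking isinstance(response, scrapy.http.TextResponse) before calling response.xpath is another way to skip non-text responses.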

python scrapy web-scraping

Score: 5 | Answers: 1 | Views: 663

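scrapy.spidermiddlewares.offsite DEBUG: Filtered offsite request to the website I want to scrape. Why is my parse method never called?

My goal is to print something from the parse method while iterating over the for loop in the get_membership_no method.

I am using Python 3.8.5 and Scrapy 1.7.3. When I run the code below, I get "Filtered offsite request" in the console output (originally attached as a screenshot).

Here is my code.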

import scrapy
import json
class BasisMembersSpider(scrapy.Spider):
    name = 'basis'
    allowed_domains = ['www.basis.org.bd']

    def start_requests(self):

        yield scrapy.Request(url="https://basis.org.bd/get-member-list?page=1&team=", callback=self.get_membership_no)


    def get_membership_no(self, response):

        data_array = json.loads(response.body)['data']

        for data in data_array:

            yield scrapy.Request(url='https://basis.org.bd/get-company-profile/{0}'.format(data['membership_no']), callback=self.parse)


    def parse(self, response):
        print("I want to get this line on console. thank you.")
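The spider lists 'www.basis.org.bd' in allowed_domains but requests URLs on 'basis.org.bd'. Scrapy's offsite filter accepts a host when it equals an allowed domain or is a subdomain of one, so the bare domain does not match the 'www.' entry. The following is a simplified sketch of that matching rule using only the standard library; the function is_onsite is hypothetical and is not Scrapy's own code.

```python
import re
from urllib.parse import urlparse


def is_onsite(url, allowed_domains):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    if not allowed_domains:
        # With no allowed domains configured, nothing is filtered.
        return True
    host = urlparse(url).hostname or ""
    # Optional "<anything>." prefix followed by exactly one allowed domain.
    pattern = r"^(.*\.)?(%s)$" % "|".join(re.escape(d) for d in allowed_domains)
    return re.match(pattern, host) is not None


profile_url = "https://basis.org.bd/get-company-profile/1"
print(is_onsite(profile_url, ["www.basis.org.bd"]))  # allowed_domains from the question
print(is_onsite(profile_url, ["basis.org.bd"]))      # bare domain instead
```

Under this rule, listing the bare domain 'basis.org.bd' covers both 'basis.org.bd' and 'www.basis.org.bd'. The membership number 1 in the URL is a placeholder for illustration.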

python scrapy web-scraping

Score: 2 | Answers: 1 | Views: 2247

ValueError: There may be at most 1 To headers in a message

I am trying to write a very basic email-sending script. Here is my code.

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("Test message.")
msg['Subject'] = "Test Subject!!!"
msg['From'] = "myemail@gmail.com"

email_list = ["xyz@gmail.com", "abc@gmail.com"]

for email in email_list:
    msg['To'] = email
    server = smtplib.SMTP(host='smtp.gmail.com', port=587)
    server.starttls()
    server.login("myemail@gmail.com", "mypassword")
    server.send_message(msg)
    server.quit()

The script should send the mail to multiple recipients, so I need to change the msg['To'] field on each loop iteration, but I get the error shown in the traceback below.

Traceback (most recent call last):
  File "exp.py", line 66, in <module>
    msg['To'] = email
  File "/usr/lib/python3.8/email/message.py", line 407, in __setitem__
    raise ValueError("There may be at most {} {} headers "
ValueError: There may be at most 1 To …
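Assigning msg['To'] on an EmailMessage adds a header rather than overwriting the existing one, so the second loop iteration tries to create a second To header and the default policy rejects it. Two possible ways around this are sketched below: build a fresh message per recipient, or delete the old header before setting the new one. The helper build_messages is hypothetical, and the sketch only constructs the messages; sending them with smtplib is left out.

```python
from email.message import EmailMessage


def build_messages(sender, recipients, subject, body):
    """Build one independent EmailMessage per recipient."""
    messages = []
    for recipient in recipients:
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = sender
        msg["To"] = recipient  # fresh message, so this is the only To header
        msg.set_content(body)
        messages.append(msg)
    return messages


recipients = ["xyz@gmail.com", "abc@gmail.com"]
messages = build_messages("myemail@gmail.com", recipients,
                          "Test Subject!!!", "Test message.")

# Alternative: reuse one message and delete the old header first.
reused = EmailMessage()
reused.set_content("Test message.")
for recipient in recipients:
    del reused["To"]  # does nothing if the header is not present yet
    reused["To"] = recipient
```

With either approach, each message passed to server.send_message carries exactly one To header. Opening the SMTP connection once, outside the loop, would also avoid logging in again for every recipient.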

python smtplib python-3.x

Score: 0 | Answers: 1 | Views: 3171

Tag statistics

python ×3

scrapy ×2

web-scraping ×2

python-3.x ×1

smtplib ×1