Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it

Flo*_*Flo 6 python scrapy docker scrapy-splash windows-server-2019

My steps:

  1. Build the image: `docker build . -t scrapy`
  2. Run a container: `docker run -it -p 8050:8050 --rm scrapy`
  3. In the container, run the Scrapy project: `scrapy crawl foobar -o allobjects.json`

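This works locally, but on my production server I get this error:

[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.example.com via http://localhost:8050/execute> (failed 1 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..

Note: I am not using Docker Desktop, and I cannot use it on this server.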

Dockerfile

FROM mcr.microsoft.com/windows/servercore:ltsc2019

SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

RUN setx /M PATH $('C:\Users\ContainerAdministrator\miniconda3\Library\bin;C:\Users\ContainerAdministrator\miniconda3\Scripts;C:\Users\ContainerAdministrator\miniconda3;' + $Env:PATH)
RUN Invoke-WebRequest "https://repo.anaconda.com/miniconda/Miniconda3-py38_4.10.3-Windows-x86_64.exe" -OutFile miniconda3.exe -UseBasicParsing; \
    Start-Process -FilePath 'miniconda3.exe' -Wait -ArgumentList '/S', '/D=C:\Users\ContainerAdministrator\miniconda3'; \
    Remove-Item .\miniconda3.exe; \
    conda install -y -c conda-forge scrapy;

RUN pip install scrapy-splash
RUN pip install scrapy-user-agents

#creates root directory if not exists, then enters it
WORKDIR /root/scrapy

COPY scrapy /root/scrapy

settings.py

SPLASH_URL = 'http://localhost:8050/'

Output of the command `scrapy crawl foobar -o allobjects.json`:

2021-09-15 20:12:16 [scrapy.core.engine] INFO: Spider opened
2021-09-15 20:12:16 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-09-15 20:12:16 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-09-15 20:12:16 [py.warnings] WARNING: C:\Users\ContainerAdministrator\miniconda3\lib\site-packages\scrapy_splash\request.py:41: ScrapyDeprecationWarning: Call to deprecated function to_native_str. Use to_unicode instead.
  url = to_native_str(url)

2021-09-15 20:12:16 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36
2021-09-15 20:12:16 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36
2021-09-15 20:12:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.example.com via http://localhost:8050/execute> (failed 1 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2021-09-15 20:12:17 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
2021-09-15 20:12:18 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.example.com via http://localhost:8050/execute> (failed 2 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2021-09-15 20:12:18 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36
2021-09-15 20:12:19 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.example.com via http://localhost:8050/execute> (failed 3 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2021-09-15 20:12:19 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.example.com via http://localhost:8050/execute>
Traceback (most recent call last):
  File "C:\Users\ContainerAdministrator\miniconda3\lib\site-packages\scrapy\core\downloader\middleware.py", line 45, in process_request
    return (yield download_func(request=request, spider=spider))
twisted.internet.error.ConnectionRefusedError: Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2021-09-15 20:12:19 [scrapy.core.engine] INFO: Closing spider (finished)
2021-09-15 20:12:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 3,
 'downloader/request_bytes': 4632,
 'downloader/request_count': 3,
 'downloader/request_method_count/POST': 3,
 'elapsed_time_seconds': 3.310168,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2021, 9, 15, 18, 12, 19, 605641),
 'log_count/DEBUG': 6,
 'log_count/ERROR': 2,
 'log_count/INFO': 10,
 'log_count/WARNING': 46,
 'retry/count': 2,
 'retry/max_reached': 1,
 'retry/reason_count/twisted.internet.error.ConnectionRefusedError': 2,
 'scheduler/dequeued': 4,
 'scheduler/dequeued/memory': 4,
 'scheduler/enqueued': 4,
 'scheduler/enqueued/memory': 4,
 'splash/execute/request_count': 1,
 'start_time': datetime.datetime(2021, 9, 15, 18, 12, 16, 295473)}
2021-09-15 20:12:19 [scrapy.core.engine] INFO: Spider closed (finished)

What am I missing?

I have already checked here:

Update 1

I added `EXPOSE 8050` to my Dockerfile, but I get the same error. I ran `netstat -a` inside the Docker container, and port 8050 does not seem to be listed:

C:\root\scrapy>netstat -a

Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:135            c60d48724046:0         LISTENING
  TCP    0.0.0.0:5985           c60d48724046:0         LISTENING
  TCP    0.0.0.0:47001          c60d48724046:0         LISTENING
  TCP    0.0.0.0:49152          c60d48724046:0         LISTENING
  TCP    0.0.0.0:49153          c60d48724046:0         LISTENING
  TCP    0.0.0.0:49154          c60d48724046:0         LISTENING
  TCP    0.0.0.0:49155          c60d48724046:0         LISTENING
  TCP    0.0.0.0:49159          c60d48724046:0         LISTENING
  TCP    [::]:135               c60d48724046:0         LISTENING
  TCP    [::]:5985              c60d48724046:0         LISTENING
  TCP    [::]:47001             c60d48724046:0         LISTENING
  TCP    [::]:49152             c60d48724046:0         LISTENING
  TCP    [::]:49153             c60d48724046:0         LISTENING
  TCP    [::]:49154             c60d48724046:0         LISTENING
  TCP    [::]:49155             c60d48724046:0         LISTENING
  TCP    [::]:49159             c60d48724046:0         LISTENING
  UDP    0.0.0.0:5353           *:*
  UDP    0.0.0.0:5355           *:*
  UDP    127.0.0.1:51352        *:*
  UDP    [::]:5353              *:*
  UDP    [::]:5355              *:*
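Checking a long `netstat -a` listing by eye is error-prone, so the check above can also be scripted. The following is only a sketch: the function name `listening_ports` is made up for illustration, and it assumes the four-column row layout shown in the output above.

```python
from typing import Set


def listening_ports(netstat_output: str) -> Set[int]:
    """Return the TCP ports that appear in the LISTENING state."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row shape: Proto, Local Address, Foreign Address, State
        if len(fields) == 4 and fields[0] == "TCP" and fields[3] == "LISTENING":
            # rpartition copes with both 0.0.0.0:135 and [::]:135
            port = fields[1].rpartition(":")[2]
            if port.isdigit():
                ports.add(int(port))
    return ports


# A few rows copied from the container output above.
sample = """\
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:135            c60d48724046:0         LISTENING
  TCP    0.0.0.0:5985           c60d48724046:0         LISTENING
  TCP    [::]:47001             c60d48724046:0         LISTENING
  UDP    0.0.0.0:5353           *:*
"""

ports = listening_ports(sample)
print("8050 is listening" if 8050 in ports else "nothing listens on 8050")
```

If 8050 is absent from the result, no process in that container has bound the port, whatever `EXPOSE` or `-p` say.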

Update 2

Commands I ran on the host OS:

`docker ps` output:

CONTAINER ID   IMAGE     COMMAND                    CREATED          STATUS          PORTS                    NAMES
bf615a00b74a   scrapy    "c:\\windows\\system32…"   52 seconds ago   Up 49 seconds   0.0.0.0:8050->8050/tcp   blissful_brahmagupta

`netstat -a` output (I changed the IP addresses and server names for anonymity):

Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:21             exampleserver:0             LISTENING
  TCP    0.0.0.0:25             exampleserver:0             LISTENING
  TCP    0.0.0.0:80             exampleserver:0             LISTENING
  TCP    0.0.0.0:110            exampleserver:0             LISTENING
  TCP    0.0.0.0:135            exampleserver:0             LISTENING
  TCP    0.0.0.0:143            exampleserver:0             LISTENING
  TCP    0.0.0.0:443            exampleserver:0             LISTENING
  TCP    0.0.0.0:445            exampleserver:0             LISTENING
  TCP    0.0.0.0:587            exampleserver:0             LISTENING
  TCP    0.0.0.0:995            exampleserver:0             LISTENING
  TCP    0.0.0.0:1433           exampleserver:0             LISTENING
  TCP    0.0.0.0:2179           exampleserver:0             LISTENING
  TCP    0.0.0.0:3306           exampleserver:0             LISTENING
  TCP    0.0.0.0:3389           exampleserver:0             LISTENING
  TCP    0.0.0.0:5985           exampleserver:0             LISTENING
  TCP    0.0.0.0:8983           exampleserver:0             LISTENING
  TCP    0.0.0.0:33060          exampleserver:0             LISTENING
  TCP    0.0.0.0:47001          exampleserver:0             LISTENING
  TCP    0.0.0.0:49231          exampleserver:0             LISTENING
  TCP    0.0.0.0:49664          exampleserver:0             LISTENING
  TCP    0.0.0.0:49665          exampleserver:0             LISTENING
  TCP    0.0.0.0:49666          exampleserver:0             LISTENING
  TCP    0.0.0.0:49667          exampleserver:0             LISTENING
  TCP    0.0.0.0:49668          exampleserver:0             LISTENING
  TCP    0.0.0.0:49673          exampleserver:0             LISTENING
  TCP    0.0.0.0:49881          exampleserver:0             LISTENING
  TCP    12.12.12.12:21        103.144.31.100:ftp     SYN_RECEIVED
  TCP    12.12.12.12:25        ip245:1256             TIME_WAIT
  TCP    12.12.12.12:25        ip245:12756            TIME_WAIT
  TCP    12.12.12.12:25        ip245:25324            TIME_WAIT
  TCP    12.12.12.12:25        ip245:30624            TIME_WAIT
  TCP    12.12.12.12:25        ip245:48206            TIME_WAIT
  TCP    12.12.12.12:25        ip245:59510            TIME_WAIT
  TCP    12.12.12.12:80        ec2-52-31-126-154:1440  ESTABLISHED
  TCP    12.12.12.12:80        ec2-52-31-157-215:31240  ESTABLISHED
  TCP    12.12.12.12:80        ec2-52-31-205-57:65197  ESTABLISHED
  TCP    12.12.12.12:80        ninja-crawler92:36060  ESTABLISHED
  TCP    12.12.12.12:80        13:62786               TIME_WAIT
  TCP    12.12.12.12:80        16:22362               TIME_WAIT
  TCP    12.12.12.12:80        19:4130                TIME_WAIT
  TCP    12.12.12.12:80        22:30072               TIME_WAIT
  TCP    12.12.12.12:80        22:51362               TIME_WAIT
  TCP    12.12.12.12:80        34:9586                TIME_WAIT
  TCP    12.12.12.12:80        35:40210               TIME_WAIT
  TCP    12.12.12.12:80        35:65164               TIME_WAIT
  TCP    12.12.12.12:80        38:17882               TIME_WAIT
  TCP    12.12.12.12:80        39:17918               TIME_WAIT
  TCP    12.12.12.12:80        40:51642               TIME_WAIT
  TCP    12.12.12.12:80        40:57586               TIME_WAIT
  TCP    12.12.12.12:80        45:45800               TIME_WAIT
  TCP    12.12.12.12:139       exampleserver:0             LISTENING
  TCP    12.12.12.12:443       static:3610            TIME_WAIT
  TCP    12.12.12.12:443       static:5823            TIME_WAIT
  TCP    12.12.12.12:443       static:38855           TIME_WAIT
  TCP    12.12.12.12:443       static:53579           TIME_WAIT
  TCP    12.12.12.12:443       static:54816           TIME_WAIT
  TCP    12.12.12.12:443       static:26725           TIME_WAIT
  TCP    12.12.12.12:443       static:14749           TIME_WAIT
  TCP    12.12.12.12:443       static:8533            TIME_WAIT
  TCP    12.12.12.12:443       static:9136            TIME_WAIT
  TCP    12.12.12.12:443       static:35494           TIME_WAIT
  TCP    12.12.12.12:443       193:48688              TIME_WAIT
  TCP    12.12.12.12:443       static:3161            TIME_WAIT
  TCP    12.12.12.12:443       static:31667           TIME_WAIT
  TCP    12.12.12.12:443       ec2-52-31-126-154:25042  ESTABLISHED
  TCP    12.12.12.12:443       ec2-52-31-157-215:61630  ESTABLISHED
  TCP    12.12.12.12:443       ec2-52-31-205-57:20864  ESTABLISHED
  TCP    12.12.12.12:443       crawl-66-249-76-28:46983  ESTABLISHED
  TCP    12.12.12.12:443       crawl-66-249-76-30:47250  ESTABLISHED
  TCP    12.12.12.12:443       crawl-66-249-76-102:45115  ESTABLISHED
  TCP    12.12.12.12:443       crawl-66-249-76-104:62362  ESTABLISHED
  TCP    12.12.12.12:443       crawl-66-249-76-106:52575  ESTABLISHED
  TCP    12.12.12.12:443       crawl-66-249-76-192:51273  ESTABLISHED
  TCP    12.12.12.12:443       google-proxy-66-249-81-16:37717  ESTABLISHED
  TCP    12.12.12.12:443       rate-limited-proxy-66-249-89-97:42078  ESTABLISHED
  TCP    12.12.12.12:443       77-162-6-126:60721     ESTABLISHED
  TCP    12.12.12.12:443       77-162-6-126:60728     ESTABLISHED
  TCP    12.12.12.12:443       81-207-120-215:53600   ESTABLISHED
  TCP    12.12.12.12:443       ip-83-134-52-36:51127  ESTABLISHED
  TCP    12.12.12.12:443       host-83-232-56-99:2747  ESTABLISHED
  TCP    12.12.12.12:443       84-29-102-40:57144     ESTABLISHED
  TCP    12.12.12.12:443       84-104-10-105:57252    ESTABLISHED
  TCP    12.12.12.12:443       exampleserver:54209         ESTABLISHED
  TCP    12.12.12.12:443       static:37222           TIME_WAIT
  TCP    12.12.12.12:443       static:net-device      TIME_WAIT
  TCP    12.12.12.12:443       static:7874            TIME_WAIT
  TCP    12.12.12.12:443       static:33373           TIME_WAIT
  TCP    12.12.12.12:443       static:60446           TIME_WAIT
  TCP    12.12.12.12:443       92-111-50-210:54795    ESTABLISHED
  TCP    12.12.12.12:443       static:2841            TIME_WAIT
  TCP    12.12.12.12:443       ip-95-223-56-232:51129  ESTABLISHED
  TCP    12.12.12.12:443       petalbot-114-119-135-120:32530  TIME_WAIT
  TCP    12.12.12.12:443       petalbot-114-119-148-37:39746  TIME_WAIT
  TCP    12.12.12.12:443       petalbot-114-119-148-47:39066  TIME_WAIT
  TCP    12.12.12.12:443       petalbot-114-119-148-60:51178  SYN_RECEIVED
  TCP    12.12.12.12:443       petalbot-114-119-148-160:11516  TIME_WAIT
  TCP    12.12.12.12:443       petalbot-114-119-148-169:52484  TIME_WAIT
  TCP    12.12.12.12:443       petalbot-114-119-148-191:41470  TIME_WAIT
  TCP    12.12.12.12:443       petalbot-114-119-149-1:64570  TIME_WAIT
  TCP    12.12.12.12:443       petalbot-114-119-149-168:1456  TIME_WAIT
  TCP    12.12.12.12:443       petalbot-114-119-149-169:61436  TIME_WAIT
  TCP    12.12.12.12:443       static:47402           TIME_WAIT
  TCP    12.12.12.12:443       static:7710            TIME_WAIT
  TCP    12.12.12.12:443       static:15334           TIME_WAIT
  TCP    12.12.12.12:443       static:50492           TIME_WAIT
  TCP    12.12.12.12:443       static:3896            TIME_WAIT
  TCP    12.12.12.12:443       static:32136           TIME_WAIT
  TCP    12.12.12.12:443       ninja-crawler97:19950  ESTABLISHED
  TCP    12.12.12.12:443       static:9737            TIME_WAIT
  TCP    12.12.12.12:443       1:14850                TIME_WAIT
  TCP    12.12.12.12:443       2:9212                 TIME_WAIT
  TCP    12.12.12.12:443       2:38644                TIME_WAIT
  TCP    12.12.12.12:443       2:40354                TIME_WAIT
  TCP    12.12.12.12:443       2:61144                TIME_WAIT
  TCP    12.12.12.12:443       9:4920                 TIME_WAIT
  TCP    12.12.12.12:443       9:10744                TIME_WAIT
  TCP    12.12.12.12:443       9:41246                TIME_WAIT
  TCP    12.12.12.12:443       10:55160               TIME_WAIT
  TCP    12.12.12.12:443       12:28250               TIME_WAIT
  TCP    12.12.12.12:443       12:48182               TIME_WAIT
  TCP    12.12.12.12:443       13:6848                TIME_WAIT
  TCP    12.12.12.12:443       13:41174               TIME_WAIT
  TCP    12.12.12.12:443       14:11724               TIME_WAIT
  TCP    12.12.12.12:443       14:23780               TIME_WAIT
  TCP    12.12.12.12:443       14:35272               TIME_WAIT
  TCP    12.12.12.12:443       14:42876               TIME_WAIT
  TCP    12.12.12.12:443       15:50642               TIME_WAIT
  TCP    12.12.12.12:443       16:11382               TIME_WAIT
  TCP    12.12.12.12:443       16:43780               TIME_WAIT
  TCP    12.12.12.12:443       17:18676               TIME_WAIT
  TCP    12.12.12.12:443       18:40086               TIME_WAIT
  TCP    12.12.12.12:443       20:14698               TIME_WAIT
  TCP    12.12.12.12:443       21:8742                TIME_WAIT
  TCP    12.12.12.12:443       21:9222                TIME_WAIT
  TCP    12.12.12.12:443       21:10050               TIME_WAIT
  TCP    12.12.12.12:443       21:22212               TIME_WAIT
  TCP    12.12.12.12:443       23:20186               TIME_WAIT
  TCP    12.12.12.12:443       24:9702                TIME_WAIT
  TCP    12.12.12.12:443       24:29658               TIME_WAIT
  TCP    12.12.12.12:443       24:54316               TIME_WAIT
  TCP    12.12.12.12:443       24:54740               TIME_WAIT
  TCP    12.12.12.12:443       26:63912               TIME_WAIT
  TCP    12.12.12.12:443       34:38802               TIME_WAIT
  TCP    12.12.12.12:443       34:48344               TIME_WAIT
  TCP    12.12.12.12:443       35:19314               TIME_WAIT
  TCP    12.12.12.12:443       35:56518               TIME_WAIT
  TCP    12.12.12.12:443       36:26848               TIME_WAIT
  TCP    12.12.12.12:443       36:29840               TIME_WAIT
  TCP    12.12.12.12:443       37:22090               TIME_WAIT
  TCP    12.12.12.12:443       37:41662               TIME_WAIT
  TCP    12.12.12.12:443       37:62462               TIME_WAIT
  TCP    12.12.12.12:443       37:65246               TIME_WAIT
  TCP    12.12.12.12:443       38:3746                TIME_WAIT
  TCP    12.12.12.12:443       38:13518               TIME_WAIT
  TCP    12.12.12.12:443       38:19626               TIME_WAIT
  TCP    12.12.12.12:443       38:46588               TIME_WAIT
  TCP    12.12.12.12:443       38:55504               TIME_WAIT
  TCP    12.12.12.12:443       39:13096               TIME_WAIT
  TCP    12.12.12.12:443       40:14808               TIME_WAIT
  TCP    12.12.12.12:443       40:18046               TIME_WAIT
  TCP    12.12.12.12:443       40:19968               TIME_WAIT
  TCP    12.12.12.12:443       40:37858               TIME_WAIT
  TCP    12.12.12.12:443       40:47914               TIME_WAIT
  TCP    12.12.12.12:443       40:54890               TIME_WAIT
  TCP    12.12.12.12:443       40:58958               TIME_WAIT
  TCP    12.12.12.12:443       40:61998               TIME_WAIT
  TCP    12.12.12.12:443       41:5752                TIME_WAIT
  TCP    12.12.12.12:443       41:6420                ESTABLISHED
  TCP    12.12.12.12:443       41:6424                TIME_WAIT
  TCP    12.12.12.12:443       41:8224                TIME_WAIT
  TCP    12.12.12.12:443       41:23838               TIME_WAIT
  TCP    12.12.12.12:443       41:56540               TIME_WAIT
  TCP    12.12.12.12:443       42:44002               TIME_WAIT
  TCP    12.12.12.12:443       42:48300               TIME_WAIT
  TCP    12.12.12.12:443       45:16840               TIME_WAIT
  TCP    12.12.12.12:443       45:44966               TIME_WAIT
  TCP    12.12.12.12:443       45:45542               T

Thi*_*elo 0

You first need to start a Splash instance and have it listen on port 8050. For example:

docker run -dit -p 8050:8050 --name my_splash scrapinghub/splash

Then, set the Splash URL to point to the running container:

settings.py:

SPLASH_URL = 'http://my_splash:8050/'
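If the same settings.py has to work both locally (where Splash is at `localhost`) and in the container setup (where it is at `my_splash`), one option is to read the address from the environment. This is only a sketch: the environment variable name `SPLASH_URL` and the helper `splash_url` are assumptions made for illustration, not something scrapy-splash reads on its own.

```python
import os


def splash_url(default: str = "http://localhost:8050/") -> str:
    """Return the Splash endpoint, preferring an environment override.

    The variable name SPLASH_URL is a hypothetical convention for this sketch.
    """
    return os.environ.get("SPLASH_URL", default)


# Scrapy picks up this module-level setting as usual.
SPLASH_URL = splash_url()
```

The container could then be started with `-e SPLASH_URL=http://my_splash:8050/` added to the `docker run` command, while a local run without the variable keeps the `localhost` default.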

Finally, start the Scrapy container, linking it to the Splash container:

docker run -it --link my_splash --rm scrapy

This way, Scrapy's requests can reach Splash.

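  • @RajVerma It is clearly **not** the firewall: *nothing is listening on port 8050* (`netstat` says so). You *may* also need to open the firewall to allow access to port 8050, but if the port does not exist because nothing is listening on it, allowing access to it will never achieve anything.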
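The point about the missing listener can be checked from Python before starting the crawl. The sketch below is not part of Scrapy or scrapy-splash; the helper name `wait_for_port` and the timeout values are assumptions chosen for illustration. It simply retries a TCP connection until something accepts it.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Covers a refused connection (error 10061 on Windows),
            # connect timeouts, and name-resolution failures.
            time.sleep(0.5)
    return False


# Example call, using the host name and port from the answer above:
#   wait_for_port("my_splash", 8050)
```

A result of `False` means nothing accepted a connection on that port within the timeout, which is the situation the refused-connection error describes; a firewall rule alone would not change it.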