Flo*_*Flo · 6 · python, scrapy, docker, scrapy-splash, windows-server-2019
My steps:
docker build . -t scrapy
docker run -it -p 8050:8050 --rm scrapy
scrapy crawl foobar -o allobjects.json
This works locally, but on my production server I get this error:
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.example.com via http://localhost:8050/execute> (failed 1 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it.
Note: I am not using Docker Desktop, and I cannot use it on this server.
Dockerfile
\nFROM mcr.microsoft.com/windows/servercore:ltsc2019\n\nSHELL ["powershell", "-Command", "$ErrorActionPreference = \'Stop\'; $ProgressPreference = \'SilentlyContinue\';"]\n\nRUN setx /M PATH $(\'C:\\Users\\ContainerAdministrator\\miniconda3\\Library\\bin;C:\\Users\\ContainerAdministrator\\miniconda3\\Scripts;C:\\Users\\ContainerAdministrator\\miniconda3;\' + $Env:PATH)\nRUN Invoke-WebRequest "https://repo.anaconda.com/miniconda/Miniconda3-py38_4.10.3-Windows-x86_64.exe" -OutFile miniconda3.exe -UseBasicParsing; \\\n Start-Process -FilePath \'miniconda3.exe\' -Wait -ArgumentList \'/S\', \'/D=C:\\Users\\ContainerAdministrator\\miniconda3\'; \\\n Remove-Item .\\miniconda3.exe; \\\n conda install -y -c conda-forge scrapy;\n\nRUN pip install scrapy-splash\nRUN pip install scrapy-user-agents\n \n#creates root directory if not exists, then enters it\nWORKDIR /root/scrapy\n\nCOPY scrapy /root/scrapy\n
settings.py
SPLASH_URL = 'http://localhost:8050/'
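For completeness, SPLASH_URL is only part of the scrapy-splash setup; a typical settings.py also registers the scrapy-splash middlewares as documented in the scrapy-splash README (a sketch is shown below; an existing project may already contain these):

# settings.py - typical scrapy-splash wiring per the scrapy-splash README
SPLASH_URL = 'http://localhost:8050/'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Make duplicate filtering and HTTP caching aware of Splash arguments
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'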
Output of the command scrapy crawl foobar -o allobjects.json:
2021-09-15 20:12:16 [scrapy.core.engine] INFO: Spider opened
2021-09-15 20:12:16 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-09-15 20:12:16 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-09-15 20:12:16 [py.warnings] WARNING: C:\Users\ContainerAdministrator\miniconda3\lib\site-packages\scrapy_splash\request.py:41: ScrapyDeprecationWarning: Call to deprecated function to_native_str. Use to_unicode instead.
  url = to_native_str(url)

2021-09-15 20:12:16 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36
2021-09-15 20:12:16 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36
2021-09-15 20:12:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.example.com via http://localhost:8050/execute> (failed 1 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2021-09-15 20:12:17 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
2021-09-15 20:12:18 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.example.com via http://localhost:8050/execute> (failed 2 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2021-09-15 20:12:18 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36
2021-09-15 20:12:19 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.example.com via http://localhost:8050/execute> (failed 3 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2021-09-15 20:12:19 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.example.com via http://localhost:8050/execute>
Traceback (most recent call last):
  File "C:\Users\ContainerAdministrator\miniconda3\lib\site-packages\scrapy\core\downloader\middleware.py", line 45, in process_request
    return (yield download_func(request=request, spider=spider))
twisted.internet.error.ConnectionRefusedError: Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2021-09-15 20:12:19 [scrapy.core.engine] INFO: Closing spider (finished)
2021-09-15 20:12:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 3,
 'downloader/request_bytes': 4632,
 'downloader/request_count': 3,
 'downloader/request_method_count/POST': 3,
 'elapsed_time_seconds': 3.310168,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2021, 9, 15, 18, 12, 19, 605641),
 'log_count/DEBUG': 6,
 'log_count/ERROR': 2,
 'log_count/INFO': 10,
 'log_count/WARNING': 46,
 'retry/count': 2,
 'retry/max_reached': 1,
 'retry/reason_count/twisted.internet.error.ConnectionRefusedError': 2,
 'scheduler/dequeued': 4,
 'scheduler/dequeued/memory': 4,
 'scheduler/enqueued': 4,
 'scheduler/enqueued/memory': 4,
 'splash/execute/request_count': 1,
 'start_time': datetime.datetime(2021, 9, 15, 18, 12, 16, 295473)}
2021-09-15 20:12:19 [scrapy.core.engine] INFO: Spider closed (finished)
What am I missing?
I have already checked here:
Update 1
I added EXPOSE 8050 to my Dockerfile, but I get the same error. I also ran netstat -a inside the Docker container, but 8050 does not seem to be listed there?
C:\root\scrapy>netstat -a
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:135            c60d48724046:0         LISTENING
  TCP    0.0.0.0:5985           c60d48724046:0         LISTENING
  TCP    0.0.0.0:47001          c60d48724046:0         LISTENING
  TCP    0.0.0.0:49152          c60d48724046:0         LISTENING
  TCP    0.0.0.0:49153          c60d48724046:0         LISTENING
  TCP    0.0.0.0:49154          c60d48724046:0         LISTENING
  TCP    0.0.0.0:49155          c60d48724046:0         LISTENING
  TCP    0.0.0.0:49159          c60d48724046:0         LISTENING
  TCP    [::]:135               c60d48724046:0         LISTENING
  TCP    [::]:5985              c60d48724046:0         LISTENING
  TCP    [::]:47001             c60d48724046:0         LISTENING
  TCP    [::]:49152             c60d48724046:0         LISTENING
  TCP    [::]:49153             c60d48724046:0         LISTENING
  TCP    [::]:49154             c60d48724046:0         LISTENING
  TCP    [::]:49155             c60d48724046:0         LISTENING
  TCP    [::]:49159             c60d48724046:0         LISTENING
  UDP    0.0.0.0:5353           *:*
  UDP    0.0.0.0:5355           *:*
  UDP    127.0.0.1:51352        *:*
  UDP    [::]:5353              *:*
  UDP    [::]:5355              *:*
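(For reference, Splash exposes a /_ping health endpoint, so a quick check like the one below from inside the container, using the same PowerShell cmdlet the Dockerfile already relies on, would only return "ok" if a Splash instance were actually reachable on port 8050. Here it is expected to fail, consistent with the netstat output above:)

powershell -Command "Invoke-WebRequest http://localhost:8050/_ping -UseBasicParsing"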
Update 2
Commands I ran on the host OS:
docker ps
Output:
CONTAINER ID   IMAGE    COMMAND                    CREATED          STATUS          PORTS                    NAMES
bf615a00b74a   scrapy   "c:\\windows\\system32…"   52 seconds ago   Up 49 seconds   0.0.0.0:8050->8050/tcp   blissful_brahmagupta
netstat -a
Output (IPs/server names changed for anonymity):
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:21             exampleserver:0        LISTENING
  TCP    0.0.0.0:25             exampleserver:0        LISTENING
  TCP    0.0.0.0:80             exampleserver:0        LISTENING
  TCP    0.0.0.0:110            exampleserver:0        LISTENING
  TCP    0.0.0.0:135            exampleserver:0        LISTENING
  TCP    0.0.0.0:143            exampleserver:0        LISTENING
  TCP    0.0.0.0:443            exampleserver:0        LISTENING
  TCP    0.0.0.0:445            exampleserver:0        LISTENING
  TCP    0.0.0.0:587            exampleserver:0        LISTENING
  TCP    0.0.0.0:995            exampleserver:0        LISTENING
  TCP    0.0.0.0:1433           exampleserver:0        LISTENING
  TCP    0.0.0.0:2179           exampleserver:0        LISTENING
  TCP    0.0.0.0:3306           exampleserver:0        LISTENING
  TCP    0.0.0.0:3389           exampleserver:0        LISTENING
  TCP    0.0.0.0:5985           exampleserver:0        LISTENING
  TCP    0.0.0.0:8983           exampleserver:0        LISTENING
  TCP    0.0.0.0:33060          exampleserver:0        LISTENING
  TCP    0.0.0.0:47001          exampleserver:0        LISTENING
  TCP    0.0.0.0:49231          exampleserver:0        LISTENING
  TCP    0.0.0.0:49664          exampleserver:0        LISTENING
  TCP    0.0.0.0:49665          exampleserver:0        LISTENING
  TCP    0.0.0.0:49666          exampleserver:0        LISTENING
  TCP    0.0.0.0:49667          exampleserver:0        LISTENING
  TCP    0.0.0.0:49668          exampleserver:0        LISTENING
  TCP    0.0.0.0:49673          exampleserver:0        LISTENING
  TCP    0.0.0.0:49881          exampleserver:0        LISTENING
  TCP    12.12.12.12:139        exampleserver:0        LISTENING
  [... long list of ESTABLISHED / TIME_WAIT / SYN_RECEIVED connections on ports 21, 25, 80 and 443 omitted here; the pasted output is cut off mid-line ...]
You need to start a Splash instance first and have it listen on port 8050. For example:
docker run -dit -p 8050:8050 --name my_splash scrapinghub/splash
Then set the Splash URL to point at the running container:
settings.py:
SPLASH_URL = 'http://my_splash:8050/'
Finally, start the Scrapy container and link it to the Splash container:
docker run -it --link my_splash --rm scrapy
That way Scrapy can send its requests to Splash.
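For illustration, here is a minimal sketch of a spider that routes requests through Splash via scrapy-splash. The spider name, start URL, and Lua script are placeholders; the /execute endpoint matches the one in the log above:

# foobar_spider.py - minimal scrapy-splash spider sketch (illustrative only)
import scrapy
from scrapy_splash import SplashRequest

LUA_SCRIPT = """
function main(splash, args)
    splash:go(args.url)
    splash:wait(1.0)
    return {html = splash:html()}
end
"""

class FoobarSpider(scrapy.Spider):
    name = 'foobar'
    start_urls = ['https://www.example.com']

    def start_requests(self):
        for url in self.start_urls:
            # Sends the request through Splash's /execute endpoint instead of fetching directly
            yield SplashRequest(url, self.parse, endpoint='execute',
                                args={'lua_source': LUA_SCRIPT})

    def parse(self, response):
        yield {'title': response.css('title::text').get()}

As a side note, --link is a legacy Docker feature; a user-defined network (docker network create, then --network on both containers) should give the same name-based resolution of my_splash.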