Phe*_*ndy 9 python ip proxy json python-requests
This is the API provided by the premium proxy provider luminati.io. However, it returns bytes rather than a dictionary, so the response is converted to a dictionary in order to extract the ip and port:
Each request ends up on a new peer proxy, because the IP is rotated for every request.
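The bytes-to-dict conversion mentioned above is just `json.loads`; a minimal sketch using the fields from the sample output further down (the byte string here is simulated, not a live API response):

```python
import json

# Simulated raw bytes, as returned by opener.open(...).read()
raw = b'{"ip": "84.22.151.191", "asn": {"asnum": 57129}}'

# json.loads accepts bytes directly on Python 3.6+
proxy_dictionary = json.loads(raw)
print(proxy_dictionary["ip"])            # 84.22.151.191
print(proxy_dictionary["asn"]["asnum"])  # 57129
```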
#!/usr/bin/env python
import csv
import json
import sys
import time

import requests

print('If you get the error "ImportError: No module named \'six\'", '
      'install six:\n$ sudo pip install six')

if sys.version_info[0] == 2:
    import six
    from six.moves.urllib import request
    opener = request.build_opener(
        request.ProxyHandler(
            {'http': 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005'}))
    proxy_details = opener.open('http://lumtest.com/myip.json').read()
if sys.version_info[0] == 3:
    import urllib.request
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler(
            {'http': 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005'}))
    proxy_details = opener.open('http://lumtest.com/myip.json').read()

proxy_dictionary = json.loads(proxy_details)
print(proxy_dictionary)
I then plan to use the ip and port to connect to the site of interest with the requests module:
headers = {'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0'}

if __name__ == "__main__":
    search_keyword = input("Enter the search keyword: ")
    page_number = int(input("Enter total number of pages: "))
    for i in range(1, page_number + 1):
        time.sleep(10)
        link = 'https://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page=' + str(i) + '&q=' + str(search_keyword) + '&spm=a2o4l.home.search.go.239e6ef06RRqVD'
        proxy = proxy_dictionary["ip"] + ':' + str(proxy_dictionary["asn"]["asnum"])
        print(proxy)
        req = requests.get(link, headers=headers, proxies={"https": proxy})
But my problem is that it errors out at the requests part. When I changed proxies={"https":proxy} to proxies={"http":proxy} it went through once, but apart from that the proxy fails to connect.
Sample output:
print_dictionary = {'ip': '84.22.151.191', 'country': 'RU', 'asn': {'asnum': 57129, 'org_name': 'Optibit LLC'}, 'geo': {'city': 'Krasnoyarsk', 'region': 'KYA', 'postal_code': '660000', 'latitude': 56.0097, 'longitude': 92.7917, 'tz': 'Asia/Krasnoyarsk'}}
print(proxy) produces 84.22.151.191:57129, which is fed into the requests.get method.
The error I get:
(Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x00000282DDD592B0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',)))
I tested removing the proxies={"https":proxy} argument from requests, and the scraping worked without errors. So there is a problem with the proxy, or with the way I am accessing it.
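For reference, requests expects the proxies mapping to be keyed by the target URL's scheme, and proxy values without a scheme prefix are treated as plain http:// proxies. A sketch built from the sample output above (note that 57129 comes from the "asnum" field, an ASN number, so it is not necessarily an open proxy port):

```python
# Values taken from the sample output; may not be a live proxy.
ip = "84.22.151.191"
port = 57129  # from "asnum" - an ASN, not a verified proxy port

proxy = ip + ":" + str(port)

# requests picks the entry whose key matches the target URL's scheme;
# a value without "http://" is treated as an http proxy.
proxies = {"http": "http://" + proxy, "https": "http://" + proxy}
print(proxies["https"])  # http://84.22.151.191:57129
```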
When changing proxies={"https":proxy} to proxies={"http":proxy}, you must also make sure your link is http rather than https, so also try replacing:
link = 'https://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'
with
link = 'http://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'
Your overall code should then look like this:
headers = {'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0'}

if __name__ == "__main__":
    search_keyword = input("Enter the search keyword: ")
    page_number = int(input("Enter total number of pages: "))
    for i in range(1, page_number + 1):
        time.sleep(10)
        link = 'http://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page=' + str(i) + '&q=' + str(search_keyword) + '&spm=a2o4l.home.search.go.239e6ef06RRqVD'
        proxy = proxy_dictionary["ip"] + ':' + str(proxy_dictionary["asn"]["asnum"])
        print(proxy)
        req = requests.get(link, headers=headers, proxies={"http": proxy})
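A slightly more defensive variant (a sketch, not part of the answer above; `proxies_for` is a hypothetical helper) derives the proxies key from the link itself, so the scheme of the URL and the key of the proxies dict can never get out of sync:

```python
from urllib.parse import urlparse

def proxies_for(link, proxy):
    """Key the proxies dict by the link's scheme so http/https always match."""
    scheme = urlparse(link).scheme
    return {scheme: proxy}

print(proxies_for("http://www.experiment.com.ph/catalog/", "84.22.151.191:57129"))
# {'http': '84.22.151.191:57129'}
```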
Hope this helps!