I'm trying to write a script that checks whether several URLs are up:
import httplib
with open('urls.txt') as urls:
    for url in urls:
        connection = httplib.HTTPConnection(url)
        connection.request("GET")
        response = connection.getresponse()
        if response.status == 200:
            print '[{}]: '.format(url), "Up!"
But I get this error:
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    connection = httplib.HTTPConnection(url)
  File "/usr/lib/python2.7/httplib.py", line 693, in __init__
    self._set_hostport(host, port)
  File "/usr/lib/python2.7/httplib.py", line 721, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: '//globo.com/galeria/amazonas/a.html
What's wrong?
Atu*_*ind 19
This probably has a simple fix. The problem is this line:
connection = httplib.HTTPConnection(url)
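Because you are using HTTPConnection, you should not pass a full URL such as http://iGyan.org; pass only the host, iGyan.org.
In short, remove http:// or https:// from the URL. httplib treats everything after the last ":" as the port number, and a port number must be numeric. Any path (such as /galeria/amazonas/a.html) does not belong in the constructor either; it goes in the connection.request() call.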
Hope this helps!
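httplib was renamed http.client in Python 3, and it parses the host string the same way. A minimal Python 3 sketch (the helper name `nonnumeric_port_message` is made up for illustration) that reproduces the error offline, since the constructor only parses the host and does not open a socket:

```python
import http.client


def nonnumeric_port_message(url):
    """Return the InvalidURL message HTTPConnection raises for `url`, or None.

    HTTPConnection splits its argument on the last ":" to find a port, so a
    full URL such as "http://host/path" leaves "//host/path" as the "port".
    """
    try:
        http.client.HTTPConnection(url)  # parses host/port only, no network I/O
    except http.client.InvalidURL as e:
        return str(e)
    return None
```

Passing the full URL from the question returns "nonnumeric port: '//globo.com/galeria/amazonas/a.html'", while passing just "globo.com" returns None.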
httplib.HTTPConnection takes the host and port of the remote server in its constructor, not the whole URL.
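If you do want to stay with the low-level connection API, the URL has to be split first. A minimal Python 3 sketch, assuming plain http/https URLs without credentials or IPv6 literals (`connection_for` is a hypothetical helper, not part of any library): urllib.parse.urlsplit separates the host and port, which go to the constructor, from the path, which goes to request().

```python
import http.client
from urllib.parse import urlsplit


def connection_for(url):
    """Split a full URL into what HTTPConnection and request() each expect."""
    parts = urlsplit(url.strip())  # strip() drops the "\n" left by file iteration
    if parts.scheme == "https":
        cls = http.client.HTTPSConnection
    else:
        cls = http.client.HTTPConnection
    # port is None when the URL has none; the connection class then
    # falls back to its default (80 for HTTP, 443 for HTTPS).
    conn = cls(parts.hostname, parts.port)
    # The path and query string belong in request("GET", path).
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return conn, path
```

Usage would be roughly `conn, path = connection_for(url)`, then `conn.request("GET", path)` and `conn.getresponse().status`.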
For your use case, it is easier to use urllib2.urlopen.
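import urllib2
with open('urls.txt') as urls:
    for url in urls:
        url = url.strip()  # drop the trailing newline read from the file
        try:
            r = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            # HTTPError carries the status code (e.g. 401, 404)
            r = e
        except urllib2.URLError:
            # no HTTP response at all (DNS failure, refused connection, ...)
            print '[{}]: '.format(url), "Down!"
            continue
        if r.code in (200, 401):
            print '[{}]: '.format(url), "Up!"
        elif r.code == 404:
            print '[{}]: '.format(url), "Not Found!"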
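In Python 3, urllib2 was split into urllib.request and urllib.error. A minimal sketch of the same check, assuming the same 200/401 = "Up!" convention; `check_url`, `check_file` and the `opener` parameter are hypothetical names added so the logic can be tested without network access. HTTPError must be caught before URLError because it is a subclass, and only HTTPError has a status code.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def check_url(url, opener=urlopen, timeout=10):
    """Return a short status string for one URL.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    the function can be exercised without network access.
    """
    url = url.strip()  # lines read from a file keep their trailing "\n"
    try:
        with opener(url, timeout=timeout) as resp:
            code = resp.status
    except HTTPError as e:
        # The server answered with a 4xx/5xx status; HTTPError carries it.
        code = e.code
    except URLError as e:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return "Down! ({})".format(e.reason)
    if code in (200, 401):
        return "Up!"
    if code == 404:
        return "Not Found!"
    return "Status {}".format(code)


def check_file(path):
    """Print one status line per non-blank URL in the file at `path`."""
    with open(path) as urls:
        for line in urls:
            if line.strip():
                print("[{}]: {}".format(line.strip(), check_url(line)))
```

Calling `check_file('urls.txt')` then prints one line per URL, as the Python 2 loop does.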
Views: 26193