For school I'm supposed to write a Python RE script that extracts IP addresses. The regular expression I'm using seems to work with re.search() but not with re.findall().
import re

# Raw string so the backslashes reach the regex engine unchanged
exp = r"(\d{1,3}\.){3}\d{1,3}"
ip = "blah blah 192.168.0.185 blah blah"
match = re.search(exp, ip)
print(match.group())
The match for this is always 192.168.0.185, but the result is different when I use re.findall():
import re

# Same expression as above, this time passed to re.findall()
exp = r"(\d{1,3}\.){3}\d{1,3}"
ip = "blah blah 192.168.0.185 blah blah"
matches = re.findall(exp, ip)
print(matches[0])
0.
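I'd like to know why re.findall() yields 0. when re.search() yields 192.168.0.185, since I'm using the same expression for both functions.
What can I do to make re.findall() actually follow the expression correctly? Or am I making some kind of mistake?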
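For what it's worth, the difference comes from how the two functions treat capturing groups: match.group() returns the whole match, while re.findall() returns the contents of the group whenever the pattern contains exactly one, and a repeated group only keeps its last repetition (here "0."). Below is a minimal sketch of two common ways around that, using the same sample string; the variable names are only illustrative.

```python
import re

text = "blah blah 192.168.0.185 blah blah"

# Original pattern: one capturing group, repeated three times
capturing = r"(\d{1,3}\.){3}\d{1,3}"
# Same pattern with a non-capturing group (?:...)
non_capturing = r"(?:\d{1,3}\.){3}\d{1,3}"

# findall() returns the group's last repetition, not the whole match
group_only = re.findall(capturing, text)

# With no capturing group, findall() returns the whole match
whole_matches = re.findall(non_capturing, text)

# Alternative: keep the original pattern and ask each match object for group()
via_finditer = [m.group() for m in re.finditer(capturing, text)]

print(group_only)     # ['0.']
print(whole_matches)  # ['192.168.0.185']
print(via_finditer)   # ['192.168.0.185']
```

Note that this pattern only checks the shape of the address; it would also accept something like 999.999.999.999.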
I'm also experimenting with Scrapy and trying to crawl a website on my local network. The website's IP address is 192.168.0.185. This is my spider:
from scrapy.spider import BaseSpider

class 192.168.0.185_Spider(BaseSpider):
    name = "192.168.0.185"
    allowed_domains = ["192.168.0.185"]
    start_urls = ["http://192.168.0.185/"]

    def parse(self, response):
        print "Test:", response.headers
Then, in the same directory as my spider, I run this shell command to start the spider:
scrapy crawl 192.168.0.185
I get a very ugly, unreadable error message:
2012-02-10 20:55:18-0600 [scrapy] INFO: Scrapy 0.14.0 started (bot: tutorial)
2012-02-10 20:55:18-0600 [scrapy] DEBUG: Enabled extensions: LogStats,
TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState
2012-02-10 20:55:18-0600 [scrapy] DEBUG: Enabled downloader middlewares:
HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware,
DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware,
HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2012-02-10 20:55:18-0600 [scrapy] DEBUG: Enabled spider middlewares:
HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware,
DepthMiddleware
2012-02-10 20:55:18-0600 …
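The log above is cut off, so the actual failure can't be confirmed from it. One thing that can be checked independently of Scrapy, though, is the class statement: a Python class name cannot start with a digit or contain dots, so the spider as written would not compile. The sketch below reproduces just that line in plain Python (subclassing object instead of BaseSpider so it runs without Scrapy) and shows a renamed class; LocalSiteSpider is a hypothetical name, and the name attribute is kept as the string from the question.

```python
# The class statement from the spider, reduced to plain Python (no Scrapy needed)
bad_source = "class 192.168.0.185_Spider(object):\n    pass\n"

try:
    compile(bad_source, "<spider>", "exec")
    bad_name_compiles = True
except SyntaxError:
    # Identifiers may not begin with a digit or contain "."
    bad_name_compiles = False

# Hypothetical replacement: a valid identifier for the class,
# while the spider's name stays an ordinary string
class LocalSiteSpider(object):
    name = "192.168.0.185"
    allowed_domains = ["192.168.0.185"]
    start_urls = ["http://192.168.0.185/"]

print(bad_name_compiles)
print(LocalSiteSpider.name)
```

Whether this is the only problem in the original project is an open question, since the rest of the error output is missing.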