What is the purpose of the yield keyword in Python? What does it do?
For example, I'm trying to understand this code¹:
def _get_child_candidates(self, distance, min_dist, max_dist):
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild
And this is the caller:
result, candidates = [], [self]
while candidates:
    node = candidates.pop()
    distance = node._get_dist(obj)
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result
What happens when the method _get_child_candidates is called? Is a list returned? A single element? Is it called again? When do the subsequent calls stop?
1. The code comes from Jochen Schulz (jrschulz), who made a great Python library for metric spaces. This is the link to the complete source: module mspace.
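For reference, a minimal sketch of the behaviour being asked about (a toy example of my own, not code from the mspace module): calling a function that contains yield does not run its body; it returns a generator object, and the body only runs, one yield at a time, as values are requested from that generator.

def candidates():
    # Nothing here runs when candidates() is called; a generator object is returned.
    yield "left"     # execution pauses here after producing "left"
    yield "right"    # resumes here on the next request and produces "right"

gen = candidates()
print(next(gen))     # "left"  - runs the body up to the first yield
print(next(gen))     # "right" - resumes after the first yield, up to the second
# A further next(gen) raises StopIteration; a for loop (or list.extend, as in
# the caller above) simply consumes the generator until it is exhausted.
for value in candidates():
    print(value)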
So, my problem is relatively simple. I have one spider that crawls multiple sites, and I need it to return the data in the order I write it in my code. It is posted below.
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from mlbodds.items import MlboddsItem

class MLBoddsSpider(BaseSpider):
    name = "sbrforum.com"
    allowed_domains = ["sbrforum.com"]
    start_urls = [
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110328/",
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110329/",
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110330/"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@id="col_3"]//div[@id="module3_1"]//div[@id="moduleData4952"]')
        items = []
        for site in sites:
            item = MlboddsItem()
            item['header'] = site.select('//div[@class="scoreboard-bar"]//h2//span[position()>1]//text()').extract()  # | /*//table[position()<2]//tr//th[@colspan="2"]//text()').extract()
            item['game1'] = site.select('/*//table[position()=1]//tr//td[@class="tbl-odds-c2"]//text() | /*//table[position()=1]//tr//td[@class="tbl-odds-c4"]//text() | /*//table[position()=1]//tr//td[@class="tbl-odds-c6"]//text()').extract()
            items.append(item)
        return items
The results are returned in a random order, e.g. it returns the 29th, then the 28th, then the 30th. I have tried changing the scheduler order from DFO to BFO, just in case that was the problem, but that didn't change anything.
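One thing worth sketching here (my own suggestion, not a confirmed fix): Scrapy's Request takes a priority argument, and higher-priority requests are dispatched by the scheduler first, so overriding start_requests to give earlier URLs a higher priority nudges the crawl toward the written order, although concurrent downloads can still finish out of order. The class name below is hypothetical; name, allowed_domains, start_urls and parse() would be the same as in the spider above.

from scrapy.spider import BaseSpider
from scrapy.http import Request

class OrderedMLBoddsSpider(BaseSpider):
    name = "sbrforum.com"
    allowed_domains = ["sbrforum.com"]
    start_urls = [
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110328/",
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110329/",
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110330/"
    ]

    def start_requests(self):
        # Emit the start URLs ourselves, giving the first one the highest
        # priority so the scheduler dispatches them in list order.
        for i, url in enumerate(self.start_urls):
            yield Request(url, priority=len(self.start_urls) - i)

    # parse() as in the spider above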
This is how my spider is set up:
class CustomSpider(CrawlSpider):
    name = 'custombot'
    allowed_domains = ['www.domain.com']
    start_urls = ['http://www.domain.com/some-url']
    rules = (
        Rule(SgmlLinkExtractor(allow=r'.*?something/'), callback='do_stuff', follow=True),
    )

    def start_requests(self):
        return Request('http://www.domain.com/some-other-url', callback=self.do_something_else)
It goes to /some-other-url but not to /some-url. What is wrong here? The URLs listed in start_urls are the ones whose responses go through the rules filter so links can be extracted and followed, whereas the URLs returned from start_requests are sent directly to their callback (the item parser) and therefore never pass through the rules filter. Note also that once you define start_requests, Scrapy uses it instead of the default method that builds requests from start_urls, so /some-url is never requested unless you emit it yourself.
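A sketch (my own, using the old scrapy.contrib import paths that match the SgmlLinkExtractor in the snippet above; newer Scrapy versions use different module paths) of how one might keep both behaviours when overriding start_requests:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.http import Request

class CustomSpider(CrawlSpider):
    name = 'custombot'
    allowed_domains = ['www.domain.com']
    start_urls = ['http://www.domain.com/some-url']
    rules = (
        Rule(SgmlLinkExtractor(allow=r'.*?something/'), callback='do_stuff', follow=True),
    )

    def start_requests(self):
        # This request bypasses the rules and goes straight to its own callback.
        yield Request('http://www.domain.com/some-other-url',
                      callback=self.do_something_else)
        # Re-emit the start_urls with no explicit callback so their responses fall
        # through to CrawlSpider's default parse(), which is what applies the rules.
        for url in self.start_urls:
            yield Request(url)

    def do_stuff(self, response):
        pass  # placeholder for the callback referenced by the rule above

    def do_something_else(self, response):
        pass  # placeholder for the callback of the extra request

This way the extra URL still goes to do_something_else, while /some-url is crawled and its extracted links are handled by the rule as before.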