Posts by Sid*_*rth

Scrapy is not crawling all pages

I am trying to crawl a site in a very basic way, but Scrapy is not crawling all the links. The scenario is as follows:

main_page.html -> contains links to a_page.html, b_page.html, c_page.html
a_page.html -> contains links to a1_page.html, a2_page.html
b_page.html -> contains links to b1_page.html, b2_page.html
c_page.html -> contains links to c1_page.html, c2_page.html
a1_page.html -> contains a link to b_page.html
a2_page.html -> contains a link to c_page.html
b1_page.html -> contains a link to a_page.html
b2_page.html -> contains a link to c_page.html
c1_page.html -> contains a link to a_page.html
c2_page.html -> contains a link to main_page.html

I am using the following rule in my CrawlSpider:

Rule(SgmlLinkExtractor(allow=()), callback='parse_item', follow=True)

But the crawl results look like this:

DEBUG: Crawled (200) <GET http://localhost/main_page.html> (referer: None)
2011-12-05 09:56:07+0530 [test_spider] DEBUG: Crawled (200) <GET http://localhost/a_page.html> (referer: http://localhost/main_page.html)
2011-12-05 09:56:07+0530 [test_spider] DEBUG: Crawled (200) <GET http://localhost/a1_page.html> (referer: http://localhost/a_page.html)
2011-12-05 09:56:07+0530 [test_spider] DEBUG: Crawled (200) <GET http://localhost/b_page.html> (referer: http://localhost/a1_page.html)
2011-12-05 09:56:07+0530 [test_spider] DEBUG: Crawled (200) <GET http://localhost/b1_page.html> (referer: http://localhost/b_page.html)
2011-12-05 09:56:07 …
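
To check which pages should be reachable at all, the link structure from the question can be modelled as a plain graph. The sketch below is not Scrapy code. It assumes a depth-first traversal that skips pages it has already seen, and the function name crawl_depth_first is made up for illustration. A real crawl may visit pages in a different order, depending on the scheduler and on concurrency.

```python
# Link structure copied from the question; file names only, no HTTP involved.
LINKS = {
    "main_page.html": ["a_page.html", "b_page.html", "c_page.html"],
    "a_page.html": ["a1_page.html", "a2_page.html"],
    "b_page.html": ["b1_page.html", "b2_page.html"],
    "c_page.html": ["c1_page.html", "c2_page.html"],
    "a1_page.html": ["b_page.html"],
    "a2_page.html": ["c_page.html"],
    "b1_page.html": ["a_page.html"],
    "b2_page.html": ["c_page.html"],
    "c1_page.html": ["a_page.html"],
    "c2_page.html": ["main_page.html"],
}


def crawl_depth_first(start, links):
    """Visit pages depth-first, skipping any page that was already seen."""
    seen = set()
    order = []

    def visit(page):
        if page in seen:  # duplicate filter: never visit the same page twice
            return
        seen.add(page)
        order.append(page)
        for target in links.get(page, []):
            visit(target)

    visit(start)
    return order


if __name__ == "__main__":
    for page in crawl_depth_first("main_page.html", LINKS):
        print(page)
```

Under these assumptions every one of the ten pages is reached exactly once, and the first five visits match the order in the log excerpt above.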

python scrapy

Sid*_*rth

2016 09-20

4
Votes

1
Answers

6909
Views

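Setting up a Python virtualenv with a specific Python version

I am trying to get started with Google App Engine. My current virtual environment has Python 2.6 installed, and I would like to keep using it. However, Google App Engine supports Python 2.5, so I want to set up a second virtual environment with Python 2.5.

Can you help me with exactly how to do that?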
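
One possible approach, sketched here rather than taken from the post, is to point virtualenv at the desired interpreter with its --python option (short form -p). The interpreter path /usr/bin/python2.5, the environment name gae-env, and both helper function names are assumptions for illustration only.

```python
import subprocess


def build_virtualenv_command(python_path, env_dir):
    """Return the virtualenv command line that selects a specific interpreter."""
    # --python (short form -p) tells virtualenv which interpreter to use.
    return ["virtualenv", "--python", python_path, env_dir]


def create_virtualenv(python_path, env_dir):
    """Run virtualenv; raises CalledProcessError if the command fails."""
    subprocess.check_call(build_virtualenv_command(python_path, env_dir))


if __name__ == "__main__":
    # Hypothetical paths: adjust both to the local installation.
    print(" ".join(build_virtualenv_command("/usr/bin/python2.5", "gae-env")))
```

The printed command can also be typed directly in a shell. The existing Python 2.6 environment is left untouched because the new one lives in its own directory.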

python virtualenv

Sid*_*rth

2012 03-12

3
Votes

1
Answers

3587
Views

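Using CGAffineTransformMakeScale correctly

I have a UIButton laid out using a storyboard. The button contains only an image. When the button is tapped, I want to animate its size: shrink it, then bring it back to its original size.

I used the following code: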

[UIView animateWithDuration:2.0 animations:^{
    _favButton.transform = CGAffineTransformMakeScale(0.5, 0.5);
} completion:^(BOOL finished) {
    [UIView animateWithDuration:2.0 animations:^{
        _favButton.transform = CGAffineTransformMakeScale(1, 1);
    }];
}];


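This code moves the button on screen, which I do not want. I want the button's center to stay fixed while it resizes.

I have not used any Top Constraint for the button in the storyboard. How can I correct this behavior?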
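
As a rough illustration of the geometry only (plain Python, not UIKit code): a scale applied about a rectangle's own center changes its size and origin but leaves the center where it was. UIKit applies a view's transform about the layer's anchor point, which defaults to the center. The helper names and the rectangle values below are hypothetical.

```python
def scale_about(point, anchor, factor):
    """Scale a 2D point by `factor` while keeping `anchor` fixed."""
    px, py = point
    ax, ay = anchor
    return (ax + factor * (px - ax), ay + factor * (py - ay))


def scaled_frame(origin, size, factor):
    """Return (origin, size) of a rectangle scaled about its own center."""
    (x, y), (w, h) = origin, size
    center = (x + w / 2.0, y + h / 2.0)
    new_origin = scale_about((x, y), center, factor)
    return new_origin, (w * factor, h * factor)


if __name__ == "__main__":
    # Hypothetical 40x40 button whose top-left corner is at (100, 200).
    print(scaled_frame((100.0, 200.0), (40.0, 40.0), 0.5))
```

With these numbers the center is (120, 220) both before and after scaling. If the button still drifts on screen, the scale itself is probably not what moves it.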

core-animation objective-c cgaffinetransform uiviewanimation ios

Sid*_*rth

2016 04-21

3
Votes

1
Answers

9924
Views

Alembic does not create a composite primary key

I have a SQLAlchemy model like this:

class UserFavPlace(db.Model):
    # This model stores the feedback from the user whether he has
    # faved a place or not
    __tablename__ = u'user_fav_places'

    id = db.Column(db.Integer, primary_key = True)
    public_place_id = db.Column(db.Integer, db.ForeignKey(u'public_places.id'))
    user_id = db.Column(db.Integer, db.ForeignKey(u'users.user_id'))
    fav = db.Column(db.Boolean)
    updated_time = db.Column(db.DateTime)

    place = relationship(u'PublicPlace', backref = u'user_fav_places')
    user = relationship(u'User', backref = u'user_fav_places')


I then changed this model to the following:

class UserFavPlace(db.Model):
    # This model stores the feedback from the user whether he has
    # faved a place or not
    __tablename__ = u'user_fav_places' …


sqlalchemy alembic

Sid*_*rth

lucky-day

3
Votes

1
Answers

2348
Views

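Crawl order in Scrapy

I have written a basic CrawlSpider in Scrapy, but I want to understand the order in which URLs are crawled: FIFO or LIFO?

I want the crawler to crawl all the links on the start URL page first and then move on to other URLs, which does not seem to be the current order.

How can I do this?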
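
The difference between the two orders can be seen with a small plain-Python model of a crawl frontier. This is a sketch, not Scrapy's scheduler. The SITE graph and the crawl_order function are hypothetical and exist only to show that a FIFO frontier yields breadth-first order while a LIFO frontier yields depth-first order.

```python
from collections import deque

# Hypothetical site: the start page links to a and b, each with one child page.
SITE = {
    "start": ["a", "b"],
    "a": ["a1"],
    "b": ["b1"],
}


def crawl_order(start, links, fifo=True):
    """Return the visit order for a FIFO or LIFO frontier."""
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier:
        # FIFO takes the oldest pending page, LIFO the most recently added one.
        page = frontier.popleft() if fifo else frontier.pop()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return order


if __name__ == "__main__":
    print("FIFO:", crawl_order("start", SITE, fifo=True))
    print("LIFO:", crawl_order("start", SITE, fifo=False))
```

With the FIFO frontier, every link found on the start page is visited before any deeper page, which is the behavior the question asks for. In Scrapy itself, the documentation describes switching to breadth-first order through the DEPTH_PRIORITY setting together with FIFO scheduler queue classes. The exact values should be checked against the documentation for the installed Scrapy version.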

python scrapy

Sid*_*rth

lucky-day

2
Votes

1
Answers

2348
Views

Pull in mongoengine

I have a ListField(DictField) containing items like:

{'user_id': '12345', 'timestamp' : 'datetime-object'}


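In mongoengine, how do I remove an element from the list by querying on user_id? For example, I want to remove the entry with a particular user_id. I tried the following: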

update_one(pull__notes__user_id = '12345')


Here notes is the name of the collection.

This statement returns 1 but does not remove the element from the list. How can I do this?

mongodb mongoengine

Sid*_*rth

2012 01-31

2
Votes

1
Answers

2469
Views

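How to make app global in Express using CoffeeScript

I am trying to build an Express project in CoffeeScript. I want to make the app variable global so that I can use it anywhere to read configuration settings from it.

So far, I have tried this:

In my app.coffee file: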

app = express()
app.configure ->
    app.set 'host', 'localhost'
http.createServer(app).listen 8888, ->
    console.log 'Server started'
exports.app = app


I want to access the host variable set above in one of my route files. So in my route handler I tried:

exports.app.get('host') # I get this undefined


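How can I make this work? I have to require(app) in my route file, while app.coffee in turn requires the module where the routes live, obviously in order to route, i.e.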

app.get '/', 'route_handler'


node.js coffeescript express

Sid*_*rth

lucky-day

0
Votes

1
Answers

723
Views

Tag statistics

python ×3

scrapy ×2

alembic ×1

cgaffinetransform ×1

coffeescript ×1

core-animation ×1

express ×1

ios ×1

mongodb ×1

mongoengine ×1

node.js ×1

objective-c ×1

sqlalchemy ×1

uiviewanimation ×1

virtualenv ×1
