I am new to Scrapy and I want to extract the content of every ad from this site, so I tried the following:
from scrapy.spiders import Spider
from craigslist_sample.items import CraigslistSampleItem
from scrapy.selector import Selector


class MySpider(Spider):
    name = "craig"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/search/npo"]

    def parse(self, response):
        links = response.selector.xpath(".//*[@id='sortable-results']//ul//li//p")
        for link in links:
            content = link.xpath(".//*[@id='titletextonly']").extract()
            title = link.xpath("a/@href").extract()
            print(title, content)
items.py:
# Define here the models for your scraped items
from scrapy.item import Item, Field


class CraigslistSampleItem(Item):
    title = Field()
    link = Field()
However, when I run the crawler, I get nothing:
$ scrapy crawl --nolog craig
[]
[]
[]
[]
[]
[]
[]
[] …

I have this pandas dataframe, which is actually an Excel spreadsheet:
Unnamed: 0 Date Num Company Link ID
0 NaN 1990-11-15 131231 apple... http://www.example.com/201611141492/xellia... 290834
1 NaN 1990-10-22 1231 microsoft http://www.example.com/news/arnsno... NaN
2 NaN 2011-10-20 123 apple http://www.example.com/ator... 209384
3 NaN 2013-10-27 123 apple... http://example.com/sections/th-shots/2016/... 098
4 NaN 1990-10-26 123 google http://www.example.net/business/Drugmak... 098098
5 NaN 1990-10-18 1231 google... http://example.com/news/va-rece... NaN
6 NaN 2011-04-26 546 amazon... http://www.example.com/news/home/20160425... 9809
I want to drop every row whose ID column is NaN and rebuild the imaginary index column:
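A minimal sketch of one way to do this with `dropna` and `reset_index`, assuming the spreadsheet has been loaded into a variable named `df` (the variable name and the tiny stand-in frame below are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the spreadsheet; only the ID column matters here.
df = pd.DataFrame({
    "Company": ["apple...", "microsoft", "apple", "google..."],
    "ID": ["290834", np.nan, "209384", np.nan],
})

# Drop every row whose ID is NaN, then rebuild the 0..N-1 index.
df = df.dropna(subset=["ID"]).reset_index(drop=True)
print(df)
```

`reset_index(drop=True)` discards the old row labels instead of keeping them as a new column, which gives the freshly renumbered index shown in the desired output.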
Unnamed: 0 Date Num Company Link ID
0 NaN 1990-11-15 131231 apple... http://www.example.com/201611141492/xellia... 290834 …

I have a pandas dataframe with 120 columns. The columns look like this:
0_x 1_x 2_x 3_x 4_x 5_x 6_x 7_x 8_x 0_y ... 65 ... 120
How can I rename them all in one go? I read the documentation and found that the way to rename columns in pandas is:
df.columns = ['col1', 'col2', 'col3']
The problem is that hand-writing a list of 120+ names would be unwieldy. What alternatives are there? Suppose I want to name all the columns col1 to colN.
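Rather than typing the list out, it can be generated from the current column count. A hedged sketch, assuming the frame is named `df` (the three-column stand-in below is just for illustration; the same expression works for 120 columns):

```python
import pandas as pd

# Stand-in frame with awkward auto-generated column names.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["0_x", "1_x", "2_x"])

# Build col1..colN from however many columns the frame has.
df.columns = [f"col{i}" for i in range(1, len(df.columns) + 1)]
print(list(df.columns))
```

The assignment to `df.columns` only requires that the generated list has the same length as the number of columns, so the comprehension scales to any width without any names being written by hand.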