Posts by pri*_*ari

Unable to get "href" from a Reddit embedded feed window using Scrapy

I'm trying to get the Reddit account name from the Reddit feed window on the following page:

fetch('https://coinmarketcap.com/currencies/ripple/')


With the following code I can successfully get the Twitter account details:

# Fetch the coin's Twitter account links
tweet_account = response.xpath('//a[starts-with(@href, "https://twitter.com")]/@href').extract()
# Drop CoinMarketCap's own Twitter account
tweet_account = [s for s in tweet_account if s != 'https://twitter.com/CoinMarketCap']
# Keep only short URLs (filters out longer links)
tweet_account = [s for s in tweet_account if len(s) < 60]
print(tweet_account)


However, the same approach does not get me the Reddit account:

# Fetch the coin's Reddit links
reddit_account = response.xpath('//a[starts-with(@href, "https://www.reddit.com")]/@href').extract()
# Drop CoinMarketCap's own subreddit
reddit_account = [s for s in reddit_account if s != 'https://www.reddit.com/r/CoinMarketCap']
# Keep only short URLs (filters out longer links)
reddit_account = [s for s in reddit_account if len(s) < 60]
print(reddit_account)
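The Twitter and Reddit snippets apply the same two filters and differ only in the URL prefix and the excluded CoinMarketCap account. A minimal sketch that factors the filtering into one plain-Python helper, assuming its input is the list returned by `.extract()`; `filter_account_links` and the sample URLs are hypothetical, made up for illustration:

```python
from typing import Iterable, List


def filter_account_links(hrefs: Iterable[str], exclude: str, max_len: int = 60) -> List[str]:
    """Drop the site's own account link and any URL of max_len characters or more."""
    return [h for h in hrefs if h != exclude and len(h) < max_len]


# Hypothetical sample values standing in for response.xpath(...).extract()
twitter_hrefs = [
    "https://twitter.com/Ripple",
    "https://twitter.com/CoinMarketCap",
    "https://twitter.com/Ripple/status/1234567890123456789012345678901234567890",
]
print(filter_account_links(twitter_hrefs, "https://twitter.com/CoinMarketCap"))
# prints ['https://twitter.com/Ripple']
```

In the Scrapy shell this would be called as `filter_account_links(response.xpath('//a[starts-with(@href, "https://www.reddit.com")]/@href').extract(), 'https://www.reddit.com/r/CoinMarketCap')`. The helper only filters links that are already present in the response; it cannot recover links the page never sent.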


I even tried fetching it directly with a simple XPath, but that doesn't work either:

response.xpath('//*[@id="reddit"]/div/div[1]/h4/a[2]/@href')


Running:

response.xpath('//*[@id="reddit"]').extract()

returns:

['<div id="reddit" class="col-sm-6 text-left">\n</div>']

But there are more tags inside this div, aren't there? Why can't I get them?

Unfortunately, Scrapy cannot find anything inside this div, and the Reddit feed is not even in an iframe. Should I be requesting some other URL?
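Scrapy only downloads and parses the HTML the server sends; it does not execute JavaScript. The empty `<div id="reddit">` in the response therefore suggests (an assumption the post does not confirm) that the feed is filled in client-side after the page loads, which would also explain why an XPath copied from the browser's rendered DOM matches nothing. A minimal standard-library sketch to check this directly by counting the elements nested inside a given id in the raw HTML. `ChildCounter` and `count_children` are hypothetical names, and the parser assumes reasonably well-formed markup, since it tracks nesting by pairing start and end tags:

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ChildCounter(HTMLParser):
    """Count the elements nested inside the first element with id == target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False
        self.depth = 0      # > 0 while the parser is inside the target element
        self.children = 0   # descendant elements seen inside the target

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.children += 1
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            if tag not in VOID_TAGS:
                self.depth = 1

    def handle_endtag(self, tag):
        # Void tags never opened a nesting level, so their (self-)closing is ignored.
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


def count_children(html, target_id):
    """Return None if no element has that id, else the number of nested elements."""
    parser = ChildCounter(target_id)
    parser.feed(html)
    parser.close()
    return parser.children if parser.found else None


# The markup Scrapy returned for the div (copied from the output in the question)
static_html = '<div id="reddit" class="col-sm-6 text-left">\n</div>'
print(count_children(static_html, "reddit"))  # prints 0
```

Inside the Scrapy shell this could be run as `count_children(response.text, "reddit")`. A result of 0 means the feed markup is not in that response at all, so no selector applied to it, however precise, can find the link.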

EDIT: …

python scrapy

pri*_*ari

2019-03-25

Score: 5
Answers: 1
Views: 102
