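I'm writing a Scrapy script to pull the latest blog posts from Paul Krugman's New York Times blog. The project has gone smoothly so far, but now that I've reached the stage of actually extracting data, I keep running into the same problem: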
ERROR: Spider must return Request, BaseItem, dict or None, got 'generator' in <GET https://krugman.blogs.nytimes.com/more_posts_jsons/page/1/?homepage=1&apagenum=1>
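The message says a callback handed back a generator object where a request or item was expected. One way this can happen, sketched here in plain Python without Scrapy, is yielding the result of calling another generator function instead of yielding the values it produces. The helper body and the sample data below are hypothetical stand-ins, not the original code.

```python
import types


def parse_block(block):
    # Hypothetical helper: because it uses `yield`, calling it returns a
    # generator object rather than a dict.
    yield {"title": block["title"]}


def parse_page_nested(posts):
    # Yields the generator objects themselves (one per post).
    for block in posts:
        yield parse_block(block)


def parse_page_flat(posts):
    # Delegates to the helper so its dicts are yielded one by one.
    for block in posts:
        yield from parse_block(block)


posts = [{"title": "a"}, {"title": "b"}]  # made-up sample data
nested = list(parse_page_nested(posts))
flat = list(parse_page_flat(posts))

print([type(x).__name__ for x in nested])
print(flat)
```

If the helper only ever produces a single item, having it `return` the dict instead of yielding it is another option, since the caller then yields a dict directly.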
The code I'm using is as follows:
import json  # needed for json.loads below

from scrapy import http
from scrapy.selector import Selector
from scrapy.spiders import CrawlSpider
import scrapy
from tutorial.items import BlogPost


class krugSpider(CrawlSpider):
    name = 'krugbot'
    start_url = ['https://krugman.blogs.nytimes.com']

    def __init__(self):
        # URL template for the paginated JSON endpoint
        self.url = 'https://krugman.blogs.nytimes.com/more_posts_jsons/page/{0}/?homepage=1&apagenum={0}'

    def start_requests(self):
        yield http.Request(self.url.format('1'), callback = self.parse_page)

    def parse_page(self, response):
        data = json.loads(response.body)
        for block in range(len(data['posts'])):
            # this is the yield that the error above complains about
            yield self.parse_block(data['posts'][block])
        # queue the next page of results
        page = data['args']['paged'] + 1
        url = self.url.format(str(page))
        yield http.Request(url, callback = self.parse_page)
    def …

I'm using a WITH clause, and recently I ran into a strange problem. Even with a simple query I get an "incorrect syntax" error, and I can't figure out why.
It happens whenever I run something as simple as:
WITH table1 AS (Select value1, value2 from table1)
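I get an "Incorrect syntax near ')'" error.
I haven't had trouble with this before, so I suspect I'm making a really obvious, silly mistake that I'm just not catching. Can anyone point out what I'm doing wrong?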
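The statement above ends right after the closing parenthesis, so one likely cause is that a WITH clause only defines a common table expression and has to be followed by a statement that uses it. Below is a minimal sketch of that difference. It uses SQLite through Python's `sqlite3` module as a stand-in, so the error text will differ from the one quoted above, and the names `source_table` and `cte` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, only for this illustration.
conn.execute("CREATE TABLE source_table (value1 INTEGER, value2 INTEGER)")
conn.execute("INSERT INTO source_table VALUES (1, 2)")

# A CTE definition on its own, with no statement after it.
cte_only = "WITH cte AS (SELECT value1, value2 FROM source_table)"
# The same definition followed by a SELECT that reads from the CTE.
cte_with_select = cte_only + " SELECT value1, value2 FROM cte"


def try_query(sql):
    """Run a query and return (rows, error); exactly one of them is None."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, exc


rows_bad, err_bad = try_query(cte_only)
rows_ok, err_ok = try_query(cte_with_select)

print("CTE alone:", err_bad)
print("CTE followed by SELECT:", rows_ok)
```

The sketch gives the CTE a different name from the table it reads from, which avoids any ambiguity about which `table1` the inner query refers to.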