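I have a script that takes roughly 10-15 hours to complete. I want to run it on an EC2 instance and then stop the process for good after, say, 10 hours.
Approaches:
CRONTAB: If I set up a cron job for this process, how do I make sure it runs only once in its lifetime and is removed after 10 hours?
Here is my Scrapy code.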
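For the run-once-and-stop part, cron may not be needed at all, since cron is meant for recurring jobs. A minimal sketch, assuming the crawl is started by hand a single time: a small Python wrapper launches the command and kills it if it is still running when the deadline passes. `run_with_deadline` is a hypothetical helper name, not part of Scrapy.

```python
import subprocess
import sys


def run_with_deadline(command, limit_seconds):
    """Run `command` once and kill it if it exceeds `limit_seconds`.

    Returns the process exit code, or None if the deadline was reached.
    (Hypothetical helper, written for illustration only.)
    """
    try:
        completed = subprocess.run(command, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child process here.
        print("deadline reached, process killed", file=sys.stderr)
        return None
    return completed.returncode
```

Called as `run_with_deadline(["scrapy", "crawl", "flipkart_reviews"], 10 * 60 * 60)` and started under `nohup` or `screen` so that it survives the SSH session, this would stop the spider after 10 hours. The command line is an assumption based on the spider name in the code below. Scrapy's own `CLOSESPIDER_TIMEOUT` setting (in seconds) is another option and shuts the spider down more gracefully than a kill.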
import scrapy
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import Selector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from urlparse import urljoin
import pymongo
import time

class CompItem(scrapy.Item):
    text = scrapy.Field()
    name = scrapy.Field()
    date = scrapy.Field()
    url = scrapy.Field()
    rating = scrapy.Field()
    title = scrapy.Field()
    category = scrapy.Field()
    source = scrapy.Field()
    user_info = scrapy.Field()
    email = scrapy.Field()
    mobile_no = scrapy.Field()
    url_1 = scrapy.Field()
    model_name = scrapy.Field()

class criticspider(CrawlSpider):
    name = "flipkart_reviews"
    allowed_domains = ["flipkart.com"]
    urls = []
    connection = pymongo.MongoClient("mongodb://localhost")
    db …

At one point I learned that you need a web toolkit such as Selenium to automate scraping.
How can I click the Next button on the Google Play Store so that I can scrape reviews, purely for my college project?
import scrapy
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import Selector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from urlparse import urljoin
from selenium import webdriver
import time

class Product(scrapy.Item):
    title = scrapy.Field()

class FooSpider(CrawlSpider):
    name = 'foo'
    start_urls = ["https://play.google.com/store/apps/details?id=com.gaana&hl=en"]

    def __init__(self, *args, **kwargs):
        super(FooSpider, self).__init__(*args, **kwargs)
        self.download_delay = 0.25
        self.browser = webdriver.Chrome(executable_path="C:\chrm\chromedriver.exe")
        self.browser.implicitly_wait(60)

    def parse(self,response):
        self.browser.get(response.url)
        sites = response.xpath('//div[@class="single-review"]/div[@class="review-header"]')
        items = []
        for i in range(0,200):
            time.sleep(20)
            button = self.browser.find_element_by_xpath("/html/body/div[4]/div[6]/div[1]/div[2]/div[2]/div[1]/div[2]/button[1]/div[2]/div/div")
            button.click()
            self.browser.implicitly_wait(30)
            for site in sites: …
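One thing that stands out in the spider above is that `sites` is built from the Scrapy response, which was downloaded before any click, so the loop would see the same reviews every time. A minimal sketch of a click-and-re-read loop follows, assuming the reviews are taken from the browser after each click. `click_through_pages`, `selenium_next_button`, and the XPath are hypothetical names chosen for illustration and are not the real Play Store markup.

```python
import time


def click_through_pages(driver, find_next_button, read_page,
                        max_clicks=200, pause_seconds=2.0):
    """Read the current page, click "next", and repeat.

    find_next_button(driver) returns a clickable element, or None when done.
    read_page(driver) returns whatever should be collected from the live DOM.
    """
    collected = [read_page(driver)]
    for _ in range(max_clicks):
        button = find_next_button(driver)
        if button is None:
            break  # no further pages to load
        button.click()
        time.sleep(pause_seconds)  # crude wait for the new reviews to load
        # Re-read from the browser: the Scrapy response predates the click.
        collected.append(read_page(driver))
    return collected


def selenium_next_button(driver):
    """Hypothetical locator; the XPath below is an assumption."""
    # Imported lazily so the loop above has no hard Selenium dependency.
    from selenium.webdriver.common.by import By
    matches = driver.find_elements(By.XPATH, '//button[@aria-label="Next"]')
    return matches[0] if matches else None
```

With a real browser this might be called as `click_through_pages(self.browser, selenium_next_button, lambda d: d.page_source)`, and each returned HTML string could then be parsed with `scrapy.Selector(text=html)`. Note that recent Selenium releases use `find_elements(By.XPATH, ...)` in place of the older `find_element_by_xpath` helper seen in the question.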