Automate daily csv file download from website button click

use*_*ser 4 javascript csv automation phantomjs casperjs

I would like to automate the process of visiting a website, clicking a button, and saving the file. The only way to download the file on this site is to click a button. You can't navigate to the file using a URL.

I have been trying to use PhantomJS and CasperJS to automate this process, but haven't had any success.

I recently tried brandon's solution from the question "Grab the resource contents in CasperJS or PhantomJS".

Here is my code for that:

var fs = require('fs');
var cache = require('./cache');
var mimetype = require('./mimetype');
var casper = require('casper').create();

casper.start('http://www.example.com/page_with_download_button', function() {

});

casper.then(function() {
    this.click('#download_button');
});

casper.on('resource.received', function (resource) {
    "use strict";
    // Declare the loop counter: strict mode throws a ReferenceError on an undeclared `i`.
    var i;
    for (i = 0; i < resource.headers.length; i++) {
        if (resource.headers[i]["name"] == "Content-Type" && resource.headers[i]["value"] == "text/csv; charset-UTF-8;") {
            cache.includeResource(resource);
        }
    }
});

casper.on('load.finished', function(status) {
    // Use the same counter `i` for every lookup into cachedResources.
    for (var i = 0; i < cache.cachedResources.length; i++) {
        var file = cache.cachedResources[i].cacheFileNoPath;
        var ext = mimetype.ext[cache.cachedResources[i].mimetype];
        var finalFile = file.replace("."+cache.cacheExtension,"."+ext);
        fs.write('downloads/'+finalFile,cache.cachedResources[i].getContents(),'b');
    }
});

casper.run();

I think the problem could be caused by my cachePath being incorrect in cache.js

exports.cachePath = 'C:/Users/username/AppData/Local/Ofi Labs/PhantomJS';

Should I be using something in addition to the backslashes to define the path?

When I try

 casperjs --disk-cache=true export_script.js

Nothing is downloaded. After a little debugging I have found that cache.cachedResources is always empty.
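
One way to narrow down why cache.cachedResources stays empty is to log what each response actually reports as its content type. The sketch below is only a diagnostic aid: isCsvResponse is a hypothetical helper (not part of cache.js), and it assumes the response objects carry a headers array of name/value pairs, as the code above already relies on. It compares the header name case-insensitively and only checks that the value starts with text/csv, so a charset suffix does not defeat the match.

```javascript
// Hypothetical helper: returns true when a PhantomJS-style response object
// looks like a CSV download. The header name is compared case-insensitively
// and the value only has to start with "text/csv".
function isCsvResponse(resource) {
    var headers = (resource && resource.headers) || [];
    for (var i = 0; i < headers.length; i++) {
        var name = String(headers[i].name).toLowerCase();
        var value = String(headers[i].value).toLowerCase();
        if (name === 'content-type' && value.indexOf('text/csv') === 0) {
            return true;
        }
    }
    return false;
}

// Assumed wiring for the CasperJS script above; skipped where `casper` is not defined.
if (typeof casper !== 'undefined') {
    casper.on('resource.received', function (resource) {
        console.log(resource.url + ' csv=' + isCsvResponse(resource));
    });
}
```

If no line ever reports csv=true after the click, the download is probably not arriving as a text/csv response at all, and the cache path would not be the cause.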

I would also be open to solutions outside of phantomjs/casperjs.


UPDATE

I am no longer trying to accomplish this with CasperJS/PhantomJS. I am using the Chrome extension Tampermonkey, as suggested by dandavis. Tampermonkey was extremely easy to figure out. I installed Tampermonkey, navigated to the page with the download link, clicked New Script under Tampermonkey, and added my JavaScript code.

document.getElementById("download_button").click();
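
For reference, a fuller userscript might look roughly like the sketch below. It is an illustration under assumptions, not the exact script used here: the @name is made up, the @match pattern reuses the placeholder page URL from this post, and clickWhenReady is a hypothetical helper that retries for a while in case the button is rendered after the page loads.

```javascript
// ==UserScript==
// @name         Auto-click CSV download (sketch)
// @match        http://www.example.com/page-with-dl-button*
// @grant        none
// ==/UserScript==

// Hypothetical helper: clicks the element with the given id if it exists.
// Otherwise it retries every delayMs milliseconds until maxTries is used up.
// Returns true only when the click happened on this call.
function clickWhenReady(doc, id, maxTries, delayMs) {
    var button = doc.getElementById(id);
    if (button) {
        button.click();
        return true;
    }
    if (maxTries > 1) {
        setTimeout(function () {
            clickWhenReady(doc, id, maxTries - 1, delayMs);
        }, delayMs);
    }
    return false;
}

// Only runs in a browser; 'download_button' is the id used in the one-liner above.
if (typeof document !== 'undefined') {
    clickWhenReady(document, 'download_button', 20, 500);
}
```

Passing the document in as a parameter is only there so the helper can be exercised outside a browser. The retry count and delay are arbitrary starting values.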

Now every time I navigate to the page in my browser, the file is downloaded. I then created a batch script that looks like this:

set date=%DATE:~10,4%_%DATE:~4,2%_%DATE:~7,2%
chrome "http://www.example.com/page-with-dl-button"
timeout 10
move "C:\Users\user\Downloads\export.csv" "C:\path\to\dir\export_%date%.csv"

I set that batch script to run nightly using the Windows Task Scheduler.
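
The %DATE:~10,4% substrings in the batch script depend on the date format configured in Windows, so the same script can produce a different name on another machine. As a possible alternative for the rename step, here is a minimal Node.js sketch that builds the date stamp from a Date object instead. stampedName and archiveDownload are hypothetical names, and Node.js itself is an assumption that the setup above does not require.

```javascript
const fs = require('fs');
const path = require('path');

// Builds a name like "export_2024_01_05.csv" without relying on locale settings.
function stampedName(base, ext, date) {
    const pad = (n) => String(n).padStart(2, '0');
    return base + '_' + date.getFullYear() + '_' +
        pad(date.getMonth() + 1) + '_' + pad(date.getDate()) + ext;
}

// Moves srcFile into destDir under a date-stamped name and returns the new path.
// fs.renameSync can fail when source and destination are on different drives.
function archiveDownload(srcFile, destDir, date) {
    const ext = path.extname(srcFile);
    const base = path.basename(srcFile, ext);
    const target = path.join(destDir, stampedName(base, ext, date || new Date()));
    fs.renameSync(srcFile, target);
    return target;
}
```

Calling archiveDownload with the paths from the batch script would take the place of its set and move lines. Opening Chrome and waiting for the download would still happen as before.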

Success!

vic*_*chi 5

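Your button is most likely sending a POST request to the server. To track it down:

  1. Open the Network tab in Chrome Developer Tools.
  2. Navigate to the page and click the button.
  3. Note which request triggers the file download. Right-click it and choose "Copy as cURL".
  4. Run the copied cURL command.

Once you have the cURL command working, you can schedule the download with cron or Task Scheduler, depending on your operating system.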
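
If you would rather stay in JavaScript than shell out to cURL, the same replay can be sketched in Node.js. This is a rough outline under assumptions: it needs Node 18 or later for the built-in fetch, the URL and form fields in the usage comment are placeholders, and the real values (plus any cookies or extra headers the site requires) have to come from the request you copied in step 3. buildFormBody and downloadCsv are hypothetical helper names.

```javascript
const fs = require('fs');

// Encodes fields as application/x-www-form-urlencoded, like a browser form POST.
function buildFormBody(fields) {
    return new URLSearchParams(fields).toString();
}

// Replays the POST and writes the response body to outPath.
// Resolves with the number of bytes written.
async function downloadCsv(url, fields, outPath) {
    const response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
        body: buildFormBody(fields),
    });
    if (!response.ok) {
        throw new Error('Download failed with HTTP ' + response.status);
    }
    const data = Buffer.from(await response.arrayBuffer());
    fs.writeFileSync(outPath, data);
    return data.length;
}

// Placeholder usage; replace the URL and fields with those from the copied request:
// downloadCsv('http://www.example.com/export', { format: 'csv' }, 'export.csv')
//     .then((bytes) => console.log('saved ' + bytes + ' bytes'));
```

The script can then be scheduled with cron or Task Scheduler in the same way as the cURL command.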