How do I download a file with Node.js (without using third-party libraries)?

gre*_*pow 390 javascript download fs node.js express

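How do I download a file with Node.js without using third-party libraries?

I don't need anything special. I only want to download a file from a given URL and then save it to a given directory.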

Mic*_*ley 527

You can create an HTTP GET request and pipe its response into a writable file stream:

const http = require('http');
const fs = require('fs');

const file = fs.createWriteStream("file.jpg");
const request = http.get("http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg", function(response) {
  response.pipe(file);
});

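If you want to support gathering information on the command line, such as specifying a target file, directory, or URL, check out something like Commander.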

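  • http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg - +1 for the cat photo (128 upvotes)
  • Does this code close the file properly when the script ends, or would it lose data? (6 upvotes)
  • @EthanKeiley Why do you say it isn't closed properly? By default, `createWriteStream` sets `autoClose` to `true`, and `readable.pipe` calls `end()` on the writable when the readable ends. (4 upvotes)
  • When I ran this script I got the following console output: `node.js:201 throw e; // process.nextTick error, or 'error' event on first tick ^ Error: connect ECONNREFUSED at errnoException (net.js:646:11) at Object.afterConnect [as oncomplete] (net.js:637:18)` (3 upvotes)
  • It depends on the type of the request URL: if you are requesting `https` you must use `https`, otherwise it will throw an error. (3 upvotes)
  • @quantumpotato Take a look at the response you are getting back from the request (2 upvotes)
  • This code does not close the fs write properly. It isn't noticeable on very small files, but it definitely is on larger downloads. (2 upvotes)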
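
Following up on the comment about `https` URLs: a minimal sketch of picking the matching core module from the URL's protocol, so the same download code can serve both schemes. `clientFor` is a hypothetical helper name, not part of the answer above.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: return the core module that matches the URL's protocol.
function clientFor(url) {
  const { protocol } = new URL(url);
  if (protocol === 'https:') return https;
  if (protocol === 'http:') return http;
  throw new Error(`Unsupported protocol: ${protocol}`);
}

// Usage (not run here):
//   clientFor(url).get(url, (response) => response.pipe(file));
```

Both modules expose the same `get()` signature, which is why the rest of the code does not need to change.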

Vin*_*uan 490

Don't forget to handle errors! The following code is based on Augusto Roman's answer.

var http = require('http');
var fs = require('fs');

var download = function(url, dest, cb) {
  var file = fs.createWriteStream(dest);
  var request = http.get(url, function(response) {
    response.pipe(file);
    file.on('finish', function() {
      file.close(cb);  // close() is async, call cb after close completes.
    });
  }).on('error', function(err) { // Handle errors
    fs.unlink(dest, function() {}); // Delete the file async. Current Node requires a callback; the result is ignored.
    if (cb) cb(err.message);
  });
};

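  • @TheGrayFox Nobody likes error handling :) (25 upvotes)
  • Is there a way to see the download speed? Like tracking how many MB/s? Thanks! (5 upvotes)
  • @vince-yuan Is `download()` itself `pipe`able? (2 upvotes)
  • @VinceYuan The callback confuses me. If I call `download()` now, how would I do it? What would I pass as the `cb` argument? I have `download('someURI', '/some/destination', cb)` but I don't understand what to put in cb (2 upvotes)
  • @Abdul It sounds like you are very new to node.js/javascript. Take a look at this tutorial: http://www.tutorialspoint.com/nodejs/nodejs_callbacks_concept.htm It's not complicated. (2 upvotes)
  • @Abdul Maybe it would be good if you shared what you figured out with the rest of the class? (2 upvotes)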
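
On the download-speed question in the comments: one possible approach is to count the bytes in each `data` event and divide by the elapsed time. The sketch below is only that arithmetic; `createSpeedMeter` is a hypothetical name, and the clock is injectable so the numbers can be checked without real timing.

```javascript
// Hypothetical helper: accumulates byte counts and reports average throughput.
function createSpeedMeter(now = Date.now) {
  const startedAt = now();
  let bytes = 0;
  return {
    // Call with chunk.length for every chunk received.
    add(chunkLength) {
      bytes += chunkLength;
    },
    // Average speed in MiB/s since the meter was created.
    megabytesPerSecond() {
      const seconds = (now() - startedAt) / 1000;
      return seconds > 0 ? bytes / (1024 * 1024) / seconds : 0;
    },
  };
}

// Usage inside the response handler (not run here):
//   const meter = createSpeedMeter();
//   response.on('data', (chunk) => meter.add(chunk.length));
//   response.pipe(file);
```

This reports an average over the whole download; a moving window would be needed for an instantaneous figure.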

gfx*_*onk 130

As Brandon Tilley said, but with the appropriate control flow:

var http = require('http');
var fs = require('fs');

var download = function(url, dest, cb) {
  var file = fs.createWriteStream(dest);
  var request = http.get(url, function(response) {
    response.pipe(file);
    file.on('finish', function() {
      file.close(cb);
    });
  });
}

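Without waiting for the finish event, naive scripts may end up with an incomplete file.

Edit: Thanks to @Augusto Roman for pointing out that cb should be passed to file.close, not called explicitly.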

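  • @Abdul You pass a function as the callback only if you need to do something once the file has been fetched successfully. (3 upvotes)
  • The callback confuses me. If I call `download()` now, how would I do it? What would I pass as the `cb` argument? I have `download('someURI', '/some/destination', cb)` but I don't know what to put in cb (2 upvotes)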
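
Since the callback question comes up twice in this thread, here is a minimal sketch of what could be passed as `cb`. The `download` function has the same shape as the one above, with an error handler added; `onDownloaded` is a hypothetical name and the URL and path in the usage comment are placeholders.

```javascript
const http = require('http');
const fs = require('fs');

// Same shape as the download() in the answer above.
function download(url, dest, cb) {
  const file = fs.createWriteStream(dest);
  http.get(url, (response) => {
    response.pipe(file);
    file.on('finish', () => file.close(cb));
  }).on('error', (err) => {
    fs.unlink(dest, () => cb(err)); // remove the partial file, then report the error
  });
}

// The callback is just a function that download() calls when it is done.
// By Node convention its first argument is an error, or empty on success.
function onDownloaded(err) {
  if (err) {
    console.error('Download failed:', err.message);
    return;
  }
  console.log('Download finished, the file is ready to use.');
}

// Usage (placeholder URL and path, not run here):
//   download('http://example.com/file.jpg', '/tmp/file.jpg', onDownloaded);
```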

Buz*_*zut 62

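Speaking of handling errors, it's even better to listen for request errors too. I'd even validate by checking the response code. Here only a 200 response code is considered a success, but other codes might be fine.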

const fs = require('fs');
const http = require('http');

const download = (url, dest, cb) => {
    const file = fs.createWriteStream(dest);

    const request = http.get(url, (response) => {
        // check if response is success
        if (response.statusCode !== 200) {
            return cb('Response status was ' + response.statusCode);
        }

        response.pipe(file);
    });

    // close() is async, call cb after close completes
    file.on('finish', () => file.close(cb));

    // check for request error too
    request.on('error', (err) => {
        fs.unlink(dest, () => {}); // Delete the file async. Current Node requires a callback; the result is ignored.
        return cb(err.message);
    });

    file.on('error', (err) => { // Handle errors
        fs.unlink(dest, () => {}); // Delete the file async. (But we don't check the result)
        return cb(err.message);
    });
};

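Although this code is relatively simple, I would advise using the request module, as it handles many more protocols (hello HTTPS!) that http does not natively support.

That would be done like so: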

const fs = require('fs');
const request = require('request');

const download = (url, dest, cb) => {
    const file = fs.createWriteStream(dest);
    const sendReq = request.get(url);

    // verify response code
    sendReq.on('response', (response) => {
        if (response.statusCode !== 200) {
            return cb('Response status was ' + response.statusCode);
        }

        sendReq.pipe(file);
    });

    // close() is async, call cb after close completes
    file.on('finish', () => file.close(cb));

    // check for request errors
    sendReq.on('error', (err) => {
        fs.unlink(dest, () => {}); // Delete the file async. Current Node requires a callback; the result is ignored.
        return cb(err.message);
    });

    file.on('error', (err) => { // Handle errors
        fs.unlink(dest, () => {}); // Delete the file async. (But we don't check the result)
        return cb(err.message);
    });
};

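  • The request module just works for HTTP directly. Cool! (2 upvotes)
  • @Alex, nope, this is an error message and there is a return. So if `response.statusCode !== 200`, the cb on `finish` will never be called. (2 upvotes)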

Anonymous 46

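gfxmonk's answer has a very tight data race between the callback and file.close() completing. file.close() actually takes a callback that is called when the close has completed. Otherwise, immediate uses of the file may fail (very rarely!).

A complete solution is: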

var http = require('http');
var fs = require('fs');

var download = function(url, dest, cb) {
  var file = fs.createWriteStream(dest);
  var request = http.get(url, function(response) {
    response.pipe(file);
    file.on('finish', function() {
      file.close(cb);  // close() is async, call cb after close completes.
    });
  });
}

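Without waiting for the finish event, naive scripts may end up with an incomplete file. Without scheduling the cb callback via close, you may get a race between accessing the file and the file actually being ready.

  • What do you store the request in a variable for? (2 upvotes)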

Bja*_*ted 15

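Maybe node.js has changed, but it seems there are some problems with the other solutions (using node v8.1.2):

  1. You don't need to call file.close() in the finish event. By default fs.createWriteStream is set to autoClose: https://nodejs.org/api/fs.html#fs_fs_createwritestream_path_options
  2. file.close() should be called on error. Maybe this is not needed when the file is deleted (unlink()), but normally it is: https://nodejs.org/api/stream.html#stream_readable_pipe_destination_options
  3. The temp file is not deleted when statusCode !== 200
  4. fs.unlink() without a callback is deprecated (it outputs a warning)
  5. If the dest file exists, it is overwritten

Below is a modified solution (using ES6 and promises) which handles these problems.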

const http = require("http");
const fs = require("fs");

function download(url, dest) {
    return new Promise((resolve, reject) => {
        const file = fs.createWriteStream(dest, { flags: "wx" });

        const request = http.get(url, response => {
            if (response.statusCode === 200) {
                response.pipe(file);
            } else {
                file.close();
                fs.unlink(dest, () => {}); // Delete temp file
                reject(`Server responded with ${response.statusCode}: ${response.statusMessage}`);
            }
        });

        request.on("error", err => {
            file.close();
            fs.unlink(dest, () => {}); // Delete temp file
            reject(err.message);
        });

        file.on("finish", () => {
            resolve();
        });

        file.on("error", err => {
            file.close();

            if (err.code === "EEXIST") {
                reject("File already exists");
            } else {
                fs.unlink(dest, () => {}); // Delete temp file
                reject(err.message);
            }
        });
    });
}

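  • Two comments on this: 1) it should probably reject with Error objects, not strings, 2) fs.unlink will quietly swallow errors, which may not necessarily be what you want to do (2 upvotes)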
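
One way to act on both points in that comment, as a sketch only: build real Error objects for bad responses, and let cleanup failures surface instead of ignoring them. `removePartialFile` and `httpStatusError` are hypothetical helper names, and treating a missing file as success is an assumption about what callers want.

```javascript
const fs = require('fs');

// Hypothetical helper: delete a partially written file and report real failures.
// A missing file (ENOENT) counts as success because there is nothing to clean up.
function removePartialFile(dest) {
  return new Promise((resolve, reject) => {
    fs.unlink(dest, (err) => {
      if (err && err.code !== 'ENOENT') reject(err);
      else resolve();
    });
  });
}

// Hypothetical helper: build an Error (not a string) for a non-200 response,
// so callers get a stack trace and can inspect `statusCode`.
function httpStatusError(response) {
  const err = new Error(
    `Server responded with ${response.statusCode}: ${response.statusMessage}`
  );
  err.statusCode = response.statusCode;
  return err;
}

// Usage inside the promise executor (not run here):
//   removePartialFile(dest).then(() => reject(httpStatusError(response)), reject);
```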

A-3*_*312 14

A solution with a timeout, to prevent a memory leak:

The following code is based on Brandon Tilley's answer:

var http = require('http'),
    fs = require('fs');

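var request = http.get("http://example12345.com/yourfile.html", function(response) {
    if (response.statusCode === 200) {
        var file = fs.createWriteStream("copy.html");
        response.pipe(file);
    }
});

// Add the timeout on the request itself, so it also covers a server that never responds.
request.setTimeout(12000, function () {
    request.destroy(); // abort() is deprecated in favor of destroy()
});

// Destroying the request can emit 'error' (socket hang up); handle it so the process does not crash.
request.on('error', function (err) {
    console.error(err.message);
});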

Don't create the file when you get an error, and prefer using a timeout to close your request after X seconds.
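
If the caller needs to know that a timeout happened, the same idea can be wrapped in a promise. This is a sketch under assumptions: `downloadWithTimeout` and `timeoutMs` are hypothetical names, the timeout is passed as the `timeout` option of `http.get`, and `stream.pipeline` is used so that an interrupted response rejects instead of hanging.

```javascript
const http = require('http');
const fs = require('fs');
const { pipeline } = require('stream');

// Sketch: resolves when the file is written, rejects on error or inactivity.
function downloadWithTimeout(url, dest, timeoutMs = 12000) {
  return new Promise((resolve, reject) => {
    // The `timeout` option arms the socket idle timer for this request.
    const request = http.get(url, { timeout: timeoutMs }, (response) => {
      if (response.statusCode !== 200) {
        response.resume(); // discard the body so the socket is released
        reject(new Error(`Server responded with ${response.statusCode}`));
        return;
      }
      // The file is only created once a 200 response has arrived.
      pipeline(response, fs.createWriteStream(dest), (err) => {
        if (err) reject(err);
        else resolve();
      });
    });
    // 'timeout' only signals inactivity; the request must be destroyed explicitly.
    request.on('timeout', () => {
      request.destroy(new Error(`Request timed out after ${timeoutMs} ms`));
    });
    request.on('error', reject);
  });
}
```

Note that this is an inactivity timeout on the socket, not a limit on the total download time.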


mid*_*ido 13

For those looking for an ES6-style promise-based way, I guess it would be something like:

var http = require('http');
var fs = require('fs');

function pDownload(url, dest){
  var file = fs.createWriteStream(dest);
  return new Promise((resolve, reject) => {
    var responseSent = false; // flag to make sure that response is sent only once.
    http.get(url, response => {
      response.pipe(file);
      file.on('finish', () =>{
        file.close(() => {
          if(responseSent)  return;
          responseSent = true;
          resolve();
        });
      });
    }).on('error', err => {
        if(responseSent)  return;
        responseSent = true;
        reject(err);
    });
  });
}

//example
pDownload(url, fileLocation)
  .then( ()=> console.log('downloaded file no issues...'))
  .catch( e => console.error('error while downloading', e));

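  • The `responseSet` flag caused, for some reason I didn't have time to investigate, my file to be downloaded incompletely. No errors popped up, but the .txt file I was populating had only half of the rows that needed to be there. Removing the logic for the flag fixed it. Just wanted to point that out in case someone has issues with this approach. Still, +1 (2 upvotes)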

Anonymous 8

Hi, I think you can use the child_process module and the curl command.

const cp = require('child_process');

// execFileSync passes the arguments straight to curl (no shell), so quotes or
// spaces in the URL or file name cannot break or alter the command.
// The call is synchronous, so no async/await is needed.
const download = function(uri, filename) {
    cp.execFileSync('curl', ['-o', filename, uri]);
};

download('http://zhangwenning.top/20181221001417.png', './20181221001417.png');

Also, when you want to download large files or multiple files, you can use the cluster module to make use of more CPU cores.


mix*_*dev 7

I prefer request() because it works with both http and https.

const fs = require('fs');
const request = require('request'); // third-party module: npm install request

request('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg')
  .pipe(fs.createWriteStream('cat.jpg'))


Jos*_*eak 7

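Based on the other answers above and some subtle issues, here is my attempt.

  1. Use fs.access to check that the file does not exist before hitting the network.
  2. Only create the fs.createWriteStream once you get a 200 OK status code. This reduces the number of fs.unlink calls needed to tidy up temporary file handles.
  3. Even on a 200 OK we can still reject because of EEXIST, the file already exists (for example, another process created it while we were making the network call).
  4. Recursively call download if you get a 301 Moved Permanently or 302 Found (Moved Temporarily) redirect, following the link location provided in the header.
  5. The issue with some of the other answers that call download recursively is that they called resolve(download) instead of download(...).then(() => resolve()), so the Promise would return before the download had actually finished. This way the nested chain of promises resolves in the correct order.
  6. It might seem cool to clean up the temp file asynchronously, but I chose to reject only after that has completed too, so I know that everything is done from start to finish when this promise resolves or rejects.

const https = require('https');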
const fs = require('fs');

/**
 * Download a resource from `url` to `dest`.
 * @param {string} url - Valid URL to attempt download of resource
 * @param {string} dest - Valid path to save the file.
 * @returns {Promise<void>} - Returns asynchronously when successfully completed download
 */
function download(url, dest) {
  return new Promise((resolve, reject) => {
    // Check file does not exist yet before hitting network
    fs.access(dest, fs.constants.F_OK, (err) => {

        if (err === null) return reject('File already exists'); // stop here instead of continuing to the network call

        const request = https.get(url, response => {
            if (response.statusCode === 200) {
       
              const file = fs.createWriteStream(dest, { flags: 'wx' });
              file.on('finish', () => resolve());
              file.on('error', err => {
                file.close();
                if (err.code === 'EEXIST') reject('File already exists');
                else fs.unlink(dest, () => reject(err.message)); // Delete temp file
              });
              response.pipe(file);
            } else if (response.statusCode === 302 || response.statusCode === 301) {
              //Recursively follow redirects, only a 200 will resolve.
              download(response.headers.location, dest).then(resolve, reject); // also propagate failures of the redirected download
            } else {
              reject(`Server responded with ${response.statusCode}: ${response.statusMessage}`);
            }
          });
      
          request.on('error', err => {
            reject(err.message);
          });
    });
  });
}


Rik*_*Rik 7

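Using the http2 module

I saw answers using the http, https, and request modules. I'd like to add one using yet another native NodeJS module that supports either the http or https protocol:

Solution

I've referenced the official NodeJS API, as well as some of the other answers on this question, for what I'm doing. The following is a test I wrote to try it out, and it worked as intended: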

import * as fs from 'fs';
import * as _path from 'path';
import * as http2 from 'http2';

/* ... */

async function download( host, query, destination )
{
    return new Promise
    (
        ( resolve, reject ) =>
        {
            // Connect to client:
            const client = http2.connect( host );
            client.on( 'error', error => reject( error ) );

            // Prepare a write stream:
            const fullPath = _path.join( fs.realpathSync( '.' ), destination );
            const file = fs.createWriteStream( fullPath, { flags: "wx" } );
            file.on( 'error', error => reject( error ) );

            // Create a request:
            const request = client.request( { [':path']: query } );

            // On initial response handle non-success (!== 200) status error:
            request.on
            (
                'response',
                ( headers/*, flags*/ ) =>
                {
                    if( headers[':status'] !== 200 )
                    {
                        file.close();
                        fs.unlink( fullPath, () => {} );
                        reject( new Error( `Server responded with ${headers[':status']}` ) );
                    }
                }
            );

            // No request.setEncoding( 'utf8' ) here: decoding the payload as text would
            // corrupt binary files, so the chunks are written as raw Buffers.

            // Write the payload to file:
            request.on( 'data', chunk => file.write( chunk ) );

            // Handle ending the request
            request.on
            (
                'end',
                () =>
                {
                    file.close();
                    client.close();
                    resolve( { result: true } );
                }
            );

            /* 
                You can use request.setTimeout( 12000, () => {} ) for aborting
                after period of inactivity
            */

            // Fire off [flush] the request:
            request.end();
        }
    );
}

Then, for example:

/* ... */

let downloaded = await download( 'https://gitlab.com', '/api/v4/...', 'tmp/tmpFile' );

if( downloaded.result )
{
    // Success!
}

// ...

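Edit information

  • The solution was written for TypeScript, where the function is a class method. Without noting this, the solution would not have worked for the presumed JavaScript user without proper use of the function declaration, which our contributor has so promptly added. Thanks!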


Mik*_*ans 7

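Late 2022 edit

Node v18 and above ship with native Fetch API support built into Node itself. There is no need for third-party libraries or small hand-crafted shims; just use fetch the way you are used to from the browser.

(i.e. the second code block below no longer needs the line const fetch = require(`./that-code-shown-above.js`); because fetch already exists globally)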
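
For the original question (saving a URL to disk), a minimal sketch with the built-in fetch could look like the following. It assumes Node 18 or later; `downloadWithFetch` is a hypothetical name, and `Readable.fromWeb` is marked experimental in some Node versions, so treat this as an illustration and not a definitive implementation.

```javascript
const fs = require('fs');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Sketch: stream the response body to `dest` without buffering it all in memory.
async function downloadWithFetch(url, dest) {
  const res = await fetch(url); // global fetch, Node 18+
  if (!res.ok || !res.body) {
    throw new Error(`Server responded with ${res.status}: ${res.statusText}`);
  }
  // res.body is a web ReadableStream; convert it to a Node stream for pipeline().
  await pipeline(Readable.fromWeb(res.body), fs.createWriteStream(dest));
}

// Usage (placeholder URL, not run here):
//   downloadWithFetch('https://example.com/file.jpg', 'file.jpg').catch(console.error);
```

Unlike http.get, fetch follows redirects by default, so no manual handling of 301/302 is needed here.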

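Original answer:

For Node versions with Promise support, a simple Node shim for (part of) the Fetch API requires only a smattering of extra code, with no need to install any special modules: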

const http = require(`http`);
const https = require(`https`);

module.exports = function fetch(url) {
  // we're returning a promise, so this function can also be `await`ed
  return new Promise((resolve, reject) => {
    const data = [];
    // make sure we use the correct protocol handler
    const client = url.startsWith("https") ? https : http;
    client
      .request(url, (conn) => {
        // aggregate the response stream into a single string.
        conn.on(`data`, (chunk) => data.push(chunk));
        conn.on(`end`, () => {
          // make sure to encode that string using utf8
          const asBytes = Buffer.concat(data);
          const asString = asBytes.toString(`utf8`);
          // and then trigger the resolution, with the
          // most frequently used fetch API "follow-up"
          // functions:
          resolve({
            arrayBuffer: async () => asBytes,
            json: async () => JSON.parse(asString),
            text: async () => asString,
          });
        });
        conn.on(`error`, (e) => reject(e));
      })
      .on(`error`, (e) => reject(e)) // connection-level failures (DNS, refused, ...)
      .end();
  });
};

You can then use this for whatever you need, with the normal fetch syntax you are used to from the browser:

const fs = require(`fs`);
const fetch = require(`./that-code-shown-above.js`);

fetch(`https://placekitten.com/200/300`)
  .then(res => res.arrayBuffer())
  .then(bytes => fs.writeFileSync(`kitten.jpg`, bytes))
  .catch(e => console.error(e));

// `await` needs an async context in a CommonJS file, so wrap it in a function.
(async () => {
  try {
    const response = await fetch(`https://jsonplaceholder.typicode.com/todos/1`);
    const data = await response.json();
    console.log(data);
  } catch (e) {
    console.error(e);
  }
})();

// etc.


Fee*_*ics 6

Vince Yuan's code is great, but something seems to be wrong with it.

var http = require('http');
var fs = require('fs');

function download(url, dest, callback) {
    var file = fs.createWriteStream(dest);
    var request = http.get(url, function (response) {
        response.pipe(file);
        file.on('finish', function () {
            file.close(callback); // close() is async, call callback after close completes.
        });
        file.on('error', function (err) {
            fs.unlink(dest, function () {}); // Delete the file async. (But we don't check the result)
            if (callback)
                callback(err.message);
        });
    });
}

  • Can we specify a destination folder? (2 upvotes)
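
On the destination-folder question: `dest` is an ordinary file path, so a folder can be included in it. A minimal sketch that builds the path from a folder and the file name in the URL follows; `destPathFor` and `fallbackName` are hypothetical names.

```javascript
const path = require('path');

// Hypothetical helper: build "<dir>/<file name taken from the URL>".
// Falls back to `fallbackName` when the URL path has no file name (e.g. it is "/").
function destPathFor(url, dir, fallbackName = 'download') {
  const name = path.basename(new URL(url).pathname) || fallbackName;
  return path.join(dir, name);
}

// Usage (not run here):
//   download(url, destPathFor(url, 'downloads'), callback);
```

The folder has to exist before the write stream is created; `fs.mkdirSync(dir, { recursive: true })` is one way to ensure that.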

Ida*_*gan 6

So if you use pipeline, it will close all the other streams and make sure that there are no memory leaks.

Working example:

const http = require('http');
const { pipeline } = require('stream');
const fs = require('fs');

const file = fs.createWriteStream('./file.jpg');

http.get('http://via.placeholder.com/150/92c952', response => {
  pipeline(
    response,
    file,
    err => {
      if (err)
        console.error('Pipeline failed.', err);
      else
        console.log('Pipeline succeeded.');
    }
  );
});

See the answer to "What's the difference between .pipe and .pipeline on streams"


wil*_*msi 6

download.js (i.e. /project/utils/download.js)

const fs = require('fs');
const request = require('request');

const download = (uri, filename, callback) => {
    request.head(uri, (err, res, body) => {
        if (err) return callback(err); // res is undefined when the HEAD request fails
        console.log('content-type:', res.headers['content-type']);
        console.log('content-length:', res.headers['content-length']);

        request(uri).pipe(fs.createWriteStream(filename)).on('close', callback);
    });
};

module.exports = { download };


app.js

... 
// part of imports
const { download } = require('./utils/download');

...
// add this function wherever
download('https://imageurl.com', 'imagename.jpg', () => {
  console.log('done')
});


Rom*_*nov 6

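A modern version (ES6, Promise, Node 12.x+) that works for https/http. It also supports 302 and 301 redirects. I decided not to use third-party libraries because this can easily be done with the standard Node.js libraries.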

// download.js
import fs from 'fs'
import https from 'https'
import http from 'http'
import { basename } from 'path'
import { URL } from 'url'

const TIMEOUT = 10000

function download (url, dest) {
  const uri = new URL(url)
  if (!dest) {
    dest = basename(uri.pathname)
  }
  const pkg = url.toLowerCase().startsWith('https:') ? https : http

  return new Promise((resolve, reject) => {
    const request = pkg.get(uri.href).on('response', (res) => {
      if (res.statusCode === 200) {
        const file = fs.createWriteStream(dest, { flags: 'wx' })
        res
          .on('end', () => {
            file.end()
            // console.log(`${uri.pathname} downloaded to: ${path}`)
            resolve()
          })
          .on('error', (err) => {
            file.destroy()
            fs.unlink(dest, () => reject(err))
          }).pipe(file)
      } else if (res.statusCode === 302 || res.statusCode === 301) {
        // Recursively follow redirects, only a 200 will resolve.
        download(res.headers.location, dest).then(resolve, reject) // also propagate failures of the redirected download
      } else {
        reject(new Error(`Download request failed, response status: ${res.statusCode} ${res.statusMessage}`))
      }
    })
    request.setTimeout(TIMEOUT, function () {
      request.abort()
      reject(new Error(`Request timeout after ${TIMEOUT / 1000.0}s`))
    })
  })
}

export default download

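Kudos to Andrey Tkachenko for the code this is based on, which I modified.

Include it in another file and use it:

import download from './download.js' // download.js uses ES module syntax (export default)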
const url = 'https://raw.githubusercontent.com/replace-this-with-your-remote-file'
console.log('Downloading ' + url)

async function run() {
  console.log('Downloading file')
  try {
    await download(url, 'server')
    console.log('Download done')
  } catch (e) {
    console.log('Download failed')
    console.log(e.message)
  }
}

run()

  • Amazing. Very clean, thank you. What does the 'wx' flag do when you create the writeStream? (2 upvotes)
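
To the question about 'wx': it is 'w' (open for writing) plus 'x' (exclusive), so opening fails with EEXIST if the path already exists instead of truncating the file. A small sketch that shows the behavior with a synchronous write; `createExclusive` is a hypothetical helper name.

```javascript
const fs = require('fs');

// Returns true if the file was created, false if it already existed.
// With flag 'w' the second call would silently overwrite the first file.
function createExclusive(filePath, contents) {
  try {
    fs.writeFileSync(filePath, contents, { flag: 'wx' });
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false;
    throw err;
  }
}
```

With `fs.createWriteStream(dest, { flags: 'wx' })` the same failure arrives as an 'error' event on the stream, which is what the answers above check for.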

Anonymous 5

You can use https://github.com/douzi8/ajax-request#download

request.download('http://res.m.ctrip.com/html5/Content/images/57.png', 
  function(err, res, body) {}
);

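  • Don't you consider `ajax-request` a third-party library? (5 upvotes)
  • It returns garbage characters if the file name is not ASCII, for example when the file name is in Japanese. (2 upvotes)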

wda*_*xna 5

Download using a promise that resolves to a readable stream, with extra logic to handle redirects.

var http = require('http');
var promise = require('bluebird');
var url = require('url');
var fs = require('fs');
var assert = require('assert');

function download(option) {
    assert(option);
    if (typeof option == 'string') {
        option = url.parse(option);
    }

    return new promise(function(resolve, reject) {
        var req = http.request(option, function(res) {
            if (res.statusCode == 200) {
                resolve(res);
            } else {
                if (res.statusCode === 301 && res.headers.location) {
                    resolve(download(res.headers.location));
                } else {
                    reject(res.statusCode);
                }
            }
        })
        .on('error', function(e) {
            reject(e);
        })
        .end();
    });
}

download('http://localhost:8080/redirect')
.then(function(stream) {
    try {

        var writeStream = fs.createWriteStream('holyhigh.jpg');
        stream.pipe(writeStream);

    } catch(e) {
        console.error(e);
    }
});

  • 302 is also an HTTP status code for URL redirection, so you should use [301,302].indexOf(res.statusCode) !== -1 in the if statement (2 upvotes)
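
Extending that comment a little: 303, 307 and 308 are redirect codes as well, and the Location header may be relative. A minimal sketch of a helper that decides whether to follow a redirect and resolves the target against the URL just requested; `redirectTarget` and `REDIRECT_CODES` are hypothetical names.

```javascript
// Status codes that carry a Location header pointing at the new URL.
const REDIRECT_CODES = [301, 302, 303, 307, 308];

// Hypothetical helper: returns the absolute URL to follow, or null if the
// response is not a redirect. A relative Location is resolved against the
// URL that was just requested.
function redirectTarget(currentUrl, statusCode, headers) {
  if (!REDIRECT_CODES.includes(statusCode) || !headers.location) return null;
  return new URL(headers.location, currentUrl).href;
}

// Usage inside the response handler (not run here):
//   const next = redirectTarget(requestedUrl, res.statusCode, res.headers);
//   if (next) return resolve(download(next));
```

A real implementation would probably also cap the number of redirects to avoid loops.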

kay*_*yz1 5

const http = require('http');
const fs = require('fs');

const download = (url, path) => new Promise((resolve, reject) => {
    http.get(url, response => {
        const statusCode = response.statusCode;

        if (statusCode !== 200) {
            return reject('Download error!');
        }

        const writeStream = fs.createWriteStream(path);
        response.pipe(writeStream);

        writeStream.on('error', () => reject('Error writing to file!'));
        writeStream.on('finish', () => writeStream.close(resolve));
    }).on('error', reject); // network-level failures (DNS, connection refused)
}).catch(err => console.error(err));