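I want to download a .zip file by clicking "Download file in csv format" on the page at http://www.nseindia.com/content/equities/cmbhav.htm.
If I right-click "Download file in csv format" and choose Copy Link Location, the URL follows a pattern like http://www.nseindia.com/content/historical/EQUITIES/2012/MAR/cm23MAR2012bhav.csv.zip.
I want to write a Perl script that downloads the .zip file from that URL.
The following code does not work: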
#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;
my $url = 'http://www.nseindia.com/content/historical/EQUITIES/2012/MAR' ;
my $file = 'cm23MAR2012bhav.csv.zip' ;
getstore($url, $file) ;
If you need to change the user agent and still want to use LWP::Simple, you can use the exported $ua:
use strict;
use warnings;
use feature 'say';    # say is not enabled by default
use File::Basename;
use HTTP::Headers;
use LWP::Simple qw($ua getstore);
use URI;
my $url = URI->new( 'http://www.nseindia.com/content/historical/EQUITIES/2012/MAR/cm23MAR2012bhav.csv.zip' );
# Send a browser-like Accept header with every request
$ua->default_headers( HTTP::Headers->new(
    Accept => '*/*',
)
);
$ua->agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.54.16 (KHTML, like Gecko) Version/5.1.4 Safari/534.54.16");
# getstore returns the HTTP status code of the response
my $rc = getstore( $url, basename( $url->path ) );
say "Result is $rc";
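It turns out that the combination of the user-agent string and the Accept header does the trick. Usually these problems come down to making your LWP request look just like the one the browser sends. I use HTTPScoop to watch the browser's transactions, but plenty of other programs will do the same thing for you.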
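The script above prints the raw return value of getstore, which is the HTTP status code of the response. A minimal sketch of turning that number into a readable result, assuming HTTP::Status is available (it ships with the same distribution LWP depends on); describe_result is a hypothetical helper name, not part of LWP:

```perl
use strict;
use warnings;
use HTTP::Status qw(is_success status_message);

# Hypothetical helper: turn an HTTP status code, such as the value
# returned by getstore, into a short human-readable summary.
sub describe_result {
    my ($rc) = @_;
    # status_message returns undef for codes it does not know
    my $msg = status_message($rc) // 'unknown status';
    return is_success($rc)
        ? "OK ($rc $msg)"
        : "Failed ($rc $msg)";
}

print describe_result(200), "\n";
print describe_result(403), "\n";
```

Seeing the code and its message together makes it easier to tell a rejected request from a missing file while you adjust the headers.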
If the situation gets trickier, I'd reach for Mojo::UserAgent. It makes the transactions a bit easier to deal with:
use strict;
use warnings;
use File::Basename;
use Mojo::UserAgent;
use URI;
my $url = URI->new( 'http://www.nseindia.com/content/historical/EQUITIES/2012/MAR/cm23MAR2012bhav.csv.zip' );
my $file = basename( $url->path );
printf "URL: %s\nFile: %s\n", $url, $file;
my $ua = Mojo::UserAgent->new;
# Current Mojolicious sets the User-Agent through the transactor;
# older releases used $ua->name(...) instead
$ua->transactor->name(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.54.16 (KHTML, like Gecko) Version/5.1.4 Safari/534.54.16'
);
my $response = $ua->get( $url->as_string, { Accept => '*/*' } )->res;
# Write in raw mode so the zip bytes are stored unmodified
open my $fh, '>:raw', $file or die "Could not open [$file]: $!";
print $fh $response->body;
close $fh or die "Could not close [$file]: $!";
printf "Status: %d\n", $response->code;
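The URL in the question appears to follow a date-based pattern: a year directory, an upper-case three-letter month directory, and a file named cmDDMONYYYYbhav.csv.zip. To fetch a different day, a small helper could build the URL from a date. This is only a sketch: the question shows the pattern for a single date (23 March 2012), so the month abbreviations other than MAR and the zero-padded day are assumptions, and bhav_url is a hypothetical name.

```perl
use strict;
use warnings;

# Assumed month abbreviations; only MAR is confirmed by the example URL.
my @MONTHS = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

# Hypothetical helper: build the download URL for a given date,
# assuming every date follows the pattern of the one example URL.
sub bhav_url {
    my ( $year, $month, $day ) = @_;
    die "month must be between 1 and 12\n"
        unless $month >= 1 && $month <= 12;
    my $mon = $MONTHS[ $month - 1 ];
    return sprintf(
        'http://www.nseindia.com/content/historical/EQUITIES/%04d/%s/cm%02d%s%04dbhav.csv.zip',
        $year, $mon, $day, $mon, $year
    );
}

print bhav_url( 2012, 3, 23 ), "\n";
```

The resulting string can be passed to URI->new in either script above in place of the hard-coded URL.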