Using cURL to access and download a file with dynamic cookies

jgp*_*a04 1 linux perl curl

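I have a Perl script, run from bash, that uses cURL to download files over HTTPS with a username/password and a cookie. However, I found that the cookie is dynamic, so hard-coding it doesn't work. Is there a better way to handle this?

Below is my current code.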

    # This just pulls the webpage. Will need to parse out the HTML Table chunk into a CSV
        print "Downloading $target_file.html\n";
                if  ($filespec eq "file1")  {
my $curl_code = `curl "https://sample.com/wells/CompositionReportResult.asp?ReportId=123"   -H 'Cookie: SMDEVICE=eyJhbGciOiJBMTI4S1ciLCJlbmMiOiJBMTI4Q0JDLUhTMjU2In0.0lMlSBb0mzYbuV6M0P9Qst0JijUl2YvuVxaBtAsVEfcUkKLRsO2p1w.PC
Y7zSD0HHuUCZZSsEbnFg.Z5QUEHlGNDQDoL8UsIKnakAxalwcGU7CX5rUerKKbrem25tc8GyIcmxPasp4KzFF9caR20lH7_lc6M1QlYssBEYEcI4DM50P-t8Mtr-PhDA.-Vdgrpt4t96Lk_N1hPCMYg; SMUSRMSG=; Navajo=+eGoE8a0fQmcgzcWwCp5hHDE/d2qhD+3PWhAGpDtbjdcarcbVJ04ReK6NKYsEQp
6Rnew1GOvnH0-; SMSESSION=LtzURcRBZWhHbcJTfzHBMmf+TKgP6SLlM5r1l6HXDUB4xsOPw+G0lJleBwYDcLPotLXzu5+arzlceezQP2oc/cCmsi9K86aIzn9DkNlJhZUUPDX6SzvSxRxZDJDvsyAsDFv+Tn+0hixJ4f7fHk0ATYFMUjyy7nM3DnW9kxHUuwfrAxHGnj5zvBRKzMTV0jM6Eotgc8jUJS6ehEwZX
GEIVC7QIuk3dtJqaSEXzkI6m4Bxi7VUq522ajQ618rsC3ICS3gy/y/+RnpfAHEFkJHwWpCP4kz1byYdSjKmDaq8cxnIJWXENCxUj2OuKEkJw8izZfum47+5FfI2gAPnv1aq5W+902r/AwSOSJfT8YBCCWHARbf1hab2Jwk9XfFxmWUDNsVa6oxTLc3/LQ1fEO7rd9lVqlyPUD9q+U53I5IZ6UybyrNzRNrGY11tftc
Glo0FDlVpmpboebeJXnIk+JPDB9azB8/QC1cKoGU+3VvmikUsvMx2/I6ZNRbagJKMUo4zSEsOcaLMgV8WrLVDifg7brUvASDQSXsKgWisIoQfVhiVGWsKJl0ovQ9Eeih1iTQpt05FpDXiigsb0cFz/vcRCt6chszLT99kKoO29Ck0abro4Tir4ZTOXSAAyu108C0+f1sWCjvM/FX4Myer/pgSDQfXzy5w/7mdVLSlQd
SNBIAzjDZdrAk2vxsLeYR6UjJLJocPeC/YOBwSC438IVw0LSay1oBsLv0OlQukMvxKuPnvROPHvIecXNN3PrYenR4ctYtw+rLZCAVh0RGuKzkgjC1DMxk9g1mPmFGPmmUI9nRQ+NI2BD1CQt2Np1zI29Ow7bOqc6OhobsV2C5wpmKZwEqRjeNqaNJGuwYYJwId6vmKgq5wq0J60CcOXOCjN/EX5YwVbLERPAN/3F1h
n3h/c52odG2CTkGwq1B4ns/6Uc06XW3SSB+olODwgLmRqUQgrlajyPdYiub6JJMpA+hf4iGSiz3LV7s4efSPPIUCM/5DKy9kcoQWmjsbPVoBCpS4Aywnc0xAhRpbK6KeO6dzndVvoZC8D9aQj4MBMoTTJBZGrAIr2WkE71msMIvg2BmFRegsTqODLHNJqxZfBeNxVHcO+bk8RUbje3+QmCmY6Tc6Nl9jbfH3I5k0zB
g9SQ/x6HWNOKv7NuTZHQ9uiC6DslxLXX87eZV0L+4SF2QMtkwl82pHXGtcMkk2F9desk0dHB406LCRCki3+DYQTuQMfDH/HYIuQzvpuCrzIRpHZvIpoA7sR+tgEkUFVhibK/gVM'  --compressed >  "$httpsconf->{localdir}/$target_file.html"`;
                } elsif ($filespec eq "file2") {
my $curl_code = `curl "https://sample.com/wells/MainReportResult.asp?ReportId=456"   -H 'Cookie: SMDEVICE=eyJhbGciOiJBMTI4S1ciLCJlbmMiOiJBMTI4Q0JDLUhTMjU2In0.0lMlSBb0mzYbuV6M0P9Qst0JijUl2YvuVxaBtAsVEfcUkKLRsO2p1w.PCY7zSD0H
HuUCZZSsEbnFg.Z5QUEHlGNDQDoL8UsIKnakAxalwcGU7CX5rUerKKbrem25tc8GyIcmxPasp4KzFF9caR20lH7_lc6M1QlYssBEYEcI4DM50P-t8Mtr-PhDA.-Vdgrpt4t96Lk_N1hPCMYg; SMUSRMSG=; Navajo=+eGoE8a0fQmcgzcWwCp5hHDE/d2qhD+3PWhAGpDtbjdcarcbVJ04ReK6NKYsEQp6Rnew1G
OvnH0-; SMSESSION=2a3mfLKCU39E56N+0e9O8WDQTAQ3RZ9FCxsd0khmmKUMmfg1NOzRVAE/KJLvqSTmoowZkHZlh4BMCchhlv+Ej2ePzTdo1fzWLAIrGZwiPOobjqJnOZ8Xb+RluEXcZxsHoGVSLpcT51x3uw5Xd9E39XdupiYRoqkEQHRpC4KgsJts6lsU1WIXGfnskAxLIwt+bXSMLQVoccQkjnR0bQRI1RME
apxW0DV+RZDLjbiZTqC7PmXhbLmFLfONZxD0B2TPeH9w8YtL8pBNardx35CQyZ/j3ICutB/Qt/kTirFAotJhYAKQNG2O9vdmoiny3quPX3d87AhM2bmsIOAUetw5imTkwaU3KEgY5upSlXKTqvpqjcgQubVGlXBwPOzJJcLvEQJ4i4IwU6neUv5SvjuzOGhs3buyMbr63P2s1/3pyUOnn955Qmte/joWWoxVq+85AkJ
hkgM3wrlj9DMwsLzbqwU1JCbU1lDwjkNZzQci5s3QMaJjq27PmudijeZQ5cT3ZlgBjCef40AlJuMOzo6BfnS98nGWxvPHO2KIwXYloYx6u9uDQGbm6PUVDSwQSk5Nk5aPs53fvUO47AhjudjCs/l+nxgOW1uxNL6iZSDyKwXYnzLkLCnK/qPPvUlnULbjIGNNKxgtcH+yIdwFM9pUfEfIhaTthR1wK7MLKzFsqRijy
EiQF4wBrTuSaFjt84O3RqnNai/3iMUrZ8ajS1AasWaiZI+tAHaJIAQMceyzdHEOh6jdI2z7Mah6Gbu+yKMMX3AlotYidfaNcGtbieazaDgGe7oCpncpw4+y3OnLQy3eEfu7swg/Ty7IXZV2c6gzcoXzxar7mcP1stdJfovfciz2+pCLuJnL3pcAP7atGgmoJdK7MtgbA3GNgw/sj7vcqUgNoyKyqoTbfRqqZwaAgsL
kSdwXH6UAfxqiF2xo34nxyrFdaMkzEYNZCiU+VTQRrpjx+Y+lACTIhRTeThdPfc1Gy0CUs3bEzQiznlffhoYKwq/RNv43ySxY4QUb3iYE07JgfpXj5FdzL5PuMr1Y48ZWXf84OE0UMJ5gi31Fp85c2ewDLjm3I7iQhVQO6BsC9vXpjERWFwmDmx1mZUkgLBLjl4u7sSYIJER2GphP8G3+bt4D+WC5lCzUShK/PNOSx
SCZcSy8oMS+6BNbDUO1qU6w5QERvi5hkklKiQXhLTR4JJzi22tcyyHIcxNTTwKX6MPHKi9cjvHpQjzyILhg4SPzaTrtCY+ZUuFtqfblpt3jYGPfehDFO3grLK0J8AzFm'  --compressed >  "$httpsconf->{localdir}/$target_file.html"`;
                } elsif ($filespec eq "file3") {
                my $curl_code = `curl "sample.com/wells/DisplayMDXResult.asp?ReportId=789"  -H 'Cookie: SMDEVICE=eyJhbGciOiJBMTI4S1ciLCJlbmMiOiJBMTI4Q0JDLUhTMjU2In0.i0N-N4zs25tS1YyxFsPgghtTvb-j1yoJlNcMwFJt0YlczWTtzJbLyQ.VaOp-3bOsKZsFZK3J-5Ttw._d967mUZH2Oly5YVAZvxwbeud4Y86gy3K5r8dDpupC3aLnhYd7ZTB1r-4wsuvVzDtUGATftexCVNPMItz0AxwjQ8zuHpfEbNXVlgCEYwWDw.thTbm9lUA_B7C62X4mIa6g; SMUSRMSG=; Navajo=HghuGrY7Sps24/1DNa6srjbey+aj+UXdfPIARKvhZrg3dVBKiKYosBr/onJVE2iz0DdUAuty48M-; SMSESSION=RnIiliYFFe4EwQNRMIsgzMY7Vjf21dA8tuvUkLRO1S6Uql21o15cEaoj0ChiZ1V1dSwhCUmLog4EbcIxRlMx8qoQLd05tCN5/dAaspIdVez4D6hV42Ih6drEyX188wn2ZjPIDHet8Rxl+XHHKv4EHK6zB5ECCS+S4HB+M/fihEjaKA8o9ny8KvhGa5msJn1KXkhSvTByUpT78MdCLBghhlZeHGxapQhAU42zansZui4vIX3iUFmI9WQw0i3vDm0APoanQRW580YtXLxR//pxvUKpOjrsTW3DxcQ0W7Iib8/7xX7Sa8CnQhMG+heaHJ8LooaKBLGgEI9e1/52Un67YrJR6pgpk5VqB4Kkf1EQdzGWDvQkiBY8z2EDtitSh0EdKTVBiE0h9Voom7AkV9jxtRawpUYr0ktYwQEWQGmItcg10TrNUshTio/p5tkS/5Pmuy0e+rmTZbtiEEMpVeu/Yg/AHpYzfEuUxJtXyXQX4WNEwdiEAKH+cCObmunZ9n5jw++GLZFrQghszYeoDR16vmUPtW8XRx5teXSQkstUTIq+chuyYdPpPZmeo6tYHj8bO9lAsxKD3boeax7NxMfO2ev4EAl8J8sHPku2tM7ZRHi05ciEqLASDQjXIK1QTDpnPBstt2gxNWDXp84ml2fAlakzQ4RePtvYtJedjxaf6I+CW7vXG2IoxxLTCJtWUcMwq1iIBFFjYQj2e+j1YVYQIQXuRbwDuDVpsmUVdwh0GeyJiY/BUQRG27y1xom4jzAXOk8fRIUUaj2uIkjiJOS2wzr2rKnckm5H5xl88xialsrkGnXGtN4FBVzUQpx/AuQO01OFdUsAD9EXYlSaxjPJSYh788VooaY/v7SurbKcXhqJSNt4VTnAyidpQGn5dHRuklp5hdFxqfS2Oxt0sAJdFgAVJ38a+qf9NbM7Gh3L+DC/ovgBGrSn8HRQkw3U/0yZZZUtVJLhQF+MO44qkiFF5n/fHJTUNzA+u4sB6rmQt0TMB9a3AV48OwafqkVubGvPasxmmdQp4kewVzbGoNHH4jjy65BBHsUXAlp79LD1vMS4Ig/VhMkhiqK+QJktyd8R5PX0+nOmTTAuVDEsNrWyYtJiIC338DvuMLYHjQsPet09XH54ufoF/21GIz+IA+6X90AXdXAurE3n7mZSFfmvJfs9taTg0jwRuRuQcojBY4c9g2oTp/cz4Q8GXnCrWCSwA7onrxOPelXOYUiLakBM8pxSJcUZhAb/mc0kZKD1WbNThYiriKIivIrv7ivzFrKre'  --compressed > "$httpsconf->{localdir}/$target_file.html"`;
                }

zdi*_*dim 5

You should be able to both read and write the same cookie file, for example

curl --cookie cookie_file --cookie-jar cookie_file https://...

or just

curl -b cookie_file -c cookie_file  https://...

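In principle, you use the -c option to write the cookie file when (or before) logging in with your credentials, and then -b to read it for the following requests, but it should also work with both options in a single request.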
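The two-step flow described above could be sketched roughly as follows. This is only an illustration under assumptions: the login URL is hypothetical, the site is assumed to accept HTTP basic authentication (the question does not say how it expects a login), and the function names and the COOKIE_JAR variable are made up for the example.

```shell
#!/bin/sh
# Sketch only: the login URL, credentials and file names are hypothetical.
cookie_jar="${COOKIE_JAR:-./cookies.txt}"

# Step 1: log in and let curl write whatever cookies the server sets.
# Assumes HTTP basic auth (--user); a form-based login would post data instead.
login() {
    # $1 = login URL, $2 = user name, $3 = password
    curl --silent --show-error --fail \
         --user "$2:$3" \
         --cookie-jar "$cookie_jar" \
         --output /dev/null \
         "$1"
}

# Step 2: send the stored cookies (--cookie) and write back any updated
# ones (--cookie-jar), in case the server refreshes the session cookie.
fetch() {
    # $1 = URL to download, $2 = local output file
    curl --silent --show-error --fail --compressed \
         --cookie "$cookie_jar" --cookie-jar "$cookie_jar" \
         --output "$2" \
         "$1"
}

# Example usage (commented out so that sourcing this file has no side effects):
# login "https://sample.com/login" "$USERNAME" "$PASSWORD" &&
#     fetch "https://sample.com/wells/CompositionReportResult.asp?ReportId=123" report.html
```

Since the cookie file is rewritten on every call, nothing in the script has to know the cookie values, which is the point when they change from session to session.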

See cookies in the curl documentation and the Cookies chapter in the book Everything curl.


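Alternatively, in a Perl program we can use Perl's extensive support for network programming.

One established set of tools is built around LWP::UserAgent, along with many other classes. See the overview in LWP. Another prominent example is Mojo::UserAgent, which sits in the middle of a whole web framework. I'll use LWP::UserAgent for the example below.

Cookie support is off by default, so we first need to enable it, either through the cookie_jar attribute or by using the cookie_jar method. The user agent object then stores all cookies (in an HTTP::Cookies object), managing them and sending them as needed with each request.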

use warnings;
use strict;
use feature 'say';

use LWP::UserAgent;

my $url = shift // die "Usage: $0 url\n";

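# Enable cookie handling; cookies are kept in an HTTP::Cookies object
my $ua = LWP::UserAgent->new;
$ua->cookie_jar({ file => "$ENV{HOME}/.cookies.txt" });

# How does the website expect a log in?

my $response = $ua->get( $url );

die $response->status_line if not $response->is_success;

# Write the cookies received so far to the file (no autosave is set here)
$ua->cookie_jar->save;

say $ua->cookie_jar->as_string;
# ...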

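I can't include specific login code because the question doesn't say how the login needs to be done. If the site uses HTTP basic authentication then the credentials method is a good way to go, but if the login is cookie-based then you need to request the form and so on (and a site may use both).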
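For comparison, from the shell with curl a form-based login might look roughly like the sketch below. It is only an illustration: the form field names (username, password), the function name and the COOKIE_JAR variable are hypothetical, and the real field names and action URL would have to be read from the site's login form.

```shell
#!/bin/sh
# Sketch only: the form field names and the login URL are hypothetical.
cookie_jar="${COOKIE_JAR:-./cookies.txt}"

form_login() {
    # $1 = login form action URL, $2 = user name, $3 = password
    # --data-urlencode makes this a POST and URL-encodes the values;
    # --location follows a redirect that often comes after a successful login.
    curl --silent --show-error --fail --location \
         --cookie "$cookie_jar" --cookie-jar "$cookie_jar" \
         --data-urlencode "username=$2" \
         --data-urlencode "password=$3" \
         --output /dev/null \
         "$1"
}

# Example usage (commented out so that sourcing this file has no side effects):
# form_login "https://sample.com/login" "$USERNAME" "$PASSWORD"
```

For HTTP basic authentication the two --data-urlencode options would be replaced by --user "name:password", with the cookie options left unchanged.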

Alternatively, the cookie jar can be set up through an attribute in the constructor

my $ua = LWP::UserAgent->new(
    cookie_jar => {
        file => "$ENV{HOME}/.cookies.txt",
        autosave => 1,
    }
);

If the cookies don't need to be saved in a file, we can enable them simply with

my $ua = LWP::UserAgent->new( cookie_jar => {} );

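and then the line that sets up ->cookie_jar isn't needed (the method can still be used, for example $ua->cookie_jar->as_string). The cookie jar is now kept in memory only and is discarded when the program ends.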

Either way, running the program with the input string https://www14.sample.com (which is what https://sample.com from the question resolves to for me) prints

Set-Cookie3: vsid=917vr3900381853629235; path="/"; domain=www14.sample.com; path_spec; expires="2027-01-17 07:56:25Z"; HttpOnly; version=0

(printed only for demonstration)

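Yet another way is to use WWW::Mechanize, which takes care of cookies and offers more conveniences. At the other end, there is Net::Curl, a module with Perl bindings for libcurl.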