I have a Perl script, driven from bash, that uses cURL to download files over HTTPS with a username/password and a cookie. However, it turns out the cookie is dynamic, so hardcoding it does not work. Does anyone have a better way to handle this?
Here is my current code.
# This just pulls the webpage. Will need to parse out the HTML Table chunk into a CSV
print "Downloading $target_file.html\n";
if ($filespec eq "file1") {
    my $curl_code = `curl "https://sample.com/wells/CompositionReportResult.asp?ReportId=123" -H 'Cookie: SMDEVICE=eyJhbGciOiJBMTI4S1ciLCJlbmMiOiJBMTI4Q0JDLUhTMjU2In0.0lMlSBb0mzYbuV6M0P9Qst0JijUl2YvuVxaBtAsVEfcUkKLRsO2p1w.PCY7zSD0HHuUCZZSsEbnFg.Z5QUEHlGNDQDoL8UsIKnakAxalwcGU7CX5rUerKKbrem25tc8GyIcmxPasp4KzFF9caR20lH7_lc6M1QlYssBEYEcI4DM50P-t8Mtr-PhDA.-Vdgrpt4t96Lk_N1hPCMYg; SMUSRMSG=; Navajo=+eGoE8a0fQmcgzcWwCp5hHDE/d2qhD+3PWhAGpDtbjdcarcbVJ04ReK6NKYsEQp6Rnew1GOvnH0-; SMSESSION=LtzURcRBZWhHbcJTfzHBMmf+TKgP6SLlM5r1l6HXDUB4xsOPw+G0lJleBwYDcLPotLXzu5+arzlceezQP2oc/cCmsi9K86aIzn9DkNlJhZUUPDX6SzvSxRxZDJDvsyAsDFv+Tn+0hixJ4f7fHk0ATYFMUjyy7nM3DnW9kxHUuwfrAxHGnj5zvBRKzMTV0jM6Eotgc8jUJS6ehEwZXGEIVC7QIuk3dtJqaSEXzkI6m4Bxi7VUq522ajQ618rsC3ICS3gy/y/+RnpfAHEFkJHwWpCP4kz1byYdSjKmDaq8cxnIJWXENCxUj2OuKEkJw8izZfum47+5FfI2gAPnv1aq5W+902r/AwSOSJfT8YBCCWHARbf1hab2Jwk9XfFxmWUDNsVa6oxTLc3/LQ1fEO7rd9lVqlyPUD9q+U53I5IZ6UybyrNzRNrGY11tftcGlo0FDlVpmpboebeJXnIk+JPDB9azB8/QC1cKoGU+3VvmikUsvMx2/I6ZNRbagJKMUo4zSEsOcaLMgV8WrLVDifg7brUvASDQSXsKgWisIoQfVhiVGWsKJl0ovQ9Eeih1iTQpt05FpDXiigsb0cFz/vcRCt6chszLT99kKoO29Ck0abro4Tir4ZTOXSAAyu108C0+f1sWCjvM/FX4Myer/pgSDQfXzy5w/7mdVLSlQdSNBIAzjDZdrAk2vxsLeYR6UjJLJocPeC/YOBwSC438IVw0LSay1oBsLv0OlQukMvxKuPnvROPHvIecXNN3PrYenR4ctYtw+rLZCAVh0RGuKzkgjC1DMxk9g1mPmFGPmmUI9nRQ+NI2BD1CQt2Np1zI29Ow7bOqc6OhobsV2C5wpmKZwEqRjeNqaNJGuwYYJwId6vmKgq5wq0J60CcOXOCjN/EX5YwVbLERPAN/3F1hn3h/c52odG2CTkGwq1B4ns/6Uc06XW3SSB+olODwgLmRqUQgrlajyPdYiub6JJMpA+hf4iGSiz3LV7s4efSPPIUCM/5DKy9kcoQWmjsbPVoBCpS4Aywnc0xAhRpbK6KeO6dzndVvoZC8D9aQj4MBMoTTJBZGrAIr2WkE71msMIvg2BmFRegsTqODLHNJqxZfBeNxVHcO+bk8RUbje3+QmCmY6Tc6Nl9jbfH3I5k0zBg9SQ/x6HWNOKv7NuTZHQ9uiC6DslxLXX87eZV0L+4SF2QMtkwl82pHXGtcMkk2F9desk0dHB406LCRCki3+DYQTuQMfDH/HYIuQzvpuCrzIRpHZvIpoA7sR+tgEkUFVhibK/gVM' --compressed > "$httpsconf->{localdir}/$target_file.html"`;
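    # Sketch of an alternative (the login URL and form-field names below are
    # hypothetical): instead of pasting a captured Cookie header that expires,
    # let curl manage a cookie jar. Authenticate once with -c to record the
    # Set-Cookie values (SMSESSION etc.), then replay them with -b on each
    # report download:
    #
    # my $login = `curl -s -c cookies.txt -d "user=$user" -d "password=$pass" "https://sample.com/login"`;
    # my $curl_code = `curl -s -b cookies.txt -c cookies.txt "https://sample.com/wells/CompositionReportResult.asp?ReportId=123" --compressed > "$httpsconf->{localdir}/$target_file.html"`;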
} elsif ($filespec eq "file2") {
    my $curl_code = `curl "https://sample.com/wells/MainReportResult.asp?ReportId=456" -H 'Cookie: SMDEVICE=eyJhbGciOiJBMTI4S1ciLCJlbmMiOiJBMTI4Q0JDLUhTMjU2In0.0lMlSBb0mzYbuV6M0P9Qst0JijUl2YvuVxaBtAsVEfcUkKLRsO2p1w.PCY7zSD0HHuUCZZSsEbnFg.Z5QUEHlGNDQDoL8UsIKnakAxalwcGU7CX5rUerKKbrem25tc8GyIcmxPasp4KzFF9caR20lH7_lc6M1QlYssBEYEcI4DM50P-t8Mtr-PhDA.-Vdgrpt4t96Lk_N1hPCMYg; SMUSRMSG=; Navajo=+eGoE8a0fQmcgzcWwCp5hHDE/d2qhD+3PWhAGpDtbjdcarcbVJ04ReK6NKYsEQp6Rnew1GOvnH0-; SMSESSION=2a3mfLKCU39E56N+0e9O8WDQTAQ3RZ9FCxsd0khmmKUMmfg1NOzRVAE/KJLvqSTmoowZkHZlh4BMCchhlv+Ej2ePzTdo1fzWLAIrGZwiPOobjqJnOZ8Xb+RluEXcZxsHoGVSLpcT51x3uw5Xd9E39XdupiYRoqkEQHRpC4KgsJts6lsU1WIXGfnskAxLIwt+bXSMLQVoccQkjnR0bQRI1RME …

I am currently writing a bash script to find the names that appear in File1 but not in File2.
File1:
"Name"
"Jeff"
"Michael"
"Ringo"
"John"

File2:
"Name"
"Jeff"
"Michael"
"John"
"Bert"
Given the example above, it should return "Ringo". So far I have been using a for loop to pull it out.
for q in `cat File1 | tail -n +2 | sort`;do grep $q File2 >> output.txt;done
However, running this over roughly 150,000 records takes a very long time. Does anyone have a better solution to share?
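The loop above invokes grep once per name, and as written it collects the names present in both files rather than the missing ones. A single set-difference pass is much faster; a minimal sketch using `comm`, assuming each file holds one quoted name per line under a header row as in the example:

```shell
# Recreate the sample files from the question: header row, then one quoted name per line.
printf '"Name"\n"Jeff"\n"Michael"\n"Ringo"\n"John"\n' > File1
printf '"Name"\n"Jeff"\n"Michael"\n"John"\n"Bert"\n'  > File2

# comm -23 prints only the lines unique to the first input; both sides must be
# sorted. tail -n +2 skips the header row, exactly as in the original loop.
comm -23 <(tail -n +2 File1 | sort) <(tail -n +2 File2 | sort) > output.txt

cat output.txt   # "Ringo"
```

`comm` makes one pass over each sorted stream, so 150,000 records should finish in around a second instead of spawning 150,000 grep processes. An equivalent without pre-sorting is `tail -n +2 File1 | grep -vxFf <(tail -n +2 File2)`, where -x matches whole lines and -F treats the patterns as fixed strings.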
Thanks in advance for your answers.