Lar*_*a19 5 python amazon-web-services web-scraping
This code runs fine in Jupyter and on a virtual machine, but when I run it on AWS it always fails with "list index out of range". How can I fix this? Thanks!
Code:
from datetime import datetime, timedelta
from time import strptime
import requests
from lxml import html
import re
import time
import os
import sys
from pandas import DataFrame
import numpy as np
import pandas as pd
import sqlalchemy as sa
from sqlalchemy import create_engine
from sqlalchemy.sql import text as sa_text
import pymysql
date_list=[]
for i in range(0,2):
    duration=datetime.today() - timedelta(days=i)
    forma=duration.strftime("%m-%d")
    date_list.append(forma)
print(date_list)
def curl_topic_url_hot():
    url = 'https://www.xxxx.com/topiclist.php?f=397&p=1'
    headers = {'User-Agent': 'aaaaaaaaaaaaaaa'}
    response = requests.get(url, headers=headers)
    tree = html.fromstring(response.text)
    output = tree.xpath("//div[@class='pagination']/a[7]")
    maxPage = int(output[0].text)
    print('There are', maxPage, 'pages.')
    return [maxPage]
topic_url_hot = curl_topic_url_hot()
AWS log:
['02-12', '02-11']
Traceback (most recent call last):
  File "/home/hadoop/ellen_crawl/test0211_mobile.py", line 167, in <module>
    topic_url_hot = curl_topic_url_hot()
  File "/home/hadoop/ellen_crawl/test0211_mobile.py", line 48, in curl_topic_url_hot
    maxPage = int(output[0].text)
IndexError: list index out of range
When I run this code in Jupyter, it prints:
['02-12', '02-11']
There are 818 pages.
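The traceback shows that the XPath query returned an empty list on AWS, so output[0] has nothing to index. One plausible cause, which the post does not confirm, is that the site sends a different page (for example a block or error page) to the AWS host. A minimal sketch of a more defensive version follows. The helper names pick_max_page and fetch_max_page are hypothetical, and taking the largest numeric link text as the last page is an assumption that replaces the original a[7] lookup.

```python
def pick_max_page(link_texts):
    """Return the largest page number among pagination link texts, or None."""
    numbers = [int(t.strip()) for t in link_texts if t and t.strip().isdecimal()]
    return max(numbers) if numbers else None


def fetch_max_page(url, headers):
    # Third-party imports are kept local so pick_max_page has no dependencies.
    import requests
    from lxml import html

    response = requests.get(url, headers=headers, timeout=30)
    tree = html.fromstring(response.text)
    # Collect every pagination link instead of assuming a seventh <a> exists.
    link_texts = tree.xpath("//div[@class='pagination']/a/text()")
    max_page = pick_max_page(link_texts)
    if max_page is None:
        # Report what the server actually sent, to compare AWS with Jupyter.
        raise RuntimeError(
            f"No pagination links found (HTTP {response.status_code}); "
            f"body starts with: {response.text[:300]!r}"
        )
    return max_page
```

Calling fetch_max_page(url, headers) with the same url and headers as in the question should then either return the page count or raise an error that includes the HTTP status and the start of the response body, which makes it easier to see how the AWS response differs from the one received in Jupyter.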