IndexError: list index out of range when running on AWS

Lar*_*a19 5 python amazon-web-services web-scraping

This code runs fine when I run it in Jupyter and on a virtual machine. But when I run it on AWS, it always fails with "list index out of range". How can I fix this? Thanks!

Code:

from datetime import datetime, timedelta
from time import strptime
import requests
from lxml import html
import re
import time
import os
import sys

from pandas import DataFrame
import numpy as np
import pandas as pd

import sqlalchemy as sa
from sqlalchemy import create_engine
from sqlalchemy.sql import text as sa_text
import pymysql


date_list=[]
for i in range(0,2):
    duration=datetime.today() - timedelta(days=i)
    forma=duration.strftime("%m-%d")
    date_list.append(forma)

print(date_list)



def curl_topic_url_hot():
    url = 'https://www.xxxx.com/topiclist.php?f=397&p=1'
    headers = {'User-Agent': 'aaaaaaaaaaaaaaa'}
    response = requests.get(url, headers=headers)
    tree = html.fromstring(response.text)
    output = tree.xpath("//div[@class='pagination']/a[7]")
    maxPage = int(output[0].text)
    print('There are', maxPage, 'pages.')

    return [maxPage]

topic_url_hot = curl_topic_url_hot()

AWS log:

['02-12', '02-11']
Traceback (most recent call last):
  File "/home/hadoop/ellen_crawl/test0211_mobile.py", line 167, in <module>
    topic_url_hot = curl_topic_url_hot()
  File "/home/hadoop/ellen_crawl/test0211_mobile.py", line 48, in curl_topic_url_hot
    maxPage = int(output[0].text)
IndexError: list index out of range

When I run this code in Jupyter, it prints:

['02-12', '02-11']
There are 818 pages.

小智 -2

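The request from your AWS machine is most likely getting different HTML back from the site (for example an error page) than the request from your local machine. In that case the XPath matches nothing, output is an empty list, and output[0] raises the IndexError. Check what the response from AWS actually contains: https://www.xxxx.com/topiclist.php?f=397&p=1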
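One way to confirm this is to guard the lookup instead of indexing the XPath result directly. The following is only a minimal sketch, not a definitive fix: the helper name parse_max_page and the two sample HTML strings are hypothetical and made up for illustration, and the XPath is copied from the question. It returns None when the pagination link is missing, so the caller can inspect the response instead of crashing.

```python
from typing import Optional

from lxml import html


def parse_max_page(page_html: str) -> Optional[int]:
    """Return the last page number, or None if it cannot be found."""
    # html.fromstring() raises on an empty document, so bail out early.
    if not page_html.strip():
        return None
    tree = html.fromstring(page_html)
    # Same XPath as in the question. xpath() returns an empty list when
    # nothing matches, which is what makes output[0] raise IndexError.
    output = tree.xpath("//div[@class='pagination']/a[7]")
    if not output or output[0].text is None:
        return None
    try:
        return int(output[0].text.strip())
    except ValueError:
        # The link exists but its text is not a number.
        return None


# Hypothetical sample pages, invented for this example only.
expected_page = (
    "<html><body><div class='pagination'>"
    + "".join(f"<a>{n}</a>" for n in (1, 2, 3, 4, 5, 6, 818))
    + "</div></body></html>"
)
blocked_page = "<html><body><h1>Access denied</h1></body></html>"

print(parse_max_page(expected_page))
print(parse_max_page(blocked_page))
```

In curl_topic_url_hot() you could call parse_max_page(response.text) and, when it returns None, print response.status_code and the first few hundred characters of response.text. That shows what the site actually served to the AWS machine.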