SageMaker batch transform breaks with "upstream prematurely closed connection while reading response header from upstream"

Question

lea*_*rad 5 python nginx flask gunicorn amazon-sagemaker

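I have been trying to run a containerized machine learning model on AWS SageMaker through its batch transform service, which splits the whole dataset into smaller datasets and runs inference on them with the model.

The container runs a Flask service that serves the ML model behind Gunicorn and nginx. During batch transform I get a 502 Bad Gateway error, with the entries below in the logs. (The same container succeeds on a c5.xlarge instance with a 50k dataset as input, but fails under the same conditions with an 80k dataset.)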

*4 upstream prematurely closed connection while reading response header from 
upstream, client: IP, server: , request: "POST /invocations 
HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/invocations", host: 
"IP:8080"

"POST /invocations HTTP/1.1" 502 182 "-" "Apache-HttpClient/4.5.x (Java/1.8.0_172)"

Nginx configuration:

worker_processes 1;
daemon off; # Prevent forking
pid  /tmp/nginx.pid;
error_log /var/log/nginx/error.log;
events {
    # defaults
}
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    access_log /var/log/nginx/access.log combined;

    upstream gunicorn {
        server unix:/tmp/gunicorn.sock;
    }

    server {
       listen 8080 deferred;
       client_max_body_size 5m;

       keepalive_timeout 10000;

       location ~ ^/(ping|invocations) {
           proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
           proxy_set_header Host $http_host;
           proxy_redirect off;
           proxy_pass http://gunicorn;
      }

     location / {
       return 404 "{}";
     }
  } 
}

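And the Gunicorn configuration:

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/container/decision_trees/serve

I am new to nginx and Gunicorn, and I have read most of the other posts about the "upstream prematurely closed connection while reading response header" error. I have tried increasing the client body size and similar settings, but I still get the same error. Any help with this would be much appreciated.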

Answer 1

小智 5

This looks like a Gunicorn worker timeout. There are two timeout settings you can adjust, depending on how long your model takes to serve an inference request:

The Gunicorn worker timeout, which can be adjusted here: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/container/decision_trees/serve#L25
The nginx proxy_read_timeout setting, which can be added to nginx.conf here: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/container/decision_trees/nginx.conf#L21-L37
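
As a rough illustration of keeping the two timeouts in step, here is a minimal Python sketch of how a serve script could build the Gunicorn command line and the matching nginx directive from one value. It is not the linked script's actual code: the MODEL_SERVER_TIMEOUT variable name, the 60-second fallback, the helper names, and the wsgi:app target are all assumptions for the example. The --timeout, -b and -w flags and the proxy_read_timeout directive are real Gunicorn and nginx options.

```python
import os


def server_timeout(default=60):
    """Read the timeout in seconds from the environment.

    Assumption: MODEL_SERVER_TIMEOUT and the 60 s fallback are illustrative.
    """
    return int(os.environ.get("MODEL_SERVER_TIMEOUT", default))


def gunicorn_command(timeout, workers=1):
    """Build a gunicorn command line with an explicit worker timeout."""
    return [
        "gunicorn",
        "--timeout", str(timeout),        # seconds before a silent worker is killed
        "-b", "unix:/tmp/gunicorn.sock",  # same socket as the nginx upstream block
        "-w", str(workers),
        "wsgi:app",                       # hypothetical WSGI module:callable
    ]


def nginx_timeout_directive(timeout):
    """Return the line to add inside the /invocations location block."""
    return f"proxy_read_timeout {timeout}s;"


if __name__ == "__main__":
    t = server_timeout()
    print(" ".join(gunicorn_command(t)))
    print(nginx_timeout_directive(t))
```

Deriving both settings from a single value keeps nginx from giving up on the upstream before Gunicorn does, and the other way round. The right value depends on how long one /invocations request takes for your largest input.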

If you need support with a specific transform job, please use the AWS forums: https://forums.aws.amazon.com/forum.jspa?forumID=285&start=0

Asked:	7 years ago
Viewed:	3875 times
Last activity:	6 years, 11 months ago
