How to resolve "Connection reset" when using Java Apache HttpClient 4.5.12

Dev*_*Ltd 6 java httpclient apache-commons-httpclient apache-httpclient-4.x

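We have been discussing with one of our data providers an issue where some of our HTTP requests fail intermittently with a "Connection reset" exception, although we also see "The target server failed to respond" exceptions.

Many Stack Overflow posts point to some potential solutions.

I am hoping this question will help me get to the root cause.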

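Context

It is a Java web application hosted in AWS Elastic Beanstalk, with 2..4 servers depending on load. The Java WAR file uses HttpClient 4.5.12 to communicate. Over the last few months we have seen:

45 x Connection reset (only 3 were timeouts over 30s, the others failed within 20ms)

To put this in perspective, we performed 10,000 requests to this supplier, so the error rate is not high, but it is very inconvenient because our customers pay for a service that subsequently fails.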
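
To put numbers on the figures above, here is a small sketch of the arithmetic. It assumes the 45 resets and the 10,000 requests cover the same period, which the post does not state explicitly; the class and method names are illustrative only.

```java
import java.util.Locale;

/** Illustrative arithmetic only; assumes both counts cover the same period. */
final class ResetRateSketch {

    /** Failure rate as a percentage of all requests. */
    static double failureRatePercent(int failures, int totalRequests) {
        if (totalRequests <= 0) {
            throw new IllegalArgumentException("totalRequests must be positive");
        }
        return 100.0 * failures / totalRequests;
    }

    public static void main(String[] args) {
        int resets = 45;                      // "Connection reset" failures reported above
        int slowResets = 3;                   // failed only after more than 30s
        int fastResets = resets - slowResets; // failed within 20ms
        int totalRequests = 10_000;

        System.out.println(String.format(Locale.ROOT, "reset rate: %.2f%%",
                failureRatePercent(resets, totalRequests)));
        System.out.println("fast failures: " + fastResets + ", slow failures: " + slowResets);
    }
}
```

Under that assumption the reset rate is 0.45%, and 42 of the 45 resets were the fast kind that failed within 20ms rather than after waiting out a 30s timeout.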

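At the moment we are trying to focus on eliminating the "Connection reset" scenario, and it has been suggested that we try the following:

1) Restart our application servers (a desperate, just-in-case measure)

2) Change the DNS servers to use Google 8.8.8.8 & 8.8.4.4 (so our requests take a different route)

3) Assign a static IP to each server (so that they can allow us to communicate without going through their CloudFront distribution)

We will work through these suggestions, but in the meantime I would like to understand where our HttpClient implementation might not be quite right.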

Typical usage

User request --> our server (JAX-RS request) --> HttpClient to 3rd party --> response received, e.g. JSON/XML --> massaged response sent back (our JSON format)

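Technical details

Tomcat 8 with Java 8, running on 64-bit Amazon Linux

4.5.12 HttpClient
4.4.13 HttpCore <-- Maven dependency shows HttpClient 4.5.12 requires 4.4.13
4.5.12 HttpMime

Typically an HTTP request takes anywhere between 200ms and 10 seconds, with timeouts set to around 15-30s depending on the API we are calling. I also use a connection pool, and given that most requests should complete within 30s, I felt it was safe to evict anything older than twice that period.

Any advice on whether these values are sensible is appreciated.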

// max 200 connections in the connection pool
private static final int CONNECTIONS_MAX = 200;

// each 3rd party API can only use up to 50, so worst case 4 APIs can be flooded before the pool is exhausted
private static final int CONNECTIONS_MAX_PER_ROUTE = 50;

// as our timeouts are typically 30s I'm assuming it's safe to clean up connections
// that are double that

// Connection timeouts are 30s, wasn't sure whether to close at 31s or wait 2 x typical = 60s
private static final int CONNECTION_CLOSE_IDLE_MS = 60000;

// If the connection hasn't been used for 60s then we aren't busy and we can remove it from the connection pool
private static final int CONNECTION_EVICT_IDLE_MS = 60000;

// Is this per request or per packet? Either way, all requests should finish within 30s
private static final int CONNECTION_TIME_TO_LIVE_MS = 60000;

// To ensure connections are validated if they are in the pool but haven't been used for at least 500ms
private static final int CONNECTION_VALIDATE_AFTER_INACTIVITY_MS = 500; // WAS 30000 (500ms not tested yet)
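
To make the interaction between these settings easier to reason about, here is a deliberately simplified model of what happens to a pooled connection. It is not HttpClient code: `PoolEntryModel`, `Action` and `decide` are hypothetical names, and the model ignores the server's Keep-Alive header and the fact that idle eviction runs periodically on a background thread rather than at lease time. It reflects my understanding that in 4.5.x the time to live is counted from when the connection was created, not per request or per packet.

```java
/** Simplified model of pooled-connection handling; not HttpClient code. */
final class PoolEntryModel {

    enum Action { CLOSE_EXPIRED, CLOSE_IDLE, VALIDATE_THEN_REUSE, REUSE }

    /** Decides what the pool would do with a connection at time nowMs. */
    static Action decide(long nowMs, long createdMs, long lastUsedMs,
                         long timeToLiveMs, long maxIdleMs, long validateAfterInactivityMs) {
        long ageMs = nowMs - createdMs;   // time since the connection was opened
        long idleMs = nowMs - lastUsedMs; // time since it was last handed back
        if (ageMs >= timeToLiveMs) {
            return Action.CLOSE_EXPIRED;  // too old, however recently it was used
        }
        if (idleMs >= maxIdleMs) {
            return Action.CLOSE_IDLE;
        }
        if (idleMs >= validateAfterInactivityMs) {
            return Action.VALIDATE_THEN_REUSE; // stale check before leasing
        }
        return Action.REUSE;
    }

    public static void main(String[] args) {
        long ttl = 60_000, maxIdle = 60_000, validateAfter = 500; // values from the post
        System.out.println(decide(10_000, 0, 9_900, ttl, maxIdle, validateAfter));  // REUSE
        System.out.println(decide(10_000, 0, 9_000, ttl, maxIdle, validateAfter));  // VALIDATE_THEN_REUSE
        System.out.println(decide(70_000, 0, 69_000, ttl, maxIdle, validateAfter)); // CLOSE_EXPIRED
    }
}
```

One thing the model makes visible: with the time to live and the idle limit both at 60000, the idle limit can never trigger first, because a connection cannot have been idle for longer than it has existed.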

In addition, we tend to set all three timeouts to 30s, but I'm sure we could fine-tune these...

// client tries to connect to the server. This denotes the time elapsed before the connection established or Server responded to connection request.
// The time to establish a connection with the remote host
.setConnectTimeout(...) // typical 30s - I guess this could be 5s (if we can't connect by then the remote server is stuffed/busy)

// Used when requesting a connection from the connection manager (pooling)
// The time to fetch a connection from the connection pool
.setConnectionRequestTimeout(...) // typical 30s - I guess only applicable if our pool is saturated, then this means how long to wait to get a connection?

// After establishing the connection, the client socket waits for response after sending the request. 
// This is the time of inactivity to wait for packets to arrive
.setSocketTimeout(...) // typical 30s - I believe this is the main one that we care about, if we don't get our payload in 30s then give up
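
As a sanity check on what the connect and socket timeouts mean, here is a sketch using plain `java.net` sockets rather than HttpClient. As far as I understand, the connect timeout corresponds to the timeout passed to `Socket.connect` and the socket timeout to `SO_TIMEOUT`, while the connection request timeout has no socket-level equivalent because it only covers waiting for the pool. The class name and the silent local server are made up for the demonstration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Demonstrates connect timeout vs. read (socket) timeout on a raw socket. */
final class SocketTimeoutSketch {

    /**
     * Connects to a local server that never sends a byte and reports whether
     * the read gave up with a SocketTimeoutException.
     */
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // Connect timeout: upper bound on establishing the TCP connection.
            client.connect(new InetSocketAddress(silentServer.getInetAddress(),
                    silentServer.getLocalPort()), connectTimeoutMs);
            // Read timeout (SO_TIMEOUT): upper bound on a single blocking read,
            // i.e. on the gap between packets, not on the whole response.
            client.setSoTimeout(readTimeoutMs);
            try {
                client.getInputStream().read();
                return false; // data or end of stream arrived before the timeout
            } catch (SocketTimeoutException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(1000, 200));
    }
}
```

Because the read timeout restarts with every packet, a response that trickles in slowly can take far longer than 30s overall without ever hitting it.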

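I have copied and pasted the main code we use for all GET/POST requests, but removed the unimportant aspects such as our retry logic, pre-cache and post-cache.

We use a single PoolingHttpClientConnectionManager and a single CloseableHttpClient, which are configured as follows...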

    private static PoolingHttpClientConnectionManager createConnectionManager() {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();

        cm.setMaxTotal(CONNECTIONS_MAX); // 200
        cm.setDefaultMaxPerRoute(CONNECTIONS_MAX_PER_ROUTE); // 50
        cm.setValidateAfterInactivity(CONNECTION_VALIDATE_AFTER_INACTIVITY_MS); // Was 30000 now 500

        return cm;
    }
    private static CloseableHttpClient createHttpClient() {

        httpClient = HttpClientBuilder.create()
                .setConnectionManager(cm)
                .disableAutomaticRetries() // our code does the retries
                .evictIdleConnections(CONNECTION_EVICT_IDLE_MS, TimeUnit.MILLISECONDS) // 60000
                .setConnectionTimeToLive(CONNECTION_TIME_TO_LIVE_MS, TimeUnit.MILLISECONDS) // 60000
                .setRedirectStrategy(LaxRedirectStrategy.INSTANCE)
                // .setKeepAliveStrategy() - The default implementation looks solely at the 'Keep-Alive' header's timeout token.
                .build();
        return httpClient;
    }


Every minute I have a thread that tries to reap connections

    public static PoolStats performIdleConnectionReaper(Object source) {
        synchronized (source) {
            final PoolStats totalStats = cm.getTotalStats();
            Log.info(source, "max:" + totalStats.getMax() + " avail:" + totalStats.getAvailable() + " leased:" + totalStats.getLeased() + " pending:" + totalStats.getPending());
            cm.closeExpiredConnections();
            cm.closeIdleConnections(CONNECTION_CLOSE_IDLE_MS, TimeUnit.MILLISECONDS); // 60000
            return totalStats;
        }
    }
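
The post does not show how the reaper is scheduled, so the following is only one possible way to run such a task once a minute with the JDK scheduler. `ReaperSchedulerSketch` and `start` are hypothetical names; the `Runnable` stands in for a call to the reaper method above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** One possible way to run a periodic reaper task; names are illustrative. */
final class ReaperSchedulerSketch {

    /** Runs the given reaper task repeatedly on a single daemon thread. */
    static ScheduledExecutorService start(Runnable reaper, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-reaper");
            t.setDaemon(true); // do not keep the JVM alive on shutdown
            return t;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                reaper.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would suppress all future runs.
                System.err.println("reaper failed: " + e);
            }
        }, period, period, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler =
                start(() -> System.out.println("reaping idle connections"), 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        scheduler.shutdownNow();
    }
}
```

In the application this would be started once, for example with a period of 1 and `TimeUnit.MINUTES`, and shut down when the web application stops. The try/catch matters because a scheduled task that throws is never run again, which would quietly stop the reaping.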

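This is the custom method that performs all HttpClient GET/POST requests. It does stats, pre-cache, post-cache and other useful things, but I have stripped all of that out, so this is the typical outline that is executed for each request. I have tried to follow the pattern in the HttpClient documentation, which tells you to consume the entity and close the response. Note that I don't close the httpClient, because a single instance is used for all requests.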

    public static HttpHelperResponse execute(HttpHelperParams params) {

        boolean abortRetries = false;

        while (!abortRetries && ret.getAttempts() <= params.getMaxRetries()) {

            // 1 Create HttpClient
            // This is done once in the static init CloseableHttpClient httpClient = createHttpClient(params);

            // 2 Create one of the methods, e.g. HttpGet / HttpPost - Note this also adds HTTP headers 
            // (see separate method below)
            HttpRequestBase request = createRequest(params);

            // 3 Tell HTTP Client to execute the command
            CloseableHttpResponse response = null;
            HttpEntity entity = null;
            boolean alreadyStreamed = false;

            try {

                response = httpClient.execute(request);
                if (response == null) {
                    throw new Exception("Null response received");
                } else {

                    final StatusLine statusLine = response.getStatusLine();
                    ret.setStatusCode(statusLine.getStatusCode());
                    ret.setReasonPhrase(statusLine.getReasonPhrase());

                    if (ret.getStatusCode() == 429) {
                        try {
                            final int delay = (int) (Math.random() * params.getRetryDelayMs());
                            Thread.sleep(500 + delay); // minimum 500ms + random amount up to delay specified
                        } catch (Exception e) {
                            Log.error(false, params.getSource(), "HttpHelper Rate-limit sleep exception", e, params);
                        }
                    } else {

                        // 4 Read the response
                        // 6 Deal with the response
                        // do something useful with the response body                        
                        entity = response.getEntity();

                        if (entity == null) {
                            throw new Exception("Null entity received");
                        } else {
                            ret.setRawResponseAsString(EntityUtils.toString(entity, params.getEncoding()));
                            ret.setSuccess();
                            if (response.getAllHeaders() != null) {
                                for (Header header : response.getAllHeaders()) {
                                    ret.addResponseHeader(header.getName(), header.getValue());
                                }
                            }
                        }

                    }
                }

            } catch (Exception ex) {

                if (ret.getAttempts() >= params.getMaxRetries()) {
                    Log.error(false, params.getSource(), ex);
                } else {
                    Log.warn(params.getSource(), ex.getMessage());
                }

                ret.setError(ex); // If we subsequently get a response then the error will be cleared.                
            } finally {

                ret.incrementAttempts();

                // Any HTTP 2xx is considered successful, so stop retrying, or stop if
                // a specific HTTP code has been passed to stop retrying
                if (ret.getStatusCode() >= 200 && ret.getStatusCode() <= 299) {
                    abortRetries = true;
                } else if (params.getDoNotRetryStatusCodes().contains(ret.getStatusCode())) {
                    abortRetries = true;
                }

                if (entity != null) {
                    try {
                        // and ensure it is fully consumed - hand it back to the pool
                        EntityUtils.consume(entity);
                    } catch (IOException ex) {
                        Log.error(false, params.getSource(), "HttpHelper Was unable to consume entity", params);
                    }

                }

                if (response != null) {
                    try {
                        // The underlying HTTP connection is still held by the response object
                        // to allow the response content to be streamed directly from the network socket.
                        // In order to ensure correct deallocation of system resources
                        // the user MUST call CloseableHttpResponse#close() from a finally clause.
                        // Please note that if response content is not fully consumed the underlying
                        // connection cannot be safely re-used and will be shut down and discarded
                        // by the connection manager.                     
                        response.close();
                    } catch (IOException ex) {
                        Log.error(false, params.getSource(), "HttpHelper Was unable to close a response", params);
                    }
                }

                // When using connection pooling we don't want to close the client, otherwise the connection
                // pool will also be closed
                //                if (httpClient != null) {
                //                    try {
                //                        httpClient.close();
                //                    } catch (IOException ex) {
                //                        Log.error(false, params.getSource(), "HttpHelper Was unable to close httpClient", params);
                //                    }
                //                }


            }
        }

        return ret;
    }
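
The retry decisions in the method above can be pulled out and checked in isolation. This is a sketch that mirrors the 429 delay and the stop conditions without any HTTP calls; `RetryPolicySketch` and its method names are hypothetical, and the status codes in `main` are example values, not ones taken from our configuration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

/** Mirrors the retry decisions of the method above without any HTTP calls. */
final class RetryPolicySketch {

    static final long MIN_RATE_LIMIT_DELAY_MS = 500;

    /** Delay before retrying after HTTP 429: 500ms plus a random amount below maxExtraMs. */
    static long rateLimitDelayMs(long maxExtraMs) {
        long extra = maxExtraMs > 0 ? ThreadLocalRandom.current().nextLong(maxExtraMs) : 0;
        return MIN_RATE_LIMIT_DELAY_MS + extra;
    }

    /** Stop on any 2xx, or on a status code the caller marked as not worth retrying. */
    static boolean shouldStopRetrying(int statusCode, Set<Integer> doNotRetryStatusCodes) {
        boolean success = statusCode >= 200 && statusCode <= 299;
        return success || doNotRetryStatusCodes.contains(statusCode);
    }

    public static void main(String[] args) {
        Set<Integer> doNotRetry = new HashSet<>(Arrays.asList(400, 401, 404)); // example codes
        System.out.println(shouldStopRetrying(204, doNotRetry)); // true
        System.out.println(shouldStopRetrying(503, doNotRetry)); // false
        System.out.println("sleep for " + rateLimitDelayMs(2000) + "ms");
    }
}
```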
    private static HttpRequestBase createRequest(HttpHelperParams params) {

        ...
        request.setConfig(RequestConfig.copy(RequestConfig.DEFAULT)
            // client tries to connect to the server. This denotes the time elapsed before the connection established or Server responded to connection request.
            // The time to establish a connection with the remote host
            .setConnectTimeout(...) // typical 30s

            // Used when requesting a connection from the connection manager (pooling)
            // The time to fetch a connection from the connection pool
            .setConnectionRequestTimeout(...) // typical 30s

            // After establishing the connection, the client socket waits for response after sending the request. 
            // This is the time of inactivity to wait for packets to arrive
            .setSocketTimeout(...) // typical 30s

            .build()
        );

        return request;
    }