How do I specify that HTTP status code 304 (NotModified) is not an error condition in the Amazon S3 GetObject API?

Inv*_*ion 7 .net c# amazon-s3 amazon-web-services task-parallel-library

Background

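I am trying to use S3 as a somewhat "unlimited" large cache layer for some "fairly" static XML documents. The client application will run on thousands of machines concurrently and request the XML documents many times per hour. I want to make sure it downloads a document only if its content has changed since the client last downloaded it.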

Approach

On Amazon S3 we can use HTTP ETags. By default, an Amazon S3 object has its ETag set to the MD5 hash of the object.

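We can then specify the XML document's MD5 hash in the GetObjectRequest.ETagToNotMatch property. This ensures that when we call AmazonS3.GetObject (or, in my case, the asynchronous pair AmazonS3.BeginGetObject and AmazonS3.EndGetObject), if the requested document has the same MD5 hash as the one held in GetObjectRequest.ETagToNotMatch, S3 automatically returns HTTP status code 304 (NotModified) and the contents of the XML document are not downloaded.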
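For reference, the same conditional GET can be sketched at the plain HTTP level, outside the AWS SDK. This is only an illustration, not the SDK's implementation: the class and method names are made up, and the `url` parameter is assumed to be a publicly readable or pre-signed object URL. It uses the standard `If-None-Match` request header and treats 304 as a normal "nothing changed" result.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of the AWS SDK.
public static class ConditionalGet
{
    // Returns the document body when it has changed, or null when the
    // server answers 304 (NotModified) for the supplied ETag.
    public static async Task<string> DownloadIfChangedAsync(HttpClient http, Uri url, string knownETag)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        {
            if (knownETag != null)
            {
                // Entity tags travel as quoted strings, so normalise the quotes first.
                var quoted = "\"" + knownETag.Trim('"') + "\"";
                request.Headers.IfNoneMatch.Add(new EntityTagHeaderValue(quoted));
            }

            using (var response = await http.SendAsync(request))
            {
                if (response.StatusCode == HttpStatusCode.NotModified)
                {
                    // The cached copy is still current; no body was sent.
                    return null;
                }

                // Any other non-success status is a genuine error.
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
        }
    }
}
```

A caller would pass the ETag it stored from the previous download and keep using its cached copy whenever the method returns null.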

Problem

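The problem, however, is that when calling AmazonS3.GetObject (or its asynchronous equivalent), the Amazon .NET API actually treats HTTP status code 304 (NotModified) as an error. It retries the GET request three times and then finally throws an Amazon.S3.AmazonS3Exception: Maximum number of retry attempts reached : 3.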

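Obviously I could change this implementation to call AmazonS3.GetObjectMetaData, compare the ETag, and then call AmazonS3.GetObject only if the ETags do not match. But then there are two requests instead of one whenever the file is stale. I would prefer a single request regardless of whether the XML document needs to be downloaded.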

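Any ideas? Is this a bug, or am I missing something? Is there even some way I can reduce the number of retries to one and "process" the exception (although I am not keen on that route)?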

Implementation

I am using the AWS SDK for .NET (version 1.3.14).

Here is my implementation (trimmed slightly to keep it short):

public Task<GetObjectResponse> DownloadString(string key, string etag = null) {

    var request = new GetObjectRequest { Key = key, BucketName = Bucket };

    if (etag != null) {
        request.ETagToNotMatch = etag;
    }

    var task = Task<GetObjectResponse>.Factory.FromAsync(_s3Client.BeginGetObject, _s3Client.EndGetObject, request, null);

    return task;
}

I then call it like this:

var dlTask          = s3Manager.DownloadString("new one", "d7db7bc318d6eb9222d728747879b52e");
var responseTasks   = new[]
    {
        dlTask.ContinueWith(x => _log.Error("Error downloading string.", x.Exception), TaskContinuationOptions.OnlyOnFaulted),
        dlTask.ContinueWith(x => _log.Warn("Downloading string was cancelled."), TaskContinuationOptions.OnlyOnCanceled),
        dlTask.ContinueWith(x => _log.Info(string.Format("Done with download: {0}", x.Result.ETag)), TaskContinuationOptions.OnlyOnRanToCompletion)
    };

try {
    Task.WaitAny(responseTasks);
} catch (AggregateException aex) {
    _log.Error("Error while processing download string.", aex);
}

_log.Info("Exiting...");

This produces the following log output:

2011-10-11 13:21:20,376 [11] INFO  Amazon.S3.AmazonS3Client - Received response for GetObject (id 2ee99002-d148-4572-b19b-29259534f48f) with status code NotModified in 00:00:01.6140812.
2011-10-11 13:21:20,385 [11] INFO  Amazon.S3.AmazonS3Client - Request for GetObject is being redirect to https://s3.amazonaws.com/x/new%20one.
2011-10-11 13:21:20,789 [11] INFO  Amazon.S3.AmazonS3Client - Retry number 1 for request GetObject.
2011-10-11 13:21:22,329 [11] INFO  Amazon.S3.AmazonS3Client - Received response for GetObject (id 2ee99002-d148-4572-b19b-29259534f48f) with status code NotModified in 00:00:01.1400356.
2011-10-11 13:21:22,329 [11] INFO  Amazon.S3.AmazonS3Client - Request for GetObject is being redirect to https://s3.amazonaws.com/x/new%20one.
2011-10-11 13:21:23,929 [11] INFO  Amazon.S3.AmazonS3Client - Retry number 2 for request GetObject.
2011-10-11 13:21:26,508 [11] INFO  Amazon.S3.AmazonS3Client - Received response for GetObject (id 2ee99002-d148-4572-b19b-29259534f48f) with status code NotModified in 00:00:00.9790314.
2011-10-11 13:21:26,508 [11] INFO  Amazon.S3.AmazonS3Client - Request for GetObject is being redirect to https://s3.amazonaws.com/x/new%20one.
2011-10-11 13:21:32,908 [11] INFO  Amazon.S3.AmazonS3Client - Retry number 3 for request GetObject.
2011-10-11 13:21:40,604 [11] INFO  Amazon.S3.AmazonS3Client - Received response for GetObject (id 2ee99002-d148-4572-b19b-29259534f48f) with status code NotModified in 00:00:01.2950718.
2011-10-11 13:21:40,605 [11] INFO  Amazon.S3.AmazonS3Client - Request for GetObject is being redirect to https://s3.amazonaws.com/x/new%20one.
2011-10-11 13:21:40,621 [11] ERROR Amazon.S3.AmazonS3Client - Error for GetResponse
Amazon.S3.AmazonS3Exception: Maximum number of retry attempts reached : 3
   at Amazon.S3.AmazonS3Client.pauseOnRetry(Int32 retries, Int32 maxRetries, HttpStatusCode status, String requestAddr, WebHeaderCollection headers, Exception cause)
   at Amazon.S3.AmazonS3Client.handleHttpResponse[T](S3Request userRequest, HttpWebRequest request, HttpWebResponse httpResponse, Int32 retries, TimeSpan lengthOfRequest, T& response, Exception& cause, HttpStatusCode& statusCode)
   at Amazon.S3.AmazonS3Client.getResponseCallback[T](IAsyncResult result)
2011-10-11 13:21:40,635 [10] INFO  Example.Program - Exiting...
2011-10-11 13:21:40,638 [19] ERROR Example.Program - Error downloading string.
System.AggregateException: One or more errors occurred. ---> Amazon.S3.AmazonS3Exception: Maximum number of retry attempts reached : 3
   at Amazon.S3.AmazonS3Client.pauseOnRetry(Int32 retries, Int32 maxRetries, HttpStatusCode status, String requestAddr, WebHeaderCollection headers, Exception cause)
   at Amazon.S3.AmazonS3Client.handleHttpResponse[T](S3Request userRequest, HttpWebRequest request, HttpWebResponse httpResponse, Int32 retries, TimeSpan lengthOfRequest, T& response, Exception& cause, HttpStatusCode& statusCode)
   at Amazon.S3.AmazonS3Client.getResponseCallback[T](IAsyncResult result)
   at Amazon.S3.AmazonS3Client.endOperation[T](IAsyncResult result)
   at Amazon.S3.AmazonS3Client.EndGetObject(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endMethod, TaskCompletionSource`1 tcs)
   --- End of inner exception stack trace ---
---> (Inner Exception #0) Amazon.S3.AmazonS3Exception: Maximum number of retry attempts reached : 3
   at Amazon.S3.AmazonS3Client.pauseOnRetry(Int32 retries, Int32 maxRetries, HttpStatusCode status, String requestAddr, WebHeaderCollection headers, Exception cause)
   at Amazon.S3.AmazonS3Client.handleHttpResponse[T](S3Request userRequest, HttpWebRequest request, HttpWebResponse httpResponse, Int32 retries, TimeSpan lengthOfRequest, T& response, Exception& cause, HttpStatusCode& statusCode)
   at Amazon.S3.AmazonS3Client.getResponseCallback[T](IAsyncResult result)
   at Amazon.S3.AmazonS3Client.endOperation[T](IAsyncResult result)
   at Amazon.S3.AmazonS3Client.EndGetObject(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endMethod, TaskCompletionSource`1 tcs)<---

Inv*_*ion 3

I also posted this question on the Amazon developer forums and got a reply from an official AWS employee:

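After investigating, we understand the problem, but we are looking for feedback on how best to handle it.

The first approach is to have this operation return normally, with a property on GetObjectResponse indicating that no object was returned, or with the output stream set to null. This would be cleaner to code against, but it does create a slightly breaking change in behavior for anyone relying on an exception being thrown, albeit after the 3 retries. It would also be inconsistent with the CopyObject operation, which does throw an exception, without all the crazy retries.

The other option is for us to throw an exception similar to CopyObject, which keeps us consistent and introduces no breaking changes, but is harder to code against.

If anyone has an opinion on which way to handle this, please respond to this thread.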

Norm

I have added my thoughts to the thread. If anyone else is interested in participating, here is the link:

AmazonS3.GetObject sees HTTP 304 (NotModified) as an error. Way to allow it?


NOTE: Once Amazon has resolved this issue, I will update my answer to reflect the outcome.


UPDATE: (2012-01-24) Still waiting for further information from Amazon.

UPDATE: (2018-12-06) This issue was fixed in AWS SDK 1.5.20, in 2013: https://forums.aws.amazon.com/thread.jspa?threadID=77995&tstart=0
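If your SDK version reports the 304 by throwing rather than by returning a response, the faulted continuation can tell "not modified" apart from real failures. The sketch below assumes the SDK raises an AmazonS3Exception whose StatusCode is HttpStatusCode.NotModified. That is how later SDK versions behave as far as I know, but I have not confirmed exactly what the 1.5.20 fix does, so verify it against your version. The S3Errors class and IsNotModified method are hypothetical names.

```csharp
using System;
using System.Net;
using Amazon.S3;

// Hypothetical helper; assumes a 304 surfaces as an AmazonS3Exception
// with StatusCode == HttpStatusCode.NotModified.
public static class S3Errors
{
    // True when the exception, or any exception wrapped inside an
    // AggregateException, is an AmazonS3Exception carrying HTTP 304.
    public static bool IsNotModified(Exception error)
    {
        var aggregate = error as AggregateException;
        if (aggregate != null)
        {
            // Tasks wrap failures in AggregateException, so inspect every inner exception.
            foreach (var inner in aggregate.Flatten().InnerExceptions)
            {
                if (IsNotModified(inner))
                {
                    return true;
                }
            }
            return false;
        }

        var s3Error = error as AmazonS3Exception;
        return s3Error != null && s3Error.StatusCode == HttpStatusCode.NotModified;
    }
}
```

In the calling code from the question, the OnlyOnFaulted continuation could check `S3Errors.IsNotModified(x.Exception)` first and log an informational "cached copy is still current" message in that case, reserving `_log.Error` for everything else.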