Moving millions of items from one storage account to another

Dus*_*sda 5 c# parallel-processing azure azure-storage-blobs parallel.foreach

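I need to move somewhere near 4.2 million images from US North Central to US West, as part of a large migration to take advantage of Azure VM support (for those who don't know, US North Central doesn't support them). The images are all in one container, split into about 119,000 directories.

I'm using the following from the Copy Blob API: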

public static void CopyBlobDirectory(
        CloudBlobDirectory srcDirectory,
        CloudBlobContainer destContainer)
{
    // get the SAS token to use for all blobs
    string blobToken = srcDirectory.Container.GetSharedAccessSignature(
        new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read |
                            SharedAccessBlobPermissions.Write,
            SharedAccessExpiryTime = DateTime.UtcNow + TimeSpan.FromDays(14)
        });

    var srcBlobList = srcDirectory.ListBlobs(
        useFlatBlobListing: true,
        blobListingDetails: BlobListingDetails.None).ToList();

    foreach (var src in srcBlobList)
    {
        var srcBlob = src as ICloudBlob;

        // Create appropriate destination blob type to match the source blob
        ICloudBlob destBlob;
        if (srcBlob.Properties.BlobType == BlobType.BlockBlob)
            destBlob = destContainer.GetBlockBlobReference(srcBlob.Name);
        else
            destBlob = destContainer.GetPageBlobReference(srcBlob.Name);

        // copy using src blob as SAS
        destBlob.BeginStartCopyFromBlob(new Uri(srcBlob.Uri.AbsoluteUri + blobToken), null, null);          
    }
}

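The problem is, it's too slow. Waaaay too slow. At the rate it's issuing commands to copy all this stuff, it's going to take somewhere around four days. I'm not sure what the bottleneck is (client-side connection limits, rate limiting on Azure's end, multithreading, etc.).

So, I'm wondering what my options are. Is there any way to speed things up, or am I just stuck with a job that will take four days to complete?

Edit: How I'm distributing the work to copy everything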

//set up tracing
InitTracer();

//grab a set of photos to benchmark this
var photos = PhotoHelper.GetAllPhotos().Take(500).ToList();

//account to copy from
var from = new Microsoft.WindowsAzure.Storage.Auth.StorageCredentials(
    "oldAccount",
    "oldAccountKey");
var fromAcct = new CloudStorageAccount(from, true);
var fromClient = fromAcct.CreateCloudBlobClient();
var fromContainer = fromClient.GetContainerReference("userphotos");

//account to copy to
var to = new Microsoft.WindowsAzure.Storage.Auth.StorageCredentials(
    "newAccount",
    "newAccountKey");
var toAcct = new CloudStorageAccount(to, true);
var toClient = toAcct.CreateCloudBlobClient();

Trace.WriteLine("Starting Copy: " + DateTime.UtcNow.ToString());

//enumerate sub directories, then move them to blob storage
//note: it doesn't care how high I set the Parallelism to,
//console output indicates it won't run more than five or so at a time
var plo = new ParallelOptions { MaxDegreeOfParallelism = 10 };
Parallel.ForEach(photos, plo, (info) =>
{
    CloudBlobDirectory fromDir = fromContainer.GetDirectoryReference(info.BuildingId.ToString());

    var toContainer = toClient.GetContainerReference(info.Id.ToString());
    toContainer.CreateIfNotExists();

    Trace.WriteLine(info.BuildingId + ": Starting copy, " + info.Photos.Length + " photos...");

    BlobHelper.CopyBlobDirectory(fromDir, toContainer, info);
    //this monitors the container, so I can restart any failed
    //copies if something goes wrong
    BlobHelper.MonitorCopy(toContainer);
});

Trace.WriteLine("Done: " + DateTime.UtcNow.ToString());
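For context on the parallelism comment above: `MaxDegreeOfParallelism` is only an upper bound, and `Parallel.ForEach` runs blocking work on thread-pool threads, so the observed concurrency can sit well below the configured value. One way to check whether that is the limiting factor is to issue the start-copy requests asynchronously behind an explicit throttle. The following is a minimal sketch of that idea, not the code used in this question; `ThrottledFanOut` and the `work` delegate are hypothetical names, with `work` standing in for whatever asynchronous start-copy call is available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledFanOut
{
    // Runs `work` for every item, with at most `maxConcurrency` operations in flight.
    // `work` is a hypothetical stand-in for an asynchronous "start copy" request.
    public static async Task RunAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // Each task waits for a free slot before it starts its operation.
            // Note: this creates one task per item up front, so very large
            // inputs are better fed through in batches.
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();
                try { await work(item); }
                finally { gate.Release(); }
            }).ToList();

            // Completes once every item has finished (or rethrows a failure).
            await Task.WhenAll(tasks);
        }
    }
}
```

Called as `await ThrottledFanOut.RunAsync(photos, 50, info => ...)`, this keeps up to 50 requests outstanding without tying up a thread per request; the value 50 is arbitrary and would need tuning.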

Rob*_*rch 1

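This is a bit of a long shot, but I had a similar issue with Table Storage, where small requests (which I think BeginStartCopyFromBlob should be) started running extremely slowly. It's a problem with Nagle's algorithm and delayed TCP acks, two optimizations for network traffic. See MSDN or this guy for more details.

The upshot: turn off Nagle's algorithm by calling the following before doing any Azure Storage operations.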

ServicePointManager.UseNagleAlgorithm = false;

Or just for blobs:

var storageAccount = CloudStorageAccount.Parse(connectionString);
ServicePoint blobServicePoint = ServicePointManager.FindServicePoint(storageAccount.BlobEndpoint);
blobServicePoint.UseNagleAlgorithm = false;
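Since the question also suspects client-side connection limits, here is a minimal sketch that applies the Nagle setting together with two related `ServicePointManager` settings. These extras are an assumption on my part, not something I have measured for this workload: on .NET Framework, client apps default to 2 concurrent connections per host, which could cap throughput however many threads are running. `StorageNetworkTuning`, `Apply`, and the `connectionLimit` parameter are hypothetical names, and the sketch assumes a .NET Framework client, where `ServicePointManager` governs outgoing HTTP requests.

```csharp
using System;
using System.Net;

public static class StorageNetworkTuning
{
    // Call once at startup, before the first storage request is issued.
    // Returns the ServicePoint for the endpoint so the caller can inspect it.
    public static ServicePoint Apply(Uri blobEndpoint, int connectionLimit)
    {
        // Defaults picked up by ServicePoints created after this point.
        ServicePointManager.DefaultConnectionLimit = connectionLimit;
        ServicePointManager.Expect100Continue = false; // assumption: skip the extra round trip on small requests
        ServicePointManager.UseNagleAlgorithm = false;

        // Also set the values explicitly on the blob endpoint's ServicePoint,
        // in case it was created before the defaults changed.
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(blobEndpoint);
        servicePoint.UseNagleAlgorithm = false;
        servicePoint.ConnectionLimit = connectionLimit;
        return servicePoint;
    }
}
```

For example, `StorageNetworkTuning.Apply(storageAccount.BlobEndpoint, 100);` placed before any blob client is created; the limit of 100 is arbitrary and worth tuning.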

It would be great to know if this turns out to be your problem!