Com*_*mpy 5 java azure azure-java-sdk azure-blob-storage
I need to list all of the blobs in an Azure Blob Storage container. The container holds roughly 200,000 blobs, and I want to obtain each blob's name, last modified date, and size.
Following the documentation for the Azure Java SDK V12, the following code should work:
BlobServiceClient blobServiceClient = new BlobServiceClientBuilder().connectionString(AzureBlobConnectionString).buildClient();
String containerName = "container1";
BlobContainerClient containerClient = blobServiceClient.getBlobContainerClient(containerName);
System.out.println("\nListing blobs...");
// List the blob(s) in the container.
for (BlobItem blobItem : containerClient.listBlobs()) {
    System.out.println("\t" + blobItem.getName());
}
However, when executed, the application just seems to hang indefinitely. If I open PowerShell and run the following command:
Get-AzStorageBlob -Container container1 -Context $ctx
I get the expected result within about 3 minutes.
I've given the code example upwards of an hour to execute, yet nothing comes of it. I then tried restricting the data being requested, as per the documentation, and setting a 5-minute timeout:
BlobServiceClient blobServiceClient = new BlobServiceClientBuilder().connectionString(AzureBlobConnectionString).buildClient();
String containerName = "container1";
BlobContainerClient containerClient = blobServiceClient.getBlobContainerClient(containerName);
System.out.println("\nListing blobs...");
ListBlobsOptions options = new ListBlobsOptions()
    .setMaxResultsPerPage(10)
    .setDetails(new BlobListDetails()
        .setRetrieveDeletedBlobs(false)
        .setRetrieveSnapshots(true));
Duration duration = Duration.ofMinutes(5);
containerClient.listBlobs(options, duration).forEach(blob ->
    System.out.printf("Name: %s, Directory? %b, Deleted? %b, Snapshot ID: %s%n",
        blob.getName(),
        blob.isPrefix(),
        blob.isDeleted(),
        blob.getSnapshot()));
However, this timed out with the following exception:
Exception in thread "main" reactor.core.Exceptions$ReactiveException: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 300000ms in 'flatMap' (and no fallback has been configured)
    at reactor.core.Exceptions.propagate(Exceptions.java:366)
    at reactor.core.publisher.BlockingIterable$SubscriberIterator.hasNext(BlockingIterable.java:168)
    at java.lang.Iterable.forEach(Iterable.java:74)
    at AzureManagement.AzureControl.listAllBlobs(AzureControl.java:42)
    at Main.main(Main.java:8)
I understand there used to be a method called "listBlobsSegmented"; however, it does not appear to exist in V12 of the Azure SDK for Java.
If anybody has any ideas as to how to get a list of the blobs in the container in an effective and efficient manner I would very much appreciate it!
Thanks.
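On the "listBlobsSegmented" point: in V12, page-by-page listing appears to be handled by the PagedIterable that listBlobs returns, through its iterableByPage() method. Below is a minimal sketch of that approach, printing the name, last modified date, and size for each blob. The class name PagedBlobListing, the AZURE_STORAGE_CONNECTION_STRING environment variable, and the page size of 5000 are assumptions made for illustration. Paging on its own will not cure a hang that has some other cause.

```java
import com.azure.core.http.rest.PagedResponse;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.blob.models.BlobItem;
import com.azure.storage.blob.models.BlobItemProperties;
import com.azure.storage.blob.models.ListBlobsOptions;

// Hypothetical class name, used only for this sketch.
public class PagedBlobListing {
    public static void main(String[] args) {
        // Assumption: the connection string is supplied through this environment variable.
        String connectionString = System.getenv("AZURE_STORAGE_CONNECTION_STRING");

        BlobContainerClient containerClient = new BlobServiceClientBuilder()
                .connectionString(connectionString)
                .buildClient()
                .getBlobContainerClient("container1");

        // Ask for large pages so that ~200,000 blobs need few round trips (assumed page size).
        ListBlobsOptions options = new ListBlobsOptions().setMaxResultsPerPage(5000);

        long total = 0;
        // A null timeout means no client-side time limit is applied to the listing.
        for (PagedResponse<BlobItem> page : containerClient.listBlobs(options, null).iterableByPage()) {
            for (BlobItem blob : page.getValue()) {
                BlobItemProperties props = blob.getProperties();
                System.out.printf("%s\t%s\t%d%n",
                        blob.getName(), props.getLastModified(), props.getContentLength());
                total++;
            }
            // Printing progress per page makes it visible whether any response arrives at all.
            System.out.println("Listed " + total + " blobs so far");
        }
    }
}
```

If not even the first progress line is printed, the request itself is probably stalling, which points away from the listing logic and towards the client setup.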
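I ran into exactly the same problem, with every operation hanging forever. There is actually nothing wrong with the way you are listing blobs.
It turned out to be a dependency conflict, so make sure your dependencies do not conflict with the Azure SDK. It sounds odd, but we discovered this when we downgraded the Azure SDK from version 12 to an older version: instead of hanging, it threw an exception like method not found in class ...
In my case, the conflict came from hadoop-hdfs, which pulls in an older version of netty, while the Azure SDK wants a newer version of netty.
When I removed the HDFS dependency: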
group: 'org.apache.hadoop', name: 'hadoop-hdfs', version: '3.2.0'
I could list files and blobs without any hanging.
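One way to check for this kind of conflict is to ask the JVM which JAR a given class was actually loaded from. The following is a minimal sketch using only the standard library. The class name ClassOrigin and the method describeOrigin are hypothetical, and the two netty class names are just examples of classes to probe, not a list of what the Azure SDK requires.

```java
import java.security.CodeSource;

// Hypothetical helper for this sketch: reports where a class on the classpath comes from.
public class ClassOrigin {

    /** Returns a one-line description of the location the named class was loaded from. */
    public static String describeOrigin(String className) {
        try {
            // Load without initializing, so that static initializers do not run.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes from the JDK itself have no code source.
                return className + " -> (JDK / bootstrap class loader)";
            }
            return className + " -> " + source.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Example class names (assumed); pass others as command-line arguments.
        String[] names = args.length > 0 ? args : new String[] {
            "io.netty.buffer.ByteBuf",
            "io.netty.channel.Channel"
        };
        for (String name : names) {
            System.out.println(describeOrigin(name));
        }
    }
}
```

Run with the application's full classpath, the printed JAR path shows which netty version wins. If it points at a JAR pulled in by hadoop-hdfs rather than by the Azure SDK, that matches the conflict described above.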