Azure-sdk-for-java: [BUG] BlobContainerClient.listBlobs follows continuation token in background without being asked for if just first item is requested.

Created on 23 Sep 2020  路  11Comments  路  Source: Azure/azure-sdk-for-java

Describe the bug
BlobContainerClient.listBlobs follows continuation token in background without being asked for.

To Reproduce
Steps to reproduce the behavior:

  1. Create blob container with lot of blobs (more than 5k) (or use smaller page size)
  2. Run the provided code sample

Code Snippet

package com.testfactory.apps;

import com.azure.core.http.policy.HttpLogDetailLevel;
import com.azure.core.util.Configuration;
import com.azure.storage.blob.*;
import com.azure.storage.common.StorageSharedKeyCredential;

import java.io.*;

public class App3
{
    private final static String ACCOUNT_NAME = "<your own>";
    private final static String ENDPOINT = "blob.core.windows.net";
    private final static String SHARED_KEY = "<your own>";
    private final static String CONTAINER = "<your own>";

    public static void main( String[] args ) throws IOException, InterruptedException {

        BlobServiceClient blobServiceClient = createServiceClient();

        blobServiceClient.getBlobContainerClient(CONTAINER).listBlobs().streamByPage().findFirst();

        Thread.sleep(10000);
    }

    private static BlobServiceClient createServiceClient() {
        return new BlobServiceClientBuilder()
                .endpoint("https://" + ACCOUNT_NAME + "." + ENDPOINT)
                .credential(new StorageSharedKeyCredential(ACCOUNT_NAME, SHARED_KEY))
                .httpLogOptions(
                        BlobServiceClientBuilder.getDefaultHttpLogOptions()
                                .setLogLevel(HttpLogDetailLevel.BODY_AND_HEADERS))
                .configuration(Configuration.getGlobalConfiguration().put("AZURE_LOG_LEVEL", "1"))
                .buildClient();
    }
}

Expected behavior
The API should send only one HTTP request and fetch first page

Output I got

2020-09-23 13:31 [parallel-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?restype=container&comp=list
(empty body)
--> END GET

2020-09-23 13:31 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?restype=container&comp=list (1429 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 13:31 [parallel-3] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITA3MDg3NWE0LTliNTgtNGEzYS1iZTk2LTQyMGUwYTE2OGExYi50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list
(empty body)
--> END GET

2020-09-23 13:31 [reactor-http-nio-2] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITA3MDg3NWE0LTliNTgtNGEzYS1iZTk2LTQyMGUwYTE2OGExYi50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list (542 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 13:31 [parallel-5] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITBlM2Q3NDIyLWYxYmMtNGU4Mi04N2JkLWQ0YmQ0NjQzYmQxMiEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list
(empty body)
--> END GET

2020-09-23 13:31 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITBlM2Q3NDIyLWYxYmMtNGU4Mi04N2JkLWQ0YmQ0NjQzYmQxMiEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list (352 ms, unknown-length body)
(body content not logged)
<-- END HTTP

Process finished with exit code 0

Screenshots
If applicable, add screenshots to help explain your problem.

Setup (please complete the following information):

  • OS: [e.g. iOS] Windows
  • IDE : [e.g. IntelliJ] IntelliJ
  • Version of the Library used :
  <dependencies>
    <dependency>
      <groupId>com.azure</groupId>
      <artifactId>azure-storage-blob</artifactId>
      <version>12.7.0</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>com.azure</groupId>
      <artifactId>azure-storage-blob-batch</artifactId>
      <version>12.5.2</version>
      <scope>compile</scope>
    </dependency>
  </dependencies>
Azure.Core Client bug

Most helpful comment

@kasobol-msft, I can verify this with what I'm seeing in a simplified reproduction of the issue, the default stream/iterator over the T elements is making more calls than the paged API.

All 11 comments

I also tried the following code where I changed streamByPage() to stream().
This basically followed continuation token till end:

package com.testfactory.apps;

import com.azure.core.http.policy.HttpLogDetailLevel;
import com.azure.core.util.Configuration;
import com.azure.storage.blob.*;
import com.azure.storage.blob.models.ListBlobsOptions;
import com.azure.storage.common.StorageSharedKeyCredential;

import java.io.*;

public class App3
{
    private final static String ACCOUNT_NAME = "<your own>";
    private final static String ENDPOINT = "blob.core.windows.net";
    private final static String SHARED_KEY = "<your own>";
    private final static String CONTAINER = "<your own>";

    public static void main( String[] args ) throws IOException, InterruptedException {

        BlobServiceClient blobServiceClient = createServiceClient();

        blobServiceClient.getBlobContainerClient(CONTAINER).listBlobs().stream().findFirst();

        Thread.sleep(10000);
    }

    private static BlobServiceClient createServiceClient() {
        return new BlobServiceClientBuilder()
                .endpoint("https://" + ACCOUNT_NAME + "." + ENDPOINT)
                .credential(new StorageSharedKeyCredential(ACCOUNT_NAME, SHARED_KEY))
                .httpLogOptions(
                        BlobServiceClientBuilder.getDefaultHttpLogOptions()
                                .setLogLevel(HttpLogDetailLevel.BODY_AND_HEADERS))
                .configuration(Configuration.getGlobalConfiguration().put("AZURE_LOG_LEVEL", "1"))
                .buildClient();
    }
}

Output:

2020-09-23 14:17 [parallel-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?restype=container&comp=list
(empty body)
--> END GET

2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?restype=container&comp=list (1245 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 14:18 [parallel-3] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITA3MDg3NWE0LTliNTgtNGEzYS1iZTk2LTQyMGUwYTE2OGExYi50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list
(empty body)
--> END GET

2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITA3MDg3NWE0LTliNTgtNGEzYS1iZTk2LTQyMGUwYTE2OGExYi50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list (648 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 14:18 [parallel-5] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITBlM2Q3NDIyLWYxYmMtNGU4Mi04N2JkLWQ0YmQ0NjQzYmQxMiEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list
(empty body)
--> END GET

2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITBlM2Q3NDIyLWYxYmMtNGU4Mi04N2JkLWQ0YmQ0NjQzYmQxMiEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list (426 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 14:18 [parallel-7] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITE1MWE2YzgyLTNmNTYtNDNlYS05MTRkLTRjMTM3NzhiM2RiZS50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list
(empty body)
--> END GET

2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITE1MWE2YzgyLTNmNTYtNDNlYS05MTRkLTRjMTM3NzhiM2RiZS50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list (527 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 14:18 [parallel-9] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITFjNTBmMjhhLWVlZDgtNDhmYS1iYjg3LTFkODlmMmExYWRlYy50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list
(empty body)
--> END GET

2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITFjNTBmMjhhLWVlZDgtNDhmYS1iYjg3LTFkODlmMmExYWRlYy50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list (439 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 14:18 [parallel-11] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITIzNmYwZjdlLTA3YzItNDE2Yi04YzBiLWQ3OTBmZWQ0ZmZhYSEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list
(empty body)
--> END GET

2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITIzNmYwZjdlLTA3YzItNDE2Yi04YzBiLWQ3OTBmZWQ0ZmZhYSEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list (358 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 14:18 [parallel-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITJhOGYzNTQzLWI0MDYtNGZlNy1hYTFkLTJlMDVjNmQ5Y2Y4NyEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list
(empty body)
--> END GET

2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITJhOGYzNTQzLWI0MDYtNGZlNy1hYTFkLTJlMDVjNmQ5Y2Y4NyEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list (409 ms, unknown-length body)
(body content not logged)
<-- END HTTP

Process finished with exit code 0

Another observation:
I tried pure async client to get first item: asyncServiceClient.getBlobContainerAsyncClient(CONTAINER).listBlobs().blockFirst(); . This sent only one request as expected. So I suppose problem is somewhere in conversion of sync call to async call.

Thanks for submitting this issue @kasobol-msft, if possible could you also check what happens when using the Iterable variant of byPage in the synchronous API.

blobServiceClient.getBlobContainerClient(CONTAINER).listBlobs().iterableByPage().next();

Looking to see if this issue is with the PagedIterable design or only specifically in the streaming design.

@alzimmermsft I gave it a try and both blobServiceClient.getBlobContainerClient(CONTAINER).listBlobs().iterableByPage().iterator().next(); and blobServiceClient.getBlobContainerClient(CONTAINER).listBlobs().iterator().next(); result in extra requests being sent.

Thanks @kasobol-msft! We'll begin looking into mitigating this issue in azure-core's infrastructure.

Also, opened an issue in reactor-core to get expert's input on this: https://github.com/reactor/reactor-core/issues/2398

@anuchandy @alzimmermsft One more thing: When I changed from .listBlobs().streamByPage().findFirst(); to .listBlobs().stream().findFirst(); it made much more than 2-3 extra requests.

@kasobol-msft, I can verify this with what I'm seeing in a simplified reproduction of the issue, the default stream/iterator over the T elements is making more calls than the paged API.

This could be due to the flatmap default buffer-size. flatmap is used for flattening Page<T> to T.

If we change the sample-code we included in the previous reactor-issue_2398 to apply the flatmap, it makes ~256 mocked calls instead of ~3.

 final int TOTAL_PAGES = 500;
 final int BATCH_SIZE = 1;
 PageSource dataSource = new PageSource(TOTAL_PAGES);
 dataSource
         .getPageFlux()
         .flatMapIterable(page ->  page.items)
         .toStream(BATCH_SIZE)
         .findFirst();

ouput:

Mocked network call for Page_0
..
Mocked network call for Page_256

Once we apply a batch-size to the flatmap operator, like below, there call reduced to 2.

.flatMapIterable(page -> {
    return page.items;
}, 1); // <- 1 as batch-size
Mocked network call for Page_0
Mocked network call for Page_1

@kasobol-msft, PR https://github.com/Azure/azure-sdk-for-java/pull/15646 should have a tentative fix for this issue if you want to check it out.

@alzimmermsft looks like it's working.

Was this page helpful?
0 / 5 - 0 ratings