Describe the bug
BlobContainerClient.listBlobs follows continuation token in background without being asked for.
To Reproduce
Steps to reproduce the behavior:
Code Snippet
package com.testfactory.apps;
import com.azure.core.http.policy.HttpLogDetailLevel;
import com.azure.core.util.Configuration;
import com.azure.storage.blob.*;
import com.azure.storage.common.StorageSharedKeyCredential;
import java.io.*;
public class App3
{
private final static String ACCOUNT_NAME = "<your own>";
private final static String ENDPOINT = "blob.core.windows.net";
private final static String SHARED_KEY = "<your own>";
private final static String CONTAINER = "<your own>";
public static void main( String[] args ) throws IOException, InterruptedException {
BlobServiceClient blobServiceClient = createServiceClient();
blobServiceClient.getBlobContainerClient(CONTAINER).listBlobs().streamByPage().findFirst();
Thread.sleep(10000);
}
private static BlobServiceClient createServiceClient() {
return new BlobServiceClientBuilder()
.endpoint("https://" + ACCOUNT_NAME + "." + ENDPOINT)
.credential(new StorageSharedKeyCredential(ACCOUNT_NAME, SHARED_KEY))
.httpLogOptions(
BlobServiceClientBuilder.getDefaultHttpLogOptions()
.setLogLevel(HttpLogDetailLevel.BODY_AND_HEADERS))
.configuration(Configuration.getGlobalConfiguration().put("AZURE_LOG_LEVEL", "1"))
.buildClient();
}
}
Expected behavior
The API should send only one HTTP request and fetch first page
Output I got
2020-09-23 13:31 [parallel-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?restype=container&comp=list
(empty body)
--> END GET
2020-09-23 13:31 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?restype=container&comp=list (1429 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 13:31 [parallel-3] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITA3MDg3NWE0LTliNTgtNGEzYS1iZTk2LTQyMGUwYTE2OGExYi50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list
(empty body)
--> END GET
2020-09-23 13:31 [reactor-http-nio-2] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITA3MDg3NWE0LTliNTgtNGEzYS1iZTk2LTQyMGUwYTE2OGExYi50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list (542 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 13:31 [parallel-5] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITBlM2Q3NDIyLWYxYmMtNGU4Mi04N2JkLWQ0YmQ0NjQzYmQxMiEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list
(empty body)
--> END GET
2020-09-23 13:31 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITBlM2Q3NDIyLWYxYmMtNGU4Mi04N2JkLWQ0YmQ0NjQzYmQxMiEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list (352 ms, unknown-length body)
(body content not logged)
<-- END HTTP
Process finished with exit code 0
Screenshots
If applicable, add screenshots to help explain your problem.
Setup (please complete the following information):
<dependencies>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-blob</artifactId>
<version>12.7.0</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-blob-batch</artifactId>
<version>12.5.2</version>
<scope>compile</scope>
</dependency>
</dependencies>
I also tried the following code where I changed streamByPage() to stream().
This basically followed continuation token till end:
package com.testfactory.apps;
import com.azure.core.http.policy.HttpLogDetailLevel;
import com.azure.core.util.Configuration;
import com.azure.storage.blob.*;
import com.azure.storage.blob.models.ListBlobsOptions;
import com.azure.storage.common.StorageSharedKeyCredential;
import java.io.*;
public class App3
{
private final static String ACCOUNT_NAME = "<your own>";
private final static String ENDPOINT = "blob.core.windows.net";
private final static String SHARED_KEY = "<your own>";
private final static String CONTAINER = "<your own>";
public static void main( String[] args ) throws IOException, InterruptedException {
BlobServiceClient blobServiceClient = createServiceClient();
blobServiceClient.getBlobContainerClient(CONTAINER).listBlobs().stream().findFirst();
Thread.sleep(10000);
}
private static BlobServiceClient createServiceClient() {
return new BlobServiceClientBuilder()
.endpoint("https://" + ACCOUNT_NAME + "." + ENDPOINT)
.credential(new StorageSharedKeyCredential(ACCOUNT_NAME, SHARED_KEY))
.httpLogOptions(
BlobServiceClientBuilder.getDefaultHttpLogOptions()
.setLogLevel(HttpLogDetailLevel.BODY_AND_HEADERS))
.configuration(Configuration.getGlobalConfiguration().put("AZURE_LOG_LEVEL", "1"))
.buildClient();
}
}
Output:
2020-09-23 14:17 [parallel-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?restype=container&comp=list
(empty body)
--> END GET
2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?restype=container&comp=list (1245 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 14:18 [parallel-3] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITA3MDg3NWE0LTliNTgtNGEzYS1iZTk2LTQyMGUwYTE2OGExYi50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list
(empty body)
--> END GET
2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITA3MDg3NWE0LTliNTgtNGEzYS1iZTk2LTQyMGUwYTE2OGExYi50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list (648 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 14:18 [parallel-5] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITBlM2Q3NDIyLWYxYmMtNGU4Mi04N2JkLWQ0YmQ0NjQzYmQxMiEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list
(empty body)
--> END GET
2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITBlM2Q3NDIyLWYxYmMtNGU4Mi04N2JkLWQ0YmQ0NjQzYmQxMiEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list (426 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 14:18 [parallel-7] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITE1MWE2YzgyLTNmNTYtNDNlYS05MTRkLTRjMTM3NzhiM2RiZS50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list
(empty body)
--> END GET
2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITE1MWE2YzgyLTNmNTYtNDNlYS05MTRkLTRjMTM3NzhiM2RiZS50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list (527 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 14:18 [parallel-9] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITFjNTBmMjhhLWVlZDgtNDhmYS1iYjg3LTFkODlmMmExYWRlYy50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list
(empty body)
--> END GET
2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21112%21MDAwMDQwITFjNTBmMjhhLWVlZDgtNDhmYS1iYjg3LTFkODlmMmExYWRlYy50eHQhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh&restype=container&comp=list (439 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 14:18 [parallel-11] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITIzNmYwZjdlLTA3YzItNDE2Yi04YzBiLWQ3OTBmZWQ0ZmZhYSEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list
(empty body)
--> END GET
2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITIzNmYwZjdlLTA3YzItNDE2Yi04YzBiLWQ3OTBmZWQ0ZmZhYSEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list (358 ms, unknown-length body)
(body content not logged)
<-- END HTTP
2020-09-23 14:18 [parallel-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - --> GET https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITJhOGYzNTQzLWI0MDYtNGZlNy1hYTFkLTJlMDVjNmQ5Y2Y4NyEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list
(empty body)
--> END GET
2020-09-23 14:18 [reactor-http-nio-1] [INFO] com.azure.storage.blob.implementation.ContainersImpl$ContainersService.listBlobFlatSegment - <-- 200 https://xclientdev3.blob.core.windows.net/nettyrepro?marker=2%21108%21MDAwMDM2ITJhOGYzNTQzLWI0MDYtNGZlNy1hYTFkLTJlMDVjNmQ5Y2Y4NyEwMDAwMjghOTk5OS0xMi0zMVQyMzo1OTo1OS45OTk5OTk5WiE-&restype=container&comp=list (409 ms, unknown-length body)
(body content not logged)
<-- END HTTP
Process finished with exit code 0
Another observation:
I tried pure async client to get first item: asyncServiceClient.getBlobContainerAsyncClient(CONTAINER).listBlobs().blockFirst(); . This sent only one request as expected. So I suppose problem is somewhere in conversion of sync call to async call.
Thanks for submitting this issue @kasobol-msft, if possible could you also check what happens when using the Iterable variant of byPage in the synchronous API.
blobServiceClient.getBlobContainerClient(CONTAINER).listBlobs().iterableByPage().next();
Looking to see if this issue is with the PagedIterable design or only specifically in the streaming design.
@alzimmermsft I gave it a try and both blobServiceClient.getBlobContainerClient(CONTAINER).listBlobs().iterableByPage().iterator().next(); and blobServiceClient.getBlobContainerClient(CONTAINER).listBlobs().iterator().next(); result in extra requests being sent.
Thanks @kasobol-msft! We'll begin looking into mitigating this issue in azure-core's infrastructure.
Also, opened an issue in reactor-core to get expert's input on this: https://github.com/reactor/reactor-core/issues/2398
@anuchandy @alzimmermsft One more thing: When I changed from .listBlobs().streamByPage().findFirst(); to .listBlobs().stream().findFirst(); it made much more than 2-3 extra requests.
@kasobol-msft, I can verify this with what I'm seeing in a simplified reproduction of the issue, the default stream/iterator over the T elements is making more calls than the paged API.
This could be due to the flatmap default buffer-size. flatmap is used for flattening Page<T> to T.
If we change the sample-code we included in the previous reactor-issue_2398 to apply the flatmap, it makes ~256 mocked calls instead of ~3.
final int TOTAL_PAGES = 500;
final int BATCH_SIZE = 1;
PageSource dataSource = new PageSource(TOTAL_PAGES);
dataSource
.getPageFlux()
.flatMapIterable(page -> page.items)
.toStream(BATCH_SIZE)
.findFirst();
ouput:
Mocked network call for Page_0
..
Mocked network call for Page_256
Once we apply a batch-size to the flatmap operator, like below, there call reduced to 2.
.flatMapIterable(page -> {
return page.items;
}, 1); // <- 1 as batch-size
Mocked network call for Page_0
Mocked network call for Page_1
@kasobol-msft, PR https://github.com/Azure/azure-sdk-for-java/pull/15646 should have a tentative fix for this issue if you want to check it out.
@alzimmermsft looks like it's working.
Most helpful comment
@kasobol-msft, I can verify this with what I'm seeing in a simplified reproduction of the issue, the default stream/iterator over the
Telements is making more calls than the paged API.