Azure-cosmos-dotnet-v2: very long ResponseContinuation on certain query

Created on 13 Oct 2015 · 25 Comments · Source: Azure/azure-cosmos-dotnet-v2

If I run this query
SELECT * FROM c WHERE c.ObjectType="Document"
I get a
ResponseContinuation = "+RID:16gQALRRBgM-AAAAAAAAAA==#RT:1"

If I run the following query (on the same collection)
SELECT * FROM c WHERE c.CustomerAreaId = "1"
the ResponseContinuation is 7755 bytes long

ResponseContinuation = "+RID:16gQALRRBgPrggsAAAAAAA==#RT:1#FPC:AgEuPC6KBu8CAPDd9/3e9/f9//f3+3v1/v7/+//v9/e/f3//fn+rRsDf7//ff/W9///96Pt9b/u+f3///v/f7//7+//3/t/vv3///v/+//3v3/v+///Bfb/f37+///++/77fu7uv3e+3vW/+//+/v+2+//vf//7/t/b97/9v2/v+9/v79/f7/f7v7zvt/f37/d9/v/v9ff+/3+v+/vf3//f7/...

Is this ResponseContinuation correct, or should it look like the one in the first example?
If I edit it to remove everything after the #RT:1 it appears to work, but it is very difficult to be sure with the amount of data in the collection. If the ResponseContinuation is correct I will have to re-engineer the paging mechanism in my web site, as this is far too much data to transfer.

Geetarman

All 25 comments

The continuation token helps avoid redundant work in future round trips. We persist information in this token so that we know exactly where to resume without repeating any work. Overall, this significantly reduces the cumulative RUs across the query's round trips.

May I ask what the concern about 7 KB is? It is within the supported limit for HTTP headers.

ghost on 13 Oct 2015

Just that I am building a highly scalable website and I want to minimize the amount of data transferred. The request continuation is being transferred in the URL, as I assumed it was always small. That was obviously a mistaken assumption on my part.

Geetarman on 13 Oct 2015

Fair enough. Perhaps you could keep the token on your server (in session state, for example) instead of round-tripping it to the client? But yes, if you need to send this to the client, then not putting it in the URL would be best.

ghost on 13 Oct 2015

How big could that continuation token get?
Knowing that might help me decide the best solution. I might decide to store it in an unindexed collection.

Geetarman on 13 Oct 2015

Same problem here.
We have an issue transferring this token from the client machine to our server, which then makes the docdb request.

We first passed it to our server in a query string. But already it was too long sometimes and the request would fail. So we passed that continuation token using a custom header. It worked for a while. Prolly more than 6 months or so.

And today... I just stumbled on a 12.8kb token.
And now the request is just too long for the browser even using headers. See attached file.
continuationToken.txt

We also base64 encode this token so it can be transferred anywhere safely (since it's a json with some {} and stuff), and it becomes 17.1kb in base64.
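The jump from 12.8kb to 17.1kb is what Base64 is expected to do: every 3 input bytes become 4 output characters. A minimal sketch of that arithmetic, assuming the 12.8kb token is roughly 12,800 bytes (the class and method names are made up for illustration):

```csharp
using System;

// Hypothetical helper for illustration only; not part of any SDK.
public static class Base64Size
{
    // Padded Base64 emits 4 characters for every started group of 3 bytes.
    public static long EncodedLength(long byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }
}
```

For 12,800 bytes this gives 17,068 characters, about a third larger, which is consistent with the 17.1kb figure.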

I could create a 'short' token on our side, which would be associated to the real docdb token. But that means that for each docdb request, I'd have to query another system to get the real token...
And to me, it's not my job, but m$ job to give us users a token that we can easily manipulate.

Shall this issue be reopened?

Kevin-TokyWoky on 15 Feb 2017

👍3 😕1

For what it is worth, I decided (really had no option) to store the continuation token in a cache object if it was greater than a certain length and replace it with a GUID. When the request comes in again I can determine whether it is a real continuation token or a cached one, in which case I retrieve the 'true' continuation token. Currently the cache I use is a documentdb cache, but you could use a table or Redis Cache.
I guess the problem is how long you keep it. I keep a collection for caching purposes and it (auto) deletes entries after a day (plenty of time), and it works fine for me at the moment.
I use a dedicated unindexed collection for this purpose.
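
The approach described above could be sketched roughly as follows. This is an in-memory illustration only, not the code used in this thread: the class name, the "cached:" key prefix and the length threshold are assumptions, and a real deployment would swap the dictionary for a shared store such as Redis or a DocumentDB collection with a TTL.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper, not part of any SDK: swaps long continuation tokens for short GUID keys.
public sealed class ContinuationTokenCache
{
    // Assumed marker that distinguishes cache keys from real tokens.
    private const string KeyPrefix = "cached:";

    private readonly int maxInlineLength;
    private readonly TimeSpan timeToLive;
    private readonly ConcurrentDictionary<string, (string Token, DateTimeOffset Expires)> store =
        new ConcurrentDictionary<string, (string Token, DateTimeOffset Expires)>();

    public ContinuationTokenCache(int maxInlineLength, TimeSpan timeToLive)
    {
        this.maxInlineLength = maxInlineLength;
        this.timeToLive = timeToLive;
    }

    // Returns the token unchanged if it is short enough; otherwise stores it and returns a short key.
    public string ToClientToken(string continuation)
    {
        if (continuation == null || continuation.Length <= maxInlineLength)
        {
            return continuation;
        }

        string key = KeyPrefix + Guid.NewGuid().ToString("N");
        store[key] = (continuation, DateTimeOffset.UtcNow + timeToLive);
        return key;
    }

    // Resolves a client-supplied value back to the real token.
    // Returns null if the value is a cache key whose entry is missing or expired.
    public string FromClientToken(string clientToken)
    {
        if (clientToken == null || !clientToken.StartsWith(KeyPrefix, StringComparison.Ordinal))
        {
            return clientToken; // a real (short) continuation token
        }

        if (store.TryGetValue(clientToken, out var entry) && entry.Expires > DateTimeOffset.UtcNow)
        {
            return entry.Token;
        }

        store.TryRemove(clientToken, out _); // drop the expired entry, if any
        return null;
    }
}
```

Expired entries are only evicted when they are looked up here; a store with built-in expiry avoids that bookkeeping.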

It would be nice though not to worry about it and remove the need to cache it myself...

Geetarman on 15 Feb 2017

Yes, I plan on using Redis. But first I'm trying to see if I can trim this token and keep everything but the #FPC part... I'm giving myself 30 mins of hacking to try this :D

Kevin-TokyWoky on 15 Feb 2017

Good luck. I think the term Continuation Token is a misnomer in this case - it is Continuation Data!

Geetarman on 15 Feb 2017

👍2 😄1

Put a request on https://feedback.azure.com/forums/263030-documentdb
I'd vote for it....

Geetarman on 15 Feb 2017

Ahah yes.
Well, at first glance this shorter token I just created, removing the long FPC part, works.
I gotta compare with what we have on our production server and make sure it's the same results as with this hacked token.

Kevin-TokyWoky on 15 Feb 2017

Hmmm, don't like the sound of that. If it isn't needed it must be a bug in documentdb, in which case it should be reported. I have a vague feeling I tried something like that and had problems (it was a while ago).

Geetarman on 15 Feb 2017

Same... I don't like it. Especially since I would rely on their token's format... which could change any day.
:/
But ryancrawcour mentioned that there was persisted data inside this token to save them work in future round trips. Which makes us do more work instead. And that's probably inside this FPC part.

Well... I think I'll go the Redis way and generate our own token. Not sure yet.

Kevin-TokyWoky on 15 Feb 2017

@zfang will follow up on this issue.

rnagpal on 26 Feb 2017

Hello, thanks for reopening this.

So a bit of follow up since I almost forgot about this github issue:
following my conversation with Geetarman, I contacted the Azure support since we have a subscription.
I told my concerns about the token and I had a reply from Microsoft. See below:

For the query continuation token, its length could go up to 16KB. The query engine utilizes the token to serialize its state so that it can resume execution correctly. Along with the resume state, the query engine also serializes some of the index lookup work into the continuation token to avoid repeating the same work for each continuation.
If this is really a blocking issue for you, then I could give you some hints on trimming the continuation token before sending it back. By all means we do not recommend this unless this is an absolute must and is meant to be a temporary solution.
From our side, we’re considering allowing the user to specify maximum continuation token length, with the caveat that if serializing the resume state did not fit in the specified max size, the query execution will fail with an error. We don’t have a timeline for this work yet though.

For the short term, you could trim the token by removing #FPC. Please keep in mind that in some cases you might get #FPP (i.e. either #FPC or #FPP).
We’ll be sure to prioritize this work item, and hopefully we can get around to it soon.
Best Regards,

Very nice to see things going forward, +1 to Microsoft. They listen.

As for us, we are indeed trimming the token right now, but we only remove the #FPC part, as at the time I didn't know about the #FPP part. So far seems to work great, but I suspect it must cost us a little more data point in our DocDB subscription since we remove some optimization from the token. Probably.

Kevin-TokyWoky on 27 Feb 2017

👍2

@rnagpal it appears that our token is null even though we do have more results in our query and would need the continuation token to get the next results. We were using the method of stripping FPC and FPP.

"{"token":null,"range":{"min":"05C1E5D191B78A083134303331323800","max":"FF"}}"

Did anything change recently?

ansario on 3 Aug 2017

Yep. It changed like... one month ago or something.
I changed my token regex on June 1st. My commit reads:

Instead of FPC at the end there is now FPP

Here is the regex we are now using:

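// Requires: using System; using System.Text.RegularExpressions; using Newtonsoft.Json;
// Group 1 captures everything before the trailing #FPC or #FPP section.
private static readonly Regex ContinuationTokenDataRegex = new Regex(@"(\+RID:.*#RT:.*#TRC:.*#RTD:.*)#(?:FPC|FPP).*", RegexOptions.Compiled | RegexOptions.Singleline);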

BTW as a bonus, here is our code to shorten the tokens:

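        // Strips the trailing #FPC/#FPP section from the "token" field of the JSON continuation.
        // Returns the input unchanged if it cannot be parsed or does not match.
        private static string ShortenToken(string phatToken)
        {
            try
            {
                dynamic jsonToken = JsonConvert.DeserializeObject(phatToken);
                Match match = ContinuationTokenDataRegex.Match((string)jsonToken.token);
                if (!match.Success)
                {
                    return phatToken;
                }

                // Group 1 is the part of the token that is kept.
                jsonToken.token = match.Groups[1].Value;
                return jsonToken.ToString();
            }
            catch (Exception)
            {
                // Null token, non-JSON input, etc.: fall back to the original token.
                return phatToken;
            }
        }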

EDIT: However we still have a valid token field

Kevin-TokyWoky on 3 Aug 2017

😕2

@Kevin-TokyWoky that's fine, but the actual token is still coming back as null even though the response continuation itself is not null. So we can't do any regex on a null token.

ansario on 3 Aug 2017

Yep, I don't know. I just checked on our side and everything works properly.
I can paginate stuff, the token is not null for us.

Kevin-TokyWoky on 3 Aug 2017

@ansario could you please share the 'activity id' and we will try to take a look.
Also it would be good if you can create a new issue to track it.

kirankumarkolli on 15 Aug 2017

@ansario I am closing this issue as there is a lot of other context here as well. In case you are still blocked, feel free to raise a new issue.

kirankumarkolli on 17 Aug 2017

@Kevin-TokyWoky Thanks for your code example. I've had to change it a bit to get it to work in my code.

private static readonly Regex ContinuationTokenDataRegex = new Regex(@"(\+RID:.*#RT:.*#TRC:.*#RTD:.*)#(?:FPC|FPP).*", RegexOptions.Compiled | RegexOptions.Singleline);

private static string ShortenToken(string phatToken)
{
    if (string.IsNullOrEmpty(phatToken))
    {
        return phatToken;
    }

    try
    {
        dynamic jsonToken = JsonConvert.DeserializeObject(phatToken);
        var matches = ContinuationTokenDataRegex.Match((string)jsonToken.token);
        if (matches.Groups.Count != 2)
        {
            return phatToken;
        }

        jsonToken.token = matches.Groups[1].Value;
        return JsonConvert.SerializeObject(jsonToken);
    }
    catch
    {
        return phatToken;
    }
}

joopscheer on 29 Aug 2017

From our side, we’re considering allowing the user to specify maximum continuation token length.

This has since been implemented as the ResponseContinuationTokenLimitInKb on the FeedOptions object.

jamesthurley on 4 Sep 2019

This has since been implemented as the ResponseContinuationTokenLimitInKb on the FeedOptions object.

The original quote said (emphasis mine):

From our side, we’re considering allowing the user to specify maximum continuation token length, with the caveat that if serializing the resume state did not fit in the specified max size, the query execution will fail with an error. We don’t have a timeline for this work yet though.

So it's not really a solution. You can specify a max length for the token, but it will cause requests to fail...

thomaslevesque on 4 Sep 2019

@thomaslevesque Happily the way they have implemented this is that they simply prune the continuation token to keep it under the desired limit, rather than failing with an error.

The caveat is that resuming the query may take a bit more work (and therefore RUs) if the continuation token has been pruned.
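
For reference, a hedged sketch of how the limit might be set, assuming the Microsoft.Azure.DocumentDB (v2) package is referenced. The helper class is hypothetical, the page size and the 1 KB limit are illustrative values only, and `previousToken` stands for whatever the caller kept from the previous page:

```csharp
using Microsoft.Azure.Documents.Client;

// Hypothetical helper for illustration; only FeedOptions and its properties come from the SDK.
public static class PagingOptions
{
    // Builds query options that ask the service to cap the size of the returned continuation token.
    public static FeedOptions Create(string previousToken)
    {
        return new FeedOptions
        {
            MaxItemCount = 20,                       // illustrative page size
            RequestContinuation = previousToken,     // null for the first page
            ResponseContinuationTokenLimitInKb = 1   // illustrative cap, expressed in KB
        };
    }
}
```

The resulting options would be passed to the query call in place of the default FeedOptions.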

There is a bit more information which I found useful here: https://stackoverflow.com/a/54242859/37725

jamesthurley on 4 Sep 2019

@jamesthurley good to know, thanks!
Too bad that the max length is expressed in KB, so we can't say e.g. "no more than 128 bytes". The "minimal" continuation token is only a few bytes, so there's still no easy way to get that...

thomaslevesque on 4 Sep 2019
