Couchdb: Investigate how hard could it be to reintroduce etags in index's views.

Created on 17 Mar 2018  路  22Comments  路  Source: apache/couchdb

We've removed etag header from views in Couch 2.0. It should be possible to construct consistent etag on fabric level, and return 304 if the etag matches the header.

Expected Behavior

View should return consistent Etag header. View should respect 'if-none-match' header.

Current Behavior

View doesn't return ETag header

Possible Solution

Robert Newson's idea from COUCHDB-2859 :

Here's an implementation idea that should be no more expensive than what we do today;

  1. delay the response until we have heard from one shard of each range (I believe we already added this, but do check)
  2. from this, construct the etag (we can change the workers to return more information along with their first rows (if any)
  3. send 304 if the etag matches the header

References

api enhancement viewserver

Most helpful comment

@janl No, unfortunately, when view sig composed update_seq explicitly pulled from db and not taken from view's state. Ironically the problem here is opposite to the one you fixed, it's the fact that we _include_ update_seq into view and hence can get vary etags.

On the other note: I'm going to reopen this issue, I kept it as a frame to keep conversations separate from code review and have a common aggregate issue in case WIP approach wouldn't work and I'll close that PR in favor of something different.

All 22 comments

A side note: I don't think chttpd family of etag functions is actually working. Make ETag generates binary make_etag, but validator
etag_match/2 checks it versus tokenized header value, which gives a list of strings.

It's an easy fix, but I'm hesitant to introduce it as part of this work, it seems that etag_respond/3 used in quite a few places in chttpd, so complete fix would make this PR rather messy.

Good news - this is doable and not even overly esoteric. Bad news - the change is touching a lot of modules and needs a record change, so writing tests, migration code and then getting someone to review going to take a while.

Thanks @eiri, good to know. Let's not merge this into master until 2.2.0 is released, then.

I'm going to leave views' multi query without etags, at least in this iteration. I did a couple of attempts to implement them there and they all came out rather clunky _and_ with perf penalty.

Finished with _list though, going to do _show next and then wrap it up with adding tests.

Update: Evidently _show already respects the etags, so this change now just needs a record migration code and the tests.

@eiri any progress on this?

@wohali I thought we agreed not to move it forward until 2.2.0 is out? The actual work is done and I have a half of the tests completed locally, but then switched to a different work waiting for the release.

Yes agreed :) I just was curious if you'd finished the tests or if a branch was pushed with the work, that's all.

A quick note: evidently (haven't seen this myself, but was told that this is possible and been observed) the same range view shards on different nodes can have different update_seq. Since update_seq is a part of signature used to generate local etags it could lead to fluctuating etags based on what node responded first. :(

Need to reproduce this first and then ponder how signature can be changed to be consistent across nodes, but still be unique for a given request context to be useful for etag generation.

Hi @eiri,
Could you please elaborate why update_seq is a part of signature used to generate local etags? I'm just curious because from my (somewhat dilettante) point of view this sounds a bit redundant.

@AlexanderKaraberov Update sequence indicates how catched up view is to a database and we want unique etags for the different states of the view. I don't get it why you think that's redundant.

Codewise update_seq is part of view_sig (not to be confused with base sig) and then passed to preflight to generate etag.

indicates how catched up view is to a database

@eiri Thanks. I missed this obvious part 馃槃 .
So at the moment the only problem left with the implementation advised here in this PR is that (as you said) different nodes can have different update_seq and we don't have atomic broadcast in the couch cluster, right?

Yeah. I've implemented the idea in apache/couchdb#1237 with only missing bits of upgrade of #collector records for rolling reboots and the tests, but been distributed system it's quite possible to get response from lagging view before updated one and making delayed response to wait for response from _each_ shard in range is performance prohibitive.

Closing in favour of #1237

@eiri A quick note: evidently (haven't seen this myself, but was told that this is possible and been observed) the same range view shards on different nodes can have different update_seq.

@eiri maybe compaction was the culprit? We just fixed this here: https://github.com/apache/couchdb/pull/1434

@janl No, unfortunately, when view sig composed update_seq explicitly pulled from db and not taken from view's state. Ironically the problem here is opposite to the one you fixed, it's the fact that we _include_ update_seq into view and hence can get vary etags.

On the other note: I'm going to reopen this issue, I kept it as a frame to keep conversations separate from code review and have a common aggregate issue in case WIP approach wouldn't work and I'll close that PR in favor of something different.

It looks to me like the documentation for couchdb 2.1.2 (and probably other version's) views is joyfully out of sync with respect to presence of the etag response header.

I think it might have been nice if the etag mentions had been removed from the documentation as the same time the generation of those etags were being taken out of the code.
It took me a while to find this issue when I was pulling my hairs out over not seeing the etags I was yearning for.

I also did a cursory search for changelogs mentioning that the etags had been dropped. Was it ever mentioned? For me those mean the difference between an app being usable and unusable (I'm in a developing country with sub-sub-subpar networks).

@boerwastaken I'm sorry for your frustration. We are a volunteer project with limited resources, some things fall through the cracks.

This is a feature that was dropped with 2.0.0, see https://issues.apache.org/jira/browse/COUCHDB-2859 for the old bug tracker where this was first mentioned. There's even a bug filed against 1.6.1 where view etags don't work right, see https://issues.apache.org/jira/browse/COUCHDB-2880 for the report.

Pull requests to improve the documentation at https://github.com/apache/couchdb-documentation are always welcome.

Thanks @wohali. From my tests a month or so ago, against Couchdb 1.7.2 (non-clustered) I didn't run into any surprises wrt etags, they were behaving exactly as I needed them to behave. So I'm back on 1.7.2. Time to relax ;-)

@eiri any luck on this one or have you given up?

Hey @wohali thank you for the remainder, it totally slipped my mind!

Yeah, I had a discussion about this with @chewbranca some time ago and he was rather dismissive of the work on #1237 expressing strong believe that the approach there is invalid. I don't have a better idea on how to go about views' etags and I don't see any value on continuing with something that'd face pushback, so I gave up.

I'll close the both issues.

@eiri earlier in this ticket you said:

A side note: I don't think chttpd family of etag functions is actually working. Make ETag generates binary make_etag, but validator etag_match/2 checks it versus tokenized header value, which gives a list of strings.

It's an easy fix, but I'm hesitant to introduce it as part of this work, it seems that etag_respond/3 used in quite a few places in chttpd, so complete fix would make this PR rather messy.

Is that still true? Do you want me to open a new ticket for you on this?

@wohali Thank you for the remainder, I totally forgot about this! Yes, can you please open an issue for this, I'll get to it later this week.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

JohnOllhorn picture JohnOllhorn  路  5Comments

DeylEnergy picture DeylEnergy  路  5Comments

wohali picture wohali  路  5Comments

wohali picture wohali  路  5Comments

wohali picture wohali  路  3Comments