Elasticsearch: Move away from `oal.util.Bits` to represent documents that have a value

Created on 13 Apr 2017 · 5 Comments · Source: elastic/elasticsearch

This is a remnant of the old doc values API, which used this abstraction to represent documents that have a value. Instead, we should use a sequential API like advanceExact. See the FieldData.docsWithValue methods.

:Search/Search >non-issue help wanted

All 5 comments

Hi @jpountz Can we start to work on this now? Thanks.

@liketic absolutely

Thanks @jpountz !
I've looked into how Bits is used. From what I understand, we need to create a new interface to replace Bits, with two methods: advanceExact() and length().
I also think the FieldData.docsWithValue methods can be removed, because we can wrap the original XXvalues directly. For example, in class WithOrdinals:

            public interface DocValuesBits { // new interface

                boolean advanceExact(int index) throws IOException;

                int length();
            }

            public DocValuesBits docsWithValue(LeafReaderContext context) throws IOException {
                final SortedSetDocValues ordinals = ordinalsValues(context);
                final int maxDoc = context.reader().maxDoc();
                return new DocValuesBits() {
                    @Override
                    public boolean advanceExact(int index) throws IOException {
                        return ordinals.advanceExact(index);
                    }

                    @Override
                    public int length() {
                        return maxDoc;
                    }
                };
            }

If I'm wrong, could you please clarify? Thanks a lot for your kind response. 👍

@liketic This would work, but I would favor returning a DocIdSetIterator rather than a DocValuesBits instance (meaning the doc values instances can be returned directly). We will need to modify call sites to replace `boolean docHasValue = docsWithValue.get(docID)` with something like

    int currentDocID = docsWithValue.docID();
    if (currentDocID < docID) {
      currentDocID = docsWithValue.advance(docID);
    }
    boolean docHasValue = currentDocID == docID;

Both options would work today, but if one day we need to iterate over all documents that have a value, the DocIdSetIterator approach will help skip more efficiently over documents that do not have a value instead of performing a linear scan.
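To make the call-site pattern concrete, here is a minimal, self-contained sketch. It does not use Lucene: `SortedDocsIterator` is a hypothetical stand-in that only mimics the `docID()` / `advance(target)` contract of a DocIdSetIterator, and `hasValue` is an illustrative helper name, not an existing Elasticsearch method. It assumes callers ask about doc IDs in non-decreasing order, which is what a sequential API requires.

```java
/** Illustration only: a stand-in iterator, not Lucene's DocIdSetIterator. */
final class DocsWithValueSketch {

    /** Sentinel returned once the iterator is exhausted (same idea as Lucene's NO_MORE_DOCS). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical iterator over a sorted array of doc IDs that have a value. */
    static final class SortedDocsIterator {
        private final int[] docs; // sorted ascending, no duplicates
        private int idx = -1;     // -1 means "not positioned yet"

        SortedDocsIterator(int... docs) {
            this.docs = docs.clone();
        }

        /** Current doc ID: -1 before the first advance, NO_MORE_DOCS when exhausted. */
        int docID() {
            if (idx < 0) {
                return -1;
            }
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        /** Moves forward to the first doc ID >= target and returns it. */
        int advance(int target) {
            if (idx >= docs.length) {
                return NO_MORE_DOCS;
            }
            do {
                idx++;
            } while (idx < docs.length && docs[idx] < target);
            return docID();
        }
    }

    /**
     * Replacement for a random-access docsWithValue.get(docID) check.
     * Assumption: docID never decreases between calls on the same iterator.
     */
    static boolean hasValue(SortedDocsIterator docsWithValue, int docID) {
        int currentDocID = docsWithValue.docID();
        if (currentDocID < docID) {
            currentDocID = docsWithValue.advance(docID);
        }
        return currentDocID == docID;
    }

    public static void main(String[] args) {
        SortedDocsIterator docsWithValue = new SortedDocsIterator(1, 4, 7);
        for (int docID = 0; docID < 10; docID++) {
            System.out.println(docID + " has value: " + hasValue(docsWithValue, docID));
        }
    }
}
```

The iterator is only advanced when it is behind the requested document, so repeated checks for the same doc ID are cheap and the iterator never has to move backwards.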

@jpountz Thank you so much for your patience. I opened #28334, even though I don't have much confidence in it, especially in the nextDoc() method. Feel free to close it if it does not make sense. Looking forward to your reply. Thanks!
