Influxdb: DISTINCT does not operate on string fields

Created on 28 Apr 2016  路  10Comments  路  Source: influxdata/influxdb

Bug report

System info: [Include InfluxDB version, operating system name, and other relevant details]

InfluxDB 0.12.2 on Ubuntu 14.04, fresh install

Steps to reproduce:

  1. insert string fields
  2. query string field with DISTINCT

Expected behavior: [What you expected to happen]

Return from DISTINCT(<string>) would be the number of distinct strings.

Actual behavior: [What actually happened]

Return was null.

Additional info: [Include gist of relevant config, logs, etc.]

> show field keys from a
name: a
-------
fieldKey
string
value

> select * from a
name: a
-------
time            string  value
1461695896546713026 12  12
1461856121749763139 11  11
1461856127357529714 10  10
1461856128790092843 10  10
1461856129686020379 10  10
1461856131452838575 11  11
1461856136245585327 13  13

> select distinct(value) from a
name: a
-------
time    distinct
0   12
0   11
0   10
0   13

> select distinct(string) from a
> select count(string) from a
name: a
-------
time    count
0   7

> format json
> select distinct(string) from a
{"results":[{}]}
arefunctions

All 10 comments

@jsternberg ^^

@jsternberg I created the data with the following:

> insert a value=12,string="12"
> insert a value=11,string="11"
> insert a value=10,string="10"
> insert a value=10,string="10"
> insert a value=10,string="10"
> insert a value=11,string="11"
> insert a value=13,string="13"

@beckettsean I'm unable to reproduce this with 0.12.2. I'm using one of the pending Docker images to ensure I have a clean slate.

> create database mydb
> use mydb
Using database mydb
> insert a value=12,string="12"
> insert a value=11,string="11"
> insert a value=10,string="10"
> insert a value=10,string="10"
> insert a value=10,string="10"
> insert a value=11,string="11"
> insert a value=13,string="13"
> select * from a
name: a
-------
time                    string  value
1461857260244815974     12      12
1461857267066315206     11      11
1461857271181150496     10      10
1461857275369303495     10      10
1461857279493054542     10      10
1461857287331499229     11      11
1461857292976276325     13      13

> select distinct(value) from a
name: a
-------
time    distinct
0       12
0       11
0       10
0       13

> select distinct(string) from a
name: a
-------
time    distinct
0       12
0       11
0       10
0       13

> select count(string) from a
name: a
-------
time    count
0       7

> format json
> select distinct(string) from a
{"results":[{"series":[{"name":"a","columns":["time","distinct"],"values":[[0,"12"],[0,"11"],[0,"10"],[0,"13"]]}]}]}

Can you confirm you're using the correct version?

On checking, my system is actually 0.12.1. Is there reason to believe this is fixed on 0.12.2?

# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.12.1
InfluxDB shell 0.12.1
> 

Still happens for me on 0.12.2:

# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.12.2
InfluxDB shell 0.12.2
> use mydb
Using database mydb
> select distinct(value) from a
name: a
-------
time    distinct
0   12
0   11
0   10
0   13

> select distinct(string) from a
> 

Replicated on new data:

> insert b value=1,string="a"
> insert b value=2,string="b"
> insert b value=2,string="b"
> insert b value=1,string="a"
> insert b value=3,string="c"
> select * from b
name: b
-------
time            string  value
1461859961560359056 a   1
1461859965408355832 b   2
1461859966311601860 b   2
1461859967615202434 a   1
1461859972269952068 c   3

> select distinct(value) from b
name: b
-------
time    distinct
0   1
0   2
0   3

> select distinct(string) from b
> 

Not a syntax issue as far as I can tell:

> select distinct("string") from b
> select distinct('string') from b
ERR: error parsing query: expected field argument in distinct()
> 

I spun up a brand new instance, installed 0.12.2, and DISTINCT operates as expected:

# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.12.2
InfluxDB shell 0.12.2
> create database mydb
> user mydb
ERR: error parsing query: found USER, expected SELECT, DELETE, SHOW, CREATE, DROP, GRANT, REVOKE, ALTER, SET, KILL at line 1, char 1
Warning: It is possible this error is due to not setting a database.
Please set a database with the command "use <database>".
> use mydb
Using database mydb
> insert a number=1,string="a"
> insert a number=1,string="a"
> insert a number=2,string="b"
> insert a number=2,string="b"
> insert a number=3,string="c"
> select * from a
name: a
-------
time            number  string
1461860834329907105 1   a
1461860836969349116 1   a
1461860841620218718 2   b
1461860842520594123 2   b
1461860848010369577 3   c

> select distinct(number) from a
name: a
-------
time    distinct
0   1
0   2
0   3

> select distinct(string) from a
name: a
-------
time    distinct
0   a
0   b
0   c

I'm not sure what it is about my other installations that prevents this, but it appears that it cannot be reproduced on a brand new box.

I looked at the underlying data set and I think it's caused by a side-effect of a bad decision in the underlying query engine. If a shard returns no iterators for a query, it returns a fake float iterator. When casting is done, floats are given priority and all non-float iterators are discarded, making it appear like there's no data.

When selecting as a raw field, this doesn't happen since the work is done through auxiliary iterators which don't have this problem. I'll start working on a fix.

To be clear, this should happen with any aggregation function for anything lower than a float that has differing output types (it should not affect count()).

Was this page helpful?
0 / 5 - 0 ratings

Related issues

TechniclabErdmann picture TechniclabErdmann  路  80Comments

beckettsean picture beckettsean  路  83Comments

phemmer picture phemmer  路  60Comments

beckettsean picture beckettsean  路  43Comments

srfraser picture srfraser  路  90Comments