I have one large measurement in my database, called log_record. This measurement stores sensor values for a large number of sensors. There are two retention policies (a sketch of this setup follows the list):
1) four_weeks; contains four weeks of "raw" sensor data at roughly a 10-second interval
2) forever; contains 2-minute averages and keeps that data forever
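For reference, a retention-policy layout like this is typically created along the following lines. This is only a minimal sketch via the Python client; the replication factor, host/port, and whether either policy is the database default are assumptions on my part:

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="mydb")

# Raw 10-second data: kept for four weeks, then expired.
client.create_retention_policy("four_weeks", "4w", 1, database="mydb")

# Downsampled 2-minute averages: kept indefinitely ("INF" duration).
client.create_retention_policy("forever", "INF", 1, database="mydb")
```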
Averages in the forever retention policy are calculated using this continuous query:
CREATE CONTINUOUS QUERY cq_aggregate_log_record ON mydb
BEGIN
SELECT mean(value) AS value
INTO mydb.forever.log_record
FROM mydb.four_weeks.log_record
GROUP BY time(2m), *
END
However, some series are not handled by this query: their data is never averaged and written to the forever retention policy, even though they do have data in the four_weeks retention policy.
When I manually execute the exact same query, averages are correctly calculated for all series.
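To be concrete, "manually execute" means submitting the same SELECT ... INTO statement through the HTTP API. A minimal sketch via the Python client; host/port and the backfill window are assumptions, and the method="POST" keyword assumes a reasonably recent influxdb-python release:

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="mydb")

# Run the same aggregation by hand over an explicit time window
# (widen the window to backfill more history).
backfill = """
    SELECT mean(value) AS value
    INTO mydb.forever.log_record
    FROM mydb.four_weeks.log_record
    WHERE time >= now() - 1d
    GROUP BY time(2m), *
"""
# SELECT ... INTO modifies data, so it is sent as a POST.
client.query(backfill, method="POST")
```

Run this way, every series ends up with 2-minute averages in the forever retention policy.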
I have manually removed and re-added the continuous query, to no avail.
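The remove/re-add cycle, plus a check that the query is actually registered afterwards, looked roughly like this (same assumptions as in the sketch above):

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="mydb")

# Drop the existing continuous query and re-create it from scratch.
client.query("DROP CONTINUOUS QUERY cq_aggregate_log_record ON mydb", method="POST")
client.query("""
    CREATE CONTINUOUS QUERY cq_aggregate_log_record ON mydb
    BEGIN
      SELECT mean(value) AS value
      INTO mydb.forever.log_record
      FROM mydb.four_weeks.log_record
      GROUP BY time(2m), *
    END
""", method="POST")

# Confirm the CQ is registered again.
print(list(client.query("SHOW CONTINUOUS QUERIES").get_points()))
```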
I have also added a new continuous query that specifies a particular sensor_id, like this:
CREATE CONTINUOUS QUERY cq_aggregate_log_record ON mydb
BEGIN
SELECT mean(value) AS value
INTO mydb.forever.log_record
FROM mydb.four_weeks.log_record
WHERE sensor_id='4790'
GROUP BY time(2m), *
END
This continuous query does not work either: averages are still not calculated, even with sensor_id restricted to a single value.
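To quantify which series are being skipped, one approach is to compare the tag sets that have data in each retention policy over the same recent window. A rough sketch (window and connection details are assumptions):

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="mydb")

def series_keys(rp):
    """Tag sets that have at least one point in the given retention policy."""
    result = client.query(
        "SELECT count(value) FROM mydb.%s.log_record "
        "WHERE time >= now() - 1h GROUP BY *" % rp
    )
    # ResultSet.keys() yields one (measurement, tags) pair per series.
    return {frozenset((tags or {}).items()) for _, tags in result.keys()}

raw = series_keys("four_weeks")
aggregated = series_keys("forever")
missing = raw - aggregated
print("series with raw data but no aggregates: %d of %d" % (len(missing), len(raw)))
```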
__System info:__
InfluxDB 1.1
Ubuntu 16.10
__Steps to reproduce:__
Hard to say...
__Expected behavior:__
Continuous query should calculate averages over 2 minute periods for every series in the measurement.
__Actual behavior:__
Averages are calculated for only about 70-80% of the series; the rest are not handled at all.
Note that because this is happening in a production environment, I've currently removed the continuous query from InfluxDB and replaced it with a Celery task that executes the same query every two minutes instead.
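Roughly, the replacement task looks like the sketch below. The broker URL, connection details, and the exact re-aggregation window are assumptions; the query itself is the one from the continuous query above:

```python
from celery import Celery
from influxdb import InfluxDBClient

app = Celery("aggregation", broker="redis://localhost:6379/0")

@app.task
def aggregate_log_record():
    """Re-run the 2-minute aggregation over the most recent few minutes."""
    client = InfluxDBClient(host="localhost", port=8086, database="mydb")
    client.query(
        """
        SELECT mean(value) AS value
        INTO mydb.forever.log_record
        FROM mydb.four_weeks.log_record
        WHERE time >= now() - 6m
        GROUP BY time(2m), *
        """,
        method="POST",  # SELECT ... INTO modifies data
    )

# Celery beat triggers the task every two minutes.
app.conf.beat_schedule = {
    "aggregate-log-record": {
        "task": aggregate_log_record.name,
        "schedule": 120.0,  # seconds
    }
}
```

Overlapping the window slightly (6 minutes here) means a run that is late or briefly fails gets corrected by the next one, since points with the same series and timestamp are simply overwritten.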
Is there anything I can do to debug this?
I recently moved to a Kapacitor batch aggregation script (as defined in the Kapacitor docs), and it behaves the same: only about 2/3 of my series are actually processed and aggregated by the Kapacitor task.
@jaapz We are grooming old continuous query defects and are curious whether this is still a problem for you. If so, can you please update this defect? We'd like to work with you to resolve the error if it is still reproducible on a more recent version of InfluxDB.
I'm going to have to reproduce it on the latest release, but I need to find some time for that. For now we have fallen back to an external (Celery) task that runs the same query periodically, which works for the time being. I'll add a todo for myself and get back as soon as I find the time (hopefully soon).
@jaapz have you managed to test it? I'm exploring continuous queries for a tag-rich system, so any information on potential problems would be invaluable!
@lukaszdudek-silvair I think we did test it on a more recent version and still had the same problem; I'm not sure which version it was. We're waiting for InfluxDB 2.0 before revisiting this. The external task executing the queries works well for us.
I can confirm that the problem is still occurring with version 1.7.4.
Just as an update: I suspect the cause might actually be that InfluxDB is dropping points when it has to ingest a huge number of points at once. We are currently having problems again with InfluxDB dropping points when we write tens of thousands of points per second.
@rbetts do you think that might be possible?
EDIT: I think our continuous query should be writing about 37,500 averaged points to the database on each run. The weird thing is that the same query does work when it goes through the /query HTTP API endpoint.
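For concreteness, "going through the /query HTTP API endpoint" means something like this (host/port and the time window are assumptions; SELECT ... INTO has to be sent as a POST on recent 1.x releases):

```python
import requests

query = (
    "SELECT mean(value) AS value "
    "INTO mydb.forever.log_record "
    "FROM mydb.four_weeks.log_record "
    "WHERE time >= now() - 6m "
    "GROUP BY time(2m), *"
)

resp = requests.post("http://localhost:8086/query", params={"db": "mydb", "q": query})
resp.raise_for_status()
print(resp.json())
```

Sent this way, the aggregation works; it is only the continuous query path that misses series.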
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
It's still a problem in 1.7.x
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
That's one way to clean up issues