Timescaledb: Order of insert

Created on 22 Apr 2018  路  2Comments  路  Source: timescale/timescaledb

Hi,

Is the order of insert into the hypertable an important issue to be optimized for SELECT queries? Is it best to keep the data inserted at a chronological order? The reason I'm asking is we are fetching data for different timeranges in parallel, do we need to ensure that the data to be sorted chronologically before bulk inserting into timescale? Or will the INDEX take care of this?

Thank you in advance.

Regards
Joe

question

All 2 comments

Data should be loosely ordered by time on insert, in order to achieve high insert rate (this has no effect on query performance), such that inserts are typically to the latest (or two) chunks.

For example, if you have a chunk interval of 1 day, you'd want to make sure that most of your inserts are to Apr 1, then Apr 2, then Apr 3, and so forth. If the interval is 1 hour, you want your inserts to mostly be to 12-1pm, then 1-2pm, etc. This ensures that the chunks being inserted to are largely in memory, and so you aren't constantly swapping back-and-forth to disk at insert time, which would happen if your insert's timestamps are randomly distributed over the year. A deeper explanation of why is given here: https://blog.timescale.com/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c

That said, no matter how the inserts are performed (either in-order or out-of-order), they are inserted in the correct chunk and, if an INDEX exists, the index is properly built over their timestamp. So your query (SELECT) performance isn't affected by the ordering of inserts.

thank you @mfreed , this clears things up immediately!

Was this page helpful?
0 / 5 - 0 ratings