Timescaledb: Guidelines around needing N space partitions

Created on 17 Aug 2019 · 1 Comment · Source: timescale/timescaledb

Hi all,
I've been trying to find more documentation around the space aspect of the time/space partitioning, specifically around what number_partitions I should use if I have N space values. Examples in the documentation often refer to location, but also use device_id and user_id, each of which could grow over time.

Here is an example from the API reference:

SELECT create_hypertable('conditions', 'time', 'location', 4);

My questions are

  1. Does this scale for N locations, users, or devices?
  2. Why 4? Further reading led me to believe that this is because of a 1:1 ratio of partitions to disks, as shown in the documentation for add_dimension: https://docs.timescale.com/latest/api#add_dimension

That said, when using space partitions, we recommend using 1 space partition per disk.

  3. Is it safe to then say that number_partitions is tied to the underlying physical device? So what could be 1 in development could be 4 on production depending on physical configuration?
  4. What value should it be when deploying to Timescale Cloud?

Thanks!


Most helpful comment

Space partitioning uses hashing, so the number of partitions does not need to track the number of distinct space values (N). Here, 4 space partitions for N values will map N/4 values (in expectation) to each partition.
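To get a feel for the N/4 claim, here is a rough sketch in plain PostgreSQL. It hashes 10,000 made-up device names into 4 buckets and counts how many land in each. Note the assumptions: hashtext() is a PostgreSQL built-in used only as a stand-in and is not the hash function TimescaleDB applies to the partitioning column, and the 'device-' names are hypothetical.

```sql
-- Illustrative only: hashtext() is a stand-in hash, NOT TimescaleDB's
-- internal partitioning function. The 'device-<n>' values are made up.
SELECT ((hashtext('device-' || g::text) % 4) + 4) % 4 AS bucket,  -- fold the signed hash into 0..3
       count(*) AS distinct_values                                -- how many values landed in this bucket
FROM generate_series(1, 10000) AS g
GROUP BY bucket
ORDER BY bucket;
```

Each bucket should end up with a count in the neighborhood of 10,000 / 4, and the same holds if the number of distinct values grows, which is why the partition count does not have to change as devices or users are added.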

https://docs.timescale.com/latest/api#add_dimension

Space partitions use hashing: Every distinct item is hashed to one of N buckets. Remember that we are already using (flexible) time intervals to manage chunk sizes; the main purpose of space partitioning is to enable parallel I/O to the same time interval.

Timescale Cloud uses a single (EBS) volume per database, so it typically gains no I/O benefit from multiple space partitions.
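As a sketch of what this could mean in practice, the two calls below are alternatives, not steps to run together. They assume the conditions table from the API reference example, with time and location columns, already exists and is not yet a hypertable.

```sql
-- Option A: single volume (e.g. one EBS volume, or a one-disk dev machine).
-- Time-only partitioning, no space dimension.
SELECT create_hypertable('conditions', 'time');

-- Option B: alternative for a server with four disks, following the
-- "1 space partition per disk" guidance. Do not run in addition to Option A.
-- SELECT create_hypertable('conditions', 'time', 'location', 4);
```

Either way, the choice follows the storage layout rather than the number of distinct locations.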

If you typically query by an additional "key" such as user_id or device_id, we recommend considering reorder policies. These are available on Timescale Cloud:

https://docs.timescale.com/latest/api#add_reorder_policy
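
A minimal sketch of setting up such a policy, assuming the conditions hypertable has a device_id column. The column and the index name conditions_device_id_time_idx are hypothetical and chosen only for illustration.

```sql
-- Hypothetical index matching the common query pattern (filter by device, then by time).
CREATE INDEX conditions_device_id_time_idx
    ON conditions (device_id, time DESC);

-- Ask TimescaleDB to reorder chunks on disk by that index in the background.
SELECT add_reorder_policy('conditions', 'conditions_device_id_time_idx');
```

The idea is that rows for the same device end up physically close together within a chunk, which helps queries that filter on that key even without any space partitions.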

