Timescaledb: Guidelines around needing N space partitions

Created on 17 Aug 2019 · 1 Comment · Source: timescale/timescaledb

Hi all,
I've been trying to find more documentation around the space aspect of the time/space partitioning, specifically around what number_partitions I should use if I have N space values. Examples in the documentation often refer to location, but also use device_id and user_id, each of which could grow over time.

Here is an example from the API reference:

SELECT create_hypertable('conditions', 'time', 'location', 4);

My questions are

Does this scale for N locations, users, or devices?
Why 4? Further reading led me to believe that this is because of a 1:1 ratio of partitions to disks, as shown in the documentation for add_dimension: https://docs.timescale.com/latest/api#add_dimension

"That said, when using space partitions, we recommend using 1 space partition per disk."

Is it then safe to say that number_partitions is tied to the underlying physical device? So what could be 1 in development could be 4 in production, depending on the physical configuration?

What value should it be when deploying to Timescale Cloud?

Thanks!

dimroc

Most helpful comment

Space partitioning uses hashing, so the number of partitions is independent of the cardinality of the space values (the N in your question). Here, 4 space partitions for N values will map N/4 values (in expectation) to each partition.

https://docs.timescale.com/latest/api#add_dimension

Space partitions use hashing: Every distinct item is hashed to one of N buckets. Remember that we are already using (flexible) time intervals to manage chunk sizes; the main purpose of space partitioning is to enable parallel I/O to the same time interval.
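To make the "N/4 in expectation" point concrete, here is a minimal Python sketch of hashing many distinct values into a fixed number of buckets. It is only an illustration: the space_partition and partition_counts helpers are hypothetical names, and MD5 is a stand-in hash chosen for convenience, not TimescaleDB's internal partitioning function.

```python
import hashlib
from collections import Counter


def space_partition(value: str, number_partitions: int) -> int:
    """Map a space-dimension value to a partition index in [0, number_partitions).

    Assumption: MD5 is used here purely as a stand-in hash; TimescaleDB's
    actual partitioning hash is internal and may behave differently.
    """
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % number_partitions


def partition_counts(values, number_partitions):
    """Count how many distinct values land in each partition."""
    counts = Counter(space_partition(v, number_partitions) for v in values)
    return [counts.get(p, 0) for p in range(number_partitions)]


# 10,000 hypothetical location names spread over 4 space partitions.
locations = [f"location-{i}" for i in range(10_000)]
counts = partition_counts(locations, 4)
print(counts)  # each entry should be close to 10_000 / 4
```

Adding more locations changes how many values fall into each bucket, not how many buckets there are, which is why number_partitions does not need to grow with N.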

Timescale Cloud uses a single (EBS) volume per database, so it typically doesn't see an I/O benefit from multiple space partitions.

If you typically query by an additional "key" like user_id or device_id, we recommend considering reorder policies. These are available on Timescale Cloud:

https://docs.timescale.com/latest/api#add_reorder_policy

mfreed on 17 Aug 2019
