Hi all,
I've been looking for more documentation on the space side of time/space partitioning, specifically on what value of number_partitions I should use if I have N distinct space values. Examples in the documentation often partition on location, but also use device_id and user_id, each of which could grow over time.
Here is an example from the API reference:
SELECT create_hypertable('conditions', 'time', 'location', 4);
My questions are about this note in the add_dimension docs (https://docs.timescale.com/latest/api#add_dimension):
"That said, when using space partitions, we recommend using 1 space partition per disk."
- Is it safe, then, to say that number_partitions is tied to the underlying physical devices? So what might be 1 in development could be 4 in production, depending on the physical configuration?
- What value should it be when deploying to Timescale Cloud?
Thanks!
Space partitioning uses hashing, so the number of partitions is independent of the number of distinct space values (N). With 4 space partitions, each partition receives N/4 of the values in expectation.
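A minimal sketch of why the count of distinct values doesn't matter: hash each value into a fixed number of buckets and count what lands where. The hash here (MD5 truncated to 32 bits, then modulo) is an assumption standing in for TimescaleDB's own partitioning hash, and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(value: str, number_partitions: int) -> int:
    # Stand-in hash (assumption): MD5 truncated to 32 bits, then modulo.
    # TimescaleDB uses its own partitioning hash; any well-mixed hash
    # spreads distinct values across buckets in the same way.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % number_partitions


def values_per_partition(values, number_partitions: int) -> Counter:
    # Count how many distinct space values land in each partition.
    return Counter(partition_for(v, number_partitions) for v in set(values))


# The partition count stays fixed at 4 however many device_ids exist;
# only the number of values per partition grows (about N / 4 each).
for n in (1_000, 10_000):
    device_ids = [f"device-{i}" for i in range(n)]
    counts = values_per_partition(device_ids, 4)
    print(n, dict(sorted(counts.items())))
```

Growing from 1,000 to 10,000 devices changes only the per-bucket counts, never the number of buckets.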
https://docs.timescale.com/latest/api#add_dimension
Space partitions use hashing: every distinct value is hashed to one of number_partitions buckets. Remember that we already use (flexible) time intervals to manage chunk sizes; the main purpose of space partitioning is to enable parallel I/O within the same time interval.
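To illustrate the "parallel I/O within the same time interval" point, here is a hypothetical model in which a row's chunk is identified by (time slice, hash slice). The 1-day chunk interval, the epoch alignment, and the MD5 stand-in hash are all assumptions for illustration, not TimescaleDB internals.

```python
import hashlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def partition_for(value: str, number_partitions: int) -> int:
    # Stand-in hash (assumption), not TimescaleDB's actual function.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % number_partitions


def chunk_key(ts: datetime, device_id: str,
              chunk_interval: timedelta, number_partitions: int):
    # Hypothetical model: a chunk is a (time slice, space slice) pair.
    time_slice = (ts - EPOCH) // chunk_interval  # timedelta // timedelta -> int
    return time_slice, partition_for(device_id, number_partitions)


# 200 devices writing in the same hour: a single time slice, but the rows
# are spread over up to 4 chunks, which could sit on 4 different disks.
ts = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
keys = {chunk_key(ts, f"device-{i}", timedelta(days=1), 4) for i in range(200)}
print(sorted(keys))
```

With one disk, those 4 chunks would still compete for the same device, which is why the docs tie the partition count to the disk count.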
Timescale Cloud uses a single (EBS) volume per database, so it typically doesn't gain any I/O benefit from multiple space partitions.
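Putting the "1 space partition per disk" guidance into a deployment script might look like the sketch below. hypertable_ddl is a hypothetical helper; it only builds the create_hypertable call shown in the question, and skips the space dimension when there is a single volume (as on Timescale Cloud).

```python
def hypertable_ddl(table: str, time_col: str, space_col: str,
                   num_disks: int) -> str:
    # Hypothetical helper: one space partition per disk. With a single
    # volume there is no parallel I/O to gain, so use time partitioning only.
    # Identifiers are interpolated directly: use only trusted names here.
    if num_disks <= 1:
        return f"SELECT create_hypertable('{table}', '{time_col}');"
    return (
        f"SELECT create_hypertable('{table}', '{time_col}', "
        f"'{space_col}', {num_disks});"
    )


print(hypertable_ddl("conditions", "time", "location", 1))  # dev / single volume
print(hypertable_ddl("conditions", "time", "location", 4))  # four-disk server
```

This keeps the development and production schemas identical apart from the space dimension, which is derived from the hardware rather than from the number of locations.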
If you typically query by an additional key such as user_id or device_id, we recommend considering reorder policies. These are available on Timescale Cloud:
https://docs.timescale.com/latest/api#add_reorder_policy
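Conceptually, a reorder policy rewrites each chunk in the order of a chosen index (for example one on (device_id, time DESC)), so one device's rows end up stored contiguously. The sketch below uses an in-memory sort to stand in for that physical rewrite; rows are modeled as (device_id, epoch_seconds, value) tuples, and both function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def reorder_chunk(rows):
    # Stand-in for a physical reorder: sort as an index on
    # (device_id, time DESC) would order the rows.
    return sorted(rows, key=lambda r: (r[0], -r[1]))


def rows_for_device(ordered, device_id):
    # After reordering, one device's rows form a single contiguous run,
    # so a lookup reads one range instead of scattered rows.
    keys = [r[0] for r in ordered]
    lo = bisect_left(keys, device_id)
    hi = bisect_right(keys, device_id)
    return ordered[lo:hi]


chunk = [("b", 1, 1.0), ("a", 2, 2.0), ("b", 3, 3.0), ("a", 1, 4.0)]
ordered = reorder_chunk(chunk)
print(rows_for_device(ordered, "b"))  # newest first for device "b"
```

On disk, that contiguity means fewer pages to read for a per-device query, without needing a space dimension at all.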