Presto: Max query memory per node cannot be greater than the max query total memory per node

Created on 10 Jul 2018 · 13 comments · Source: prestodb/presto

presto version: 0.205

-Xmx25G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
....
.....
query.max-memory=28GB
query.max-memory-per-node=16GB
.....
  • free -g
                  total       used       free     shared    buffers     cached
Mem:            47         45          1          0            0           34
-/+ buffers/cache:         10         36
Swap:            7          0          7

My worker node still has 36G of memory available. However, with query.max-memory-per-node=16GB I get "Max query memory per node cannot be greater than the max query total memory per node"; if I set it to 5GB, it works.

All 13 comments

There is another configuration, query.max-total-memory-per-node, which sets the max total (user + system) memory per node; it must be greater than or equal to query.max-memory-per-node (which covers only user memory). The default value of query.max-total-memory-per-node is 30% of the heap size, which is ~7.5G for your config; that's why setting the max memory to 5G works. If you set query.max-total-memory-per-node to a value greater than or equal to query.max-memory-per-node, the error should go away.
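To make the failure concrete, here is a hypothetical worker config sketch based on the numbers in the original report (the explicit total limit on the last line is my addition for illustration, not a recommended value):

```properties
# etc/config.properties (sketch, values from the original report)
# With -Xmx25G, the default query.max-total-memory-per-node is
# 30% of the heap, i.e. ~7.5GB.
query.max-memory=28GB
query.max-memory-per-node=16GB
# 16GB > 7.5GB default -> "Max query memory per node cannot be greater
# than the max query total memory per node" at startup.
# Setting the total limit explicitly, >= the user limit, resolves it:
query.max-total-memory-per-node=16GB
```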

@nezihyigitbasi

Thanks. However, I did not find this parameter (query.max-total-memory-per-node) in https://prestodb.io/docs/current/admin/properties.html#general-properties or https://prestodb.io/docs/current/installation/deployment.html

I also want to tune Presto for my cluster (11 nodes, 47G of memory each).
What are appropriate values for these parameters?

-Xmx
-XX:G1HeapRegionSize=

query.max-memory=
query.max-memory-per-node=
query.max-total-memory-per-node=

The documentation is unfortunately not up-to-date; we need to update it -- sorry for that.

It's hard to give you exact numbers, because they should be set based on your workloads. What I can do is provide some numbers to start with; then you should experiment with these configs and your workloads to fine-tune them.

Since you have 47G per node, you can start with an Xmx of, say, 35G, as you should set aside some overhead for native memory and leave some room for the OS and any other daemons running on the machines. In production we use a G1 region size of 32M, which is also the documented value in the deployment docs.

Given that the max heap size is 35G, I think you can start experimenting with the following values and determine the right values for your workloads:

  • query.max-memory-per-node = 12GB
  • query.max-total-memory-per-node = 15GB
  • memory.heap-headroom-per-node = 8GB (the amount of heap memory to set aside as headroom/buffer, e.g., for untracked allocations)
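Assembled into config files, these starting values might look like the following sketch (file layout per the standard Presto deployment; remember to tune per workload):

```properties
# etc/jvm.config (excerpt)
-Xmx35G
-XX:G1HeapRegionSize=32M

# etc/config.properties (excerpt)
query.max-memory-per-node=12GB
query.max-total-memory-per-node=15GB
memory.heap-headroom-per-node=8GB
```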

With a headroom of 8G and a max total memory per node of 15G, the general pool on each worker will be of size 35 - 8 - 15 = 12G, which is 12G * 11 = 132G in the entire cluster.

When we determine query.max-memory (the peak global user memory limit), we also consider the hash partition count (the query.initial-hash-partitions configuration, which is the number of partitions for distributed joins and aggregations). Since you have 11 nodes, you can set query.initial-hash-partitions to 8. With that, if we set query.max-memory to 48G, each node will use roughly 48/8 = 6GB of memory (if there is no skew and data is well distributed). Since query.max-memory-per-node is 12GB, that means we allow a skew factor of 12/6 = 2; that is, tasks may consume twice as much memory when the data is not well distributed.

Again, you should definitely experiment with and tune these values, and figure out what works for you.
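The arithmetic above can be sketched in plain Python (the variable names are illustrative, not Presto APIs; the values are the suggested starting points):

```python
# Memory-pool arithmetic from the discussion above.
xmx_gb = 35                    # -Xmx
heap_headroom_gb = 8           # memory.heap-headroom-per-node
max_total_per_node_gb = 15     # query.max-total-memory-per-node
max_user_per_node_gb = 12      # query.max-memory-per-node
nodes = 11

# General pool per worker: heap minus headroom minus the reserved pool
# (sized by query.max-total-memory-per-node).
general_pool_gb = xmx_gb - heap_headroom_gb - max_total_per_node_gb
cluster_general_pool_gb = general_pool_gb * nodes

# With query.max-memory=48GB spread over query.initial-hash-partitions=8,
# a well-distributed query uses roughly 48/8 = 6GB per node.
query_max_memory_gb = 48
hash_partitions = 8
per_node_usage_gb = query_max_memory_gb / hash_partitions

# Allowed skew: how much more a single node may use than the even share.
skew_factor = max_user_per_node_gb / per_node_usage_gb

print(general_pool_gb)          # 12
print(cluster_general_pool_gb)  # 132
print(per_node_usage_gb)        # 6.0
print(skew_factor)              # 2.0
```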

To better understand the impact of these configs on the memory pools, please see how we calculate the size of the memory pools here.

@nezihyigitbasi

Thanks, that is very helpful.

@nezihyigitbasi can we have some earlier validation that query.max-total-memory-per-node >= query.max-memory-per-node ?

@findepi We have a check that does that on worker startup, please see this. Do you have something else in mind?

@nezihyigitbasi yes, I think this is the check that triggered the error message originally reported here. Apparently it wasn't self-explanatory, but I am not sure how to improve it. I tried https://github.com/prestodb/presto/pull/11040, but it's not perfect either. Please let me know what you think.

@nezihyigitbasi: Do you mean query.max-total-memory-per-node does not include query.max-memory-per-node?
Do they both account for memory separately?

Can you explain this in more detail? I have a similar problem.

@nezihyigitbasi: My second question is:

Is memory.heap-headroom-per-node a replacement for resources.reserved-system-memory, or do they both exist?

@ajantha-bhat

Do you mean query.max-total-memory-per-node does not include query.max-memory-per-node

The former should be greater than or equal to the latter, as the former covers both the user + system memory reservation while the latter covers only the user memory reservation. More info in this PR: https://github.com/prestodb/presto/pull/11291

memory.heap-headroom-per-node is a replacement for resources.reserved-system-memory ?? or they both exist?

System pool is now disabled by default (can be enabled with a config), so when it's disabled resources.reserved-system-memory will not be used. The memory.heap-headroom-per-node config is only used when system memory pool is disabled, so these configs are mutually exclusive.
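As a sketch, the relevant lines in config.properties under the default setup (system pool disabled) would be as follows; the property names come from the discussion above, and how the system pool is toggled depends on your Presto version:

```properties
# config.properties (sketch): with the system pool disabled (the default),
# only the headroom setting takes effect.
memory.heap-headroom-per-node=8GB
# resources.reserved-system-memory is ignored while the system pool is disabled
```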

Thanks for the info.

  1. So they are not separate memory? In the above discussion, does the 15GB include the user 12GB?
  2. Where can I learn about the Presto system pool and reserved pool?


  1. Yes, 15GB includes the 12GB. 12GB is just the user memory, and 15GB is the user memory and the system memory a query can reserve.
  2. You can check the previous questions in the user mailing list and study the code. You can start from LocalMemoryManager and MemoryPool classes and follow from there.

INFO main com.facebook.presto.server.PrestoServer ======== SERVER STARTED ========
ERROR Announcer-0 io.airlift.discovery.client.Announcer Cannot connect to discovery server for announce: Announcement failed with status code 404:
ERROR Announcer-0 io.airlift.discovery.client.Announcer Service announcement failed after 51.91ms. Next request will happen within 0.00s

I am hitting errors like this. How can I solve this problem?
