I have a set of Java Vert.x microservices. One microservice connects to Kafka to read messages and puts them on the Vert.x event bus; another reads from the event bus and processes them further.
When I load tested this outside of Nomad, i.e. just by starting the Java jars, I did not face any issues. But when I run the microservices via Nomad, some of them work for a couple of minutes, then drop out of the ZooKeeper cluster and stop working. In addition, the Vert.x event bus is unable to distribute load evenly across the cluster.
I would also like to know whether there is a way to start all these processes as the normal root user and not in a chroot. I used the user option in the task but it doesn't work.
Nomad version: Nomad v0.7.1 (0b295d399d00199cfab4621566babd25987ba06e)
Operating system: RHEL 6.8
Vert.x microservices behave differently under Nomad.
Start zookeeper server
Start all micro services with nomad
Some of the microservices get dropped out of the ZooKeeper cluster; this issue doesn't happen without Nomad.
There are no logs related to the Java processes on the Nomad server or client; Nomad only logs that it connected to the Consul server.
Feb 07, 2018 1:03:40 PM io.vertx.core.impl.HAManager
WARNING: Timed out waiting for group information to appear
Feb 07, 2018 1:03:40 PM io.vertx.core.impl.HAManager
WARNING: Timed out waiting for group information to appear
Feb 07, 2018 1:03:40 PM io.vertx.core.impl.HAManager
WARNING: Timed out waiting for group information to appear
Feb 07, 2018 1:04:55 PM io.vertx.spi.cluster.zookeeper.impl.ZKAsyncMultiMap
WARNING: connection to the zookeeper server have suspended.
Feb 07, 2018 1:05:08 PM io.vertx.core.impl.HAManager
WARNING: Timed out waiting for group information to appear
SEVERE: Failed to handle memberRemoved
io.vertx.core.VertxException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /io.vertx/syncMap/__vertx.haInfo/d213ca95-30de-4a15-a064-6d4abe744cfe
at io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap.get(ZKSyncMap.java:95)
at io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap.lambda$entrySet$4(ZKSyncMap.java:182)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1553)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at io.vertx.spi.cluster.zookeeper.impl.ZKSyncMap.entrySet(ZKSyncMap.java:184)
at io.vertx.core.impl.HAManager.nodeLeft(HAManager.java:321)
at io.vertx.core.impl.HAManager.access$100(HAManager.java:107)
at io.vertx.core.impl.HAManager$1.nodeLeft(HAManager.java:157)
job "VertxLoadTest" {
datacenters = ["dc1"]
type = "service"
update {
stagger = "10s"
max_parallel = 1
}
group "SimpleConsumer" {
count = 1
restart {
attempts = 1
interval = "5m"
delay = "25s"
mode = "delay"
}
task "SimpleKafka" {
driver = "java"
config {
jar_path = "tmp/vertxJars/SimpleKafka.jar"
jvm_options = ["-Xmx512m"]
args = ["syslog"]
}
artifact {
source = "http://somerepo/vertxJars/SimpleKafka.jar"
destination = "tmp/vertxJars/"
}
}
}
group "ParamsLoader" {
count = 1
restart {
attempts = 1
interval = "5m"
delay = "25s"
mode = "delay"
}
task "Params" {
driver = "java"
config {
jar_path = "tmp/vertxJars/Params.jar"
jvm_options = ["-Xmx512m"]
args = ["true"]
}
artifact {
source = "http://somerepo/vertxJars/Params.jar"
destination = "tmp/vertxJars/"
}
}
}
}
> But when I run the microservices via Nomad, some of them work for a couple of minutes, then drop out of the ZooKeeper cluster and stop working. In addition, the Vert.x event bus is unable to distribute load evenly across the cluster.
You need to assign an amount of resources to your tasks: https://www.nomadproject.io/docs/job-specification/resources.html
They're getting a default set of resources which is too low.
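For example (the numbers here are only illustrative, not tuned recommendations), each task can declare a resources stanza next to its config:

      resources {
        # CPU is expressed in MHz and memory in MB. Illustrative values
        # only -- size them to what each JVM actually needs, at least
        # the -Xmx heap plus JVM overhead.
        cpu    = 1000
        memory = 1024
      }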
> I would also like to know whether there is a way to start all these processes as the normal root user and not in a chroot. I used the user option in the task but it doesn't work.
In order to run processes as root you will need to set an empty user.blacklist in your Nomad client's configuration.
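Roughly like this, assuming a typical client configuration file (the surrounding client block here is illustrative):

    client {
      enabled = true

      options {
        # An empty value overrides the default blacklist (which
        # contains root), allowing tasks to set user = "root".
        "user.blacklist" = ""
      }
    }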
Hope that helps!
@schmichael Thanks for the answers. That helped.
Now I am running into resource issues. When I run outside Nomad there are plenty of resources for the processes, but when I run them inside Nomad placement halts with not enough memory and not enough CPU.
ID = 68d12feb
Name = servername
Class =
DC = dc1
Drain = false
Status = ready
Drivers =
Uptime = 1h33m45s
Allocated Resources
CPU Memory Disk IOPS
0/19976 MHz 0 B/31 GiB 0 B/41 GiB 0/0
Allocation Resource Utilization
CPU Memory
0/19976 MHz 0 B/31 GiB
Host Resource Utilization
CPU Memory Disk
0/19976 MHz 445 MiB/31 GiB 3.4 GiB/49 GiB
Allocations
No allocations placed
Job file:
job "VertxLoadTest" {
datacenters = ["dc1"]
type = "service"
update {
stagger = "10s"
max_parallel = 1
}
group "SimpleConsumer" {
count = 1
restart {
attempts = 1
interval = "5m"
delay = "25s"
mode = "delay"
}
task "SimpleKafka" {
driver = "java"
user = "root"
config {
jar_path = "tmp/vertxJars/SimpleKafka.jar"
jvm_options = ["-Xmx512m","-Xms256m"]
#args = ["syslogd"]
}
artifact {
source = "http://somerepo/vertxJars/SimpleKafka.jar"
destination = "tmp/vertxJars/"
}
}
}
group "ParamsLoader" {
count = 10
restart {
attempts = 1
interval = "5m"
delay = "25s"
}
task "Params" {
driver = "java"
user = "root"
config {
jar_path = "tmp/vertxJars/Params1.jar"
jvm_options = ["-Xmx512m"]
args = ["true"]
}
artifact {
source = "http://somerepo/vertxJars/Params1.jar"
destination = "tmp/vertxJars/"
}
resources {
cpu = 10000
memory = 16384
network {
}
}
}
}
}
Our system has 4 cores and 32 GB of memory, but when I run the job I get the error below.
Evaluation triggered by job "VertxLoadTest"
Evaluation within deployment: "acecd9c1"
Allocation "4d24753d" created: node "68d12feb", group "ParamsLoader"
Allocation "8dfceee5" created: node "68d12feb", group "SimpleConsumer"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "eb11ac6b" finished with status "complete" but failed to place all allocations:
Task Group "ParamsLoader" (failed to place 9 allocations):
* Resources exhausted on 1 nodes
Dimension "cpu" exhausted on 1 nodes
Evaluation "a768363e" waiting for additional capacity to place remainder
Task Group "ParamsLoader" (failed to place 7 allocations):
* Resources exhausted on 1 nodes
Dimension "memory" exhausted on 1 nodes
Why am I not able to leverage the node's complete resources via Nomad?
@schmichael I understood this. I was giving a lot of resources per microservice, so it was running out of resources, because the count for the Params task is 10. I reduced the resources for each task and then the calculation worked out perfectly. Thanks for the help 💯
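To spell out the arithmetic: with count = 10 the ParamsLoader group was requesting 10 x 10000 = 100,000 MHz of CPU and 10 x 16384 MB = 160 GB of memory against a node that offers 19,976 MHz and 31 GiB, so only one allocation could be placed. Something like the following fits (hypothetical sizing, not necessarily what I deployed):

      resources {
        # 10 tasks x 1500 MHz = 15,000 MHz and 10 x 2048 MB = 20 GB,
        # which fits within this node's 19,976 MHz and 31 GiB alongside
        # the SimpleConsumer group.
        cpu    = 1500
        memory = 2048
      }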
Glad you got it working @cernerpradeep!