We are using ThingsBoard 2.0.3 with a Cassandra database.
How to reproduce the error:
Result: the service takes a long time to respond and ends with the following timeout error: 408 {"error":"Timeout during persistence of the message to the queue!"}
Here are our log files:
thingsboard.log
thingsboard.out.log
We didn't change the rule chains. We had rules before the migration from v1.4 to v2.0.3, but they were automatically deleted, if we understood the migration process correctly.
Here are screenshots of our rule chains:


ThingsBoard and Cassandra are running on the same instance: a virtual machine with 8 CPUs and 16 GB of RAM.

You can check the Java options we set for those processes:
javaOptions.txt
Just for information, when we were posting the same telemetry with ThingsBoard 1.4 we didn't have this issue. Our database has around 1500 devices.
We are facing a similar problem. In version 1.4.x, we already had a slowness problem that froze the whole web service. We migrated from Postgres to Cassandra, which solved the problem.
We have just migrated to version 2.x to take advantage of the latest advances with rule chains. Unfortunately, all insertions are now very slow (on the REST API, more than 10x slower) and ultimately produce a complete freeze of the web application.
Any solutions? Maybe it's just a configuration problem.
Same problem here (with 2.1.0 and maybe earlier), using MQTT instead of REST.
Similar problem here.
While disconnected, our devices store a large number of telemetry messages in the client's persistent storage (using the Paho MQTT C client library). When a device client reconnects to the ThingsBoard MQTT endpoint, the first few hundred persisted messages are sent successfully to the server and removed from the client's persistent storage. ThingsBoard answers the remaining messages with
{"error":"Timeout during persistence of the message to the queue!"}
and will not be removed from the client's persistent storage (which is correct and good!).
During the next reconnect, the next couple of hundred messages can be transferred successfully...
It can be quite a lengthy process to get rid of the messages, but at least no data is lost.
Any thoughts on how to avoid the timeout on the server side?
Short answer:
Please increase actors.rule.queue.max_size property in the thingsboard.yml file to a higher value. For example, set it to 10000.
Long answer:
When a device submits messages to ThingsBoard (telemetry/attributes/RPC), by default, all messages are saved in the Queue. After the messages are saved, they are passed to the Rule Engine for processing and a response is generated for the device. As you can see, processing inside the Rule Engine is asynchronous, and we need to guarantee that if a device receives '200 OK' from ThingsBoard, the message will be processed.
The queue is cleared after message processing is finished.
When a batch of messages is submitted, it is saved in the queue as N separate messages.
We limit the number of concurrent messages in the Queue for a single Tenant. The default maximum is 100.
So when 2 batches of 60 messages each are submitted from a single tenant, that makes 120 messages against a limit of 100: the message queue will reject some of those messages and the device will receive an error.
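The arithmetic in the explanation above can be sketched in a few lines of Python. This is only a toy model of a per-tenant bounded queue, not ThingsBoard code: the TenantQueue class and its methods are hypothetical names, and accepting as many messages as fit while rejecting the rest is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TenantQueue:
    """Toy per-tenant bounded queue (hypothetical, not the ThingsBoard implementation)."""
    max_size: int = 100  # mirrors the default limit quoted in the thread
    pending: int = 0     # messages saved but not yet processed

    def submit_batch(self, batch_size: int) -> Tuple[int, int]:
        """Save a batch as individual messages; return (accepted, rejected)."""
        free = max(self.max_size - self.pending, 0)
        accepted = min(batch_size, free)
        self.pending += accepted
        return accepted, batch_size - accepted

    def complete(self, count: int) -> None:
        """Processing finished for `count` messages, so their slots are freed."""
        self.pending -= min(count, self.pending)


queue = TenantQueue(max_size=100)
first = queue.submit_batch(60)   # fits entirely
second = queue.submit_batch(60)  # only 40 slots are left
print(first, second)
```

With a limit of 10000 instead of 100, both batches fit, which is why raising the limit is the suggested fix. It only helps if the Rule Engine can drain the queue faster than devices fill it.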
Thanks for the excellent explanation!
Appreciate it!
Thank you for your guide, but it's still a problem!
More details would help with the investigation.
@vparomskiy Is there any way to observe the Queue's state? Also, can we change the default max value for the concurrent messages per Tenant via configuration or is this not exposed?
In the current version, it is not possible.
Contributions are welcome.
Increasing the actors.rule.queue.max_size did not help. I get the following error...
2018-11-21 05:05:52,380 [pool-23-thread-6] WARN o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: -104, SQLState: 23505
2018-11-21 05:05:52,381 [pool-23-thread-6] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - integrity constraint violation: unique constraint or index violation; TS_KV_UNQ_KEY table: TS_KV
Attached is the thingsboard.log file.
Any suggestions?
It looks like your system is not able to handle the generated load, so I have a couple of questions:
This thread is not active anymore. Closing the issue.
I tried to find actors.rule.queue.max_size in thingsboard.yml but cannot find it. Please advise which parameter has replaced it in v3.1.1PE.