In some cases large messages are dropped and never sent. This mainly affects images and batched images.
Expected behavior: the message is sent and received.
Actual behavior: the message is not sent.
Upon investigation, this is due to two factors:
1) Slow devices
2) Slow upload speed
In the case of 1, a slow device takes a long time to calculate the POW of the message, even though we greatly reduced it for images. In the case of 2, a slow connection takes a long time to upload the message.
In both cases the envelope expires either locally or before it reaches the next hop, and is therefore discarded.
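To make the failure mode concrete, here is a minimal sketch of the timing involved. It assumes the expiry is stamped as creation time plus TTL before the POW and the upload happen. The `Envelope` type and the `newEnvelope`, `arrivalTime` and `isExpired` helpers are hypothetical names for illustration, not the status-go API.

```go
package main

// Envelope is a simplified, hypothetical stand-in for a Whisper/Waku envelope.
type Envelope struct {
	Expiry    uint32 // Unix time in seconds after which the envelope is discarded
	TTL       uint32 // time to live in seconds
	SizeBytes uint64 // payload size in bytes
}

// newEnvelope stamps the expiry at creation time, before any POW or upload
// has taken place (an assumption of this sketch).
func newEnvelope(created, ttl uint32, sizeBytes uint64) Envelope {
	return Envelope{Expiry: created + ttl, TTL: ttl, SizeBytes: sizeBytes}
}

// arrivalTime estimates when the envelope reaches the next hop: creation time
// plus the seconds spent on POW plus the upload time (rounded up).
// uploadBytesPerSec must be greater than zero.
func arrivalTime(created, powSeconds uint32, sizeBytes, uploadBytesPerSec uint64) uint32 {
	uploadSeconds := (sizeBytes + uploadBytesPerSec - 1) / uploadBytesPerSec
	return created + powSeconds + uint32(uploadSeconds)
}

// isExpired reports whether the envelope would be discarded at time now.
func isExpired(e Envelope, now uint32) bool {
	return e.Expiry < now
}
```

With made-up numbers: a 500000 byte image with a 10 second TTL, 3 seconds of POW and a 25000 bytes/s uplink arrives 23 seconds after creation, 13 seconds past its expiry.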
A few potential solutions:
1) Reduce/remove POW (this has compatibility issues that need to be dealt with)
2) Increase TTL (but that increases the POW calculation)
3) Allow expired envelopes to be propagated if they fall within the sync allowance
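Option 3 could look roughly like the sketch below. The function name `shouldPropagate` and the `syncAllowance` parameter are assumptions made for illustration; the real check in the node and the actual allowance value may differ.

```go
package main

// shouldPropagate is a hypothetical relay check: an envelope is forwarded if
// it has not expired yet, or if it expired no more than syncAllowance seconds
// ago. All values are in seconds; expiry and now are Unix timestamps.
func shouldPropagate(expiry, now, syncAllowance uint32) bool {
	// Widen to uint64 so that expiry + syncAllowance cannot overflow.
	return uint64(expiry)+uint64(syncAllowance) >= uint64(now)
}
```

With an allowance of 10 seconds, an envelope that expired 5 seconds ago would still be forwarded, while one that expired 15 seconds ago would be dropped.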
Another thing to investigate is how this impacts confirmations.
If a message is dropped because it is expired, is that message confirmed by the mailserver?
One of the reasons is addressed here:
https://github.com/status-im/status-react/pull/11451
Some raw images were actually being sent. I believe the issue is still there, though, as I can still replicate it with a slow upload speed.
Looking at TTL.
Essentially, the flow for sending messages is not a good fit for bandwidth/CPU-constrained devices.
How it works is:
1) The message TTL/POW is calculated.
2) The message is added to the pool of messages to be sent
3) At an interval of 300 ms, messages are broadcast to each peer concurrently, batched together
This is problematic for a few reasons:
I have changed the code so that:
1) Messages are not uploaded concurrently to all peers
2) POW calculation is not done while messages are being uploaded (we really want to run the POW just in time)
That seems to have helped (I am now able to send 3 images from my home network to my mobile device), though it is not a solid solution; a proper one would require more code changes.
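A rough back-of-the-envelope sketch of why change 1 can help. It assumes the upload bandwidth is shared evenly between concurrent uploads, which is a simplification, and `firstDeliverySeconds` is a hypothetical helper, not code from the change itself.

```go
package main

// firstDeliverySeconds estimates how many seconds pass before the first peer
// holds a complete copy of a message of sizeBytes.
// When uploading concurrently to all peers, the bandwidth is assumed to be
// split evenly, so every copy finishes only after all copies are uploaded.
// When uploading sequentially, the first peer gets the full bandwidth.
// It returns 0 for invalid input (no peers or zero bandwidth).
func firstDeliverySeconds(sizeBytes, uploadBytesPerSec uint64, peers int, concurrent bool) uint64 {
	if peers < 1 || uploadBytesPerSec == 0 {
		return 0
	}
	copies := uint64(1)
	if concurrent {
		copies = uint64(peers)
	}
	total := sizeBytes * copies
	return (total + uploadBytesPerSec - 1) / uploadBytesPerSec // round up
}
```

With made-up numbers (600000 bytes, 50000 bytes/s, 4 peers), the first complete copy lands after about 48 seconds when uploading concurrently, versus about 12 seconds when uploading to one peer at a time, so at least one hop is much more likely to receive the envelope before it expires.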
The way I would normally handle this is:
1) Have a priority queue with messages ordered by Author->TTL
2) Don't calculate TTL until you are ready to send to the first peer
3) Don't send to multiple peers concurrently
This only makes sense for "light" nodes or nodes running from a home network etc. Cluster nodes don't have these issues, as 1) they don't calculate POW and 2) they have good upload speeds.
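The three steps of the proposed flow can be sketched as follows. This is only an illustration under assumptions: "ordered by Author->TTL" is read here as "messages authored by this node first, then shortest TTL first", and `OutMsg`, `sendQueue`, `enqueue` and `drain` are hypothetical names, not existing status-go code. The clock and the per-peer send function are injected so the sketch stays self-contained.

```go
package main

import "container/heap"

// OutMsg is a hypothetical pending message.
type OutMsg struct {
	Own    bool   // true when this node authored the message
	TTL    uint32 // time to live in seconds
	Expiry uint32 // Unix seconds, filled in just before the first send
}

// sendQueue implements heap.Interface: own messages first, then shortest TTL.
type sendQueue []*OutMsg

func (q sendQueue) Len() int { return len(q) }
func (q sendQueue) Less(i, j int) bool {
	if q[i].Own != q[j].Own {
		return q[i].Own
	}
	return q[i].TTL < q[j].TTL
}
func (q sendQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *sendQueue) Push(x interface{}) { *q = append(*q, x.(*OutMsg)) }
func (q *sendQueue) Pop() interface{} {
	old := *q
	n := len(old)
	m := old[n-1]
	old[n-1] = nil
	*q = old[:n-1]
	return m
}

// enqueue adds a message to the priority queue (step 1).
func enqueue(q *sendQueue, m *OutMsg) { heap.Push(q, m) }

// drain sends the queued messages one at a time, to one peer at a time.
// now returns the current Unix time in seconds; send uploads to a single peer.
func drain(q *sendQueue, peers []string, now func() uint32, send func(peer string, m *OutMsg)) {
	for q.Len() > 0 {
		m := heap.Pop(q).(*OutMsg)
		// Step 2: stamp the expiry (and, in a real node, run the POW) only
		// when we are ready to send to the first peer.
		m.Expiry = now() + m.TTL
		// Step 3: upload to peers sequentially, never concurrently.
		for _, p := range peers {
			send(p, m)
		}
	}
}
```

Because the expiry is stamped right before the first upload, time spent waiting in the queue no longer eats into the TTL of a message.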