Status-react: Large messages are dropped

Created on 19 Nov 2020 · 2 Comments · Source: status-im/status-react

Bug Report

Problem

In some cases, large messages are dropped and never sent. This applies mainly to images and batched images.

Expected behavior

The message is sent and received.

Actual behavior

The message is not sent.

Notes

Upon investigation this is due to 2 factors:

1) Slow devices
2) Slow upload speed

In case 1, a slow device takes a long time to calculate the PoW of the message, even though we greatly reduced it for images.

In both cases it results in the envelope either expiring locally, or expiring before it reaches the next hop, therefore being discarded.
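
To make the failure mode concrete, here is a minimal sketch of the timing, assuming the envelope's expiry is fixed before the PoW is calculated. The function name and all figures are hypothetical and are not taken from the actual code.

```python
def seconds_left_at_next_hop(ttl_s, pow_s, queue_s, upload_s):
    """Return the TTL remaining when the envelope reaches the next hop.

    All inputs are in seconds. A negative result means the envelope
    expired on the way and would be discarded. The model assumes the
    TTL clock starts before the PoW calculation.
    """
    return ttl_s - (pow_s + queue_s + upload_s)


# Hypothetical figures: a fast device on a good network versus
# a slow device on a slow uplink.
fast = seconds_left_at_next_hop(ttl_s=10, pow_s=0.5, queue_s=0.3, upload_s=1.0)
slow = seconds_left_at_next_hop(ttl_s=10, pow_s=6.0, queue_s=0.3, upload_s=5.0)
print(fast > 0, slow > 0)  # prints: True False
```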

A few potential solutions:

1) Reduce/remove PoW (this has compatibility issues that need to be dealt with)
2) Increase TTL (but that increases the PoW calculation)
3) Allow expired envelopes to be propagated if they fall within the sync allowance
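
Option 3 could look roughly like the check below. This is only a sketch of the idea: the function name, the timestamp format and the allowance value are assumptions, not the existing envelope validation.

```python
def should_propagate(expiry, now, sync_allowance_s):
    """Decide whether to forward an envelope under the relaxed rule.

    expiry and now are Unix timestamps in seconds. An envelope that has
    already expired is still forwarded as long as it expired no more
    than sync_allowance_s seconds ago.
    """
    return now <= expiry + sync_allowance_s


# Hypothetical values: expiry at t=100 with a 10 s allowance.
print(should_propagate(100, 90, 10))   # not expired yet
print(should_propagate(100, 105, 10))  # expired, but within the allowance
print(should_propagate(100, 111, 10))  # expired beyond the allowance
```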

Another thing to investigate is how this impacts confirmations.
If a message is dropped because it is expired, is that message confirmed by the mailserver?

bug

Most helpful comment

Looking at TTL.
Essentially, the flow for sending messages is not well suited to bandwidth- or CPU-constrained devices.

How it works is:

1) The message TTL/POW is calculated.
2) The message is added to the pool of messages to be sent
3) At an interval of 300 ms, messages are broadcast to each peer concurrently, batched together

This is problematic for a few reasons:

  • Between 1 & 3 a lot of time can pass, especially if the PoW calculation was expensive, and during that time the TTL clock is ticking
  • 3 is problematic because if you are on slow upload networks (home networks for example), and you try to upload messages concurrently, you end up with none of the messages making it in time to the other end.
  • Another issue is that messages generated on the device are mixed with messages received by the device. A priority queue that prioritizes messages based on the author should help.
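
The second point can be illustrated with some rough arithmetic. The sketch below assumes the uplink is the only bottleneck and that concurrent uploads share it equally. All names and numbers are hypothetical.

```python
def arrival_times(size_bytes, uplink_bytes_per_s, peers, concurrent):
    """Return the time in seconds at which each peer has the full message."""
    one_copy = size_bytes / uplink_bytes_per_s
    if concurrent:
        # Every upload gets 1/peers of the uplink, so all finish together.
        return [one_copy * peers] * peers
    # One peer at a time: the k-th peer is done after k full copies.
    return [one_copy * k for k in range(1, peers + 1)]


def delivered_in_time(times, ttl_left_s):
    """Count the peers reached before the remaining TTL runs out."""
    return sum(1 for t in times if t <= ttl_left_s)


# Hypothetical: 500 kB of images, 50 kB/s uplink, 5 peers, 30 s of TTL left.
concurrent = arrival_times(500_000, 50_000, peers=5, concurrent=True)
sequential = arrival_times(500_000, 50_000, peers=5, concurrent=False)
print(delivered_in_time(concurrent, 30), delivered_in_time(sequential, 30))
# prints: 0 3
```

Under these assumptions the concurrent upload reaches no peer in time, while the sequential one reaches three of the five.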

I have changed the code so that:
1) Messages are not uploaded concurrently to all peers
2) PoW calculation is not done while messages are being uploaded (we really want to run the PoW just in time)

That seems to have helped (I am now able to send 3 images from my home network to my mobile device), though it is not a solid solution, as a proper fix would require more code changes.

The way I would normally handle this is:

1) Have a priority queue with messages ordered by Author->TTL
2) Don't calculate TTL until you are ready to send to the first peer
3) Don't send to multiple peers concurrently

This only makes sense for "light" nodes or nodes that are running from a home network etc. Cluster nodes don't have these issues, as 1) they don't calculate PoW and 2) they have good upload speeds.
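
The three steps proposed above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: `SendQueue`, `seal` and `send` are hypothetical names, "Author->TTL" is read as "own messages first, then shortest TTL first", and relayed envelopes are assumed to be sealed already.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Pending:
    # Sort key: own messages first (False < True), then shortest TTL.
    from_other: bool
    ttl_s: float
    seq: int  # tie-breaker that keeps insertion order stable
    payload: bytes = field(compare=False)


class SendQueue:
    def __init__(self, seal, send, peers):
        self._heap = []
        self._seq = itertools.count()
        self._seal = seal    # hypothetical: runs PoW and stamps the expiry
        self._send = send    # hypothetical: blocking upload to one peer
        self._peers = peers

    def add(self, payload, ttl_s, own):
        item = Pending(not own, ttl_s, next(self._seq), payload)
        heapq.heappush(self._heap, item)

    def flush(self):
        while self._heap:
            msg = heapq.heappop(self._heap)
            if msg.from_other:
                envelope = msg.payload  # relayed, assumed already sealed
            else:
                # Seal just in time, right before the first upload.
                envelope = self._seal(msg.payload, msg.ttl_s)
            for peer in self._peers:
                # One peer at a time so each upload gets the whole uplink.
                self._send(peer, envelope)


sent = []
queue = SendQueue(
    seal=lambda payload, ttl_s: payload,  # stand-in for the PoW step
    send=lambda peer, envelope: sent.append((peer, envelope)),
    peers=["peer-a", "peer-b"],
)
queue.add(b"relayed", ttl_s=5, own=False)
queue.add(b"mine-long", ttl_s=60, own=True)
queue.add(b"mine-short", ttl_s=10, own=True)
queue.flush()
print([envelope for _, envelope in sent])
```

With the stand-in callbacks, both own messages go out before the relayed one, and each message is uploaded to `peer-a` and then `peer-b` in turn.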

All 2 comments

One of the reasons is addressed here:
https://github.com/status-im/status-react/pull/11451

Some raw images were actually being sent, but I believe the issue is still there: I can replicate it with a slow upload speed.

