Status-react: Restored account for e2e doesn't receive messages from new users

Created on 3 Oct 2019 · 21 Comments · Source: status-im/status-react

Bug Report

Problem

basic_user in e2e stops receiving messages from unknown users (users who haven't added him as a contact)

Expected behavior

the message is received

Actual behavior

the message is not received

Notes

Was introduced in nightly 01/10/2019
Test case is here
This issue can potentially be reproduced with other accounts, so from my POV it needs to be addressed.
We use this account every day (restore it often) for testing purposes.

Acceptance Criteria

basic_user is able to receive messages from new users again.

Reproduction

Create account on device 1
Restore basic_user on device 2 (it is public; seed phrase: tree weekend ceiling awkward universe pyramid glimpse raven pair lounge grant grief)
Send a message from device 1 to basic_user (without adding the user to the contact list)

Additional Information

Status version: nightly 01/10/2019
Operating System: Android, iOS

Logs

Full adb: no_message.log
Status.log
Video + logs:

bug e2e test blocker high-priority

Source

churik

All 21 comments

cc @rachelhamlin

churik on 3 Oct 2019

@churik A few questions:

Does it always fail, or does it work sometimes and fail at other times?
Can you replicate outside e2e? (I have tried and it works fine on my side)
Do you have geth.log for those?

Best guess is that because we keep only max 3 devices synchronized, during these tests it was probably sending to other devices, as it's recovered often, but only geth.log will tell.

cammellos on 3 Oct 2019

@cammellos

I have tried 3 times with this particular account and can reproduce outside e2e
I'll try to get geth.log (sometimes it is not created on my LG V20) and get back to this
also it wasn't reproducible before 01/10/2019 - have never seen this failure.

churik on 3 Oct 2019

also it wasn't reproducible before 01/10/2019 - have never seen this failure.

Yes, that's interesting, by the way I am receiving your messages, I have one recovered as well :)

cammellos on 3 Oct 2019

@cammellos reproduced again.
Geth.log for device_1 (basic_user, Android 8):
geth (3).log
Geth.log for device_2(sender, IOS 11.4.1):
geth.log

churik on 3 Oct 2019

Yes, seems like it has not targeted the correct devices:
failed to handle Encryption message: device not found
I can have a look to see if there's any particular change that might have made this more likely to happen (if tests are run many times or in parallel, though, it's bound to happen). Overall it would be better to always use a different account; that way tests could also be run in parallel (that's a bit tricky, but if you'd like I can have a look).

cammellos on 3 Oct 2019

If you don't find an obvious reason, I can change this particular test (I didn't notice problems with the others).
So don't hesitate to ping me

churik on 3 Oct 2019

Now it is used in 2 tests for chatting, one of them test_block_user_from_public_chat is critical - you can see this failure here

churik on 3 Oct 2019

Is this something we can automate to make sure no regressions happen in the future?

andremedeiros on 3 Oct 2019

hm, it is automated.
It is an issue that I could find thanks to e2e testing. It is not reproducible with other accounts.

churik on 3 Oct 2019

It is an issue that I could find thanks to e2e testing.

Perfect! Thank you for getting on this.

andremedeiros on 3 Oct 2019

this is due to a known limitation (as far as I can tell, further investigation is needed, but I am fairly positive), if you restore an account on multiple devices, only 3 will be kept in sync, in order of last activity (there needs to be a limit, otherwise it can be maliciously exploited by an attacker).

In case of users that are used by e2e tests, it is recovered multiple times a day and it's likely to result in some of the devices being left out, which means that the message won't be received (eventually the list of the devices converges).
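
The limit described above can be sketched roughly as follows. This is only an illustrative model, not the actual protocol code: the function names, the installation ids, and the timestamps are all hypothetical, and the only fact taken from the thread is that at most 3 devices are kept in sync, ordered by last activity.

```python
MAX_DEVICES = 3  # limit mentioned in the thread


def select_targets(last_activity):
    """Return up to MAX_DEVICES installation ids, most recently active first.

    `last_activity` maps a (hypothetical) installation id to a timestamp.
    """
    ranked = sorted(last_activity, key=last_activity.get, reverse=True)
    return ranked[:MAX_DEVICES]


def can_decrypt(installation_id, targets):
    """A device can only handle a message that was targeted at it."""
    return installation_id in targets


# The sender's (possibly stale) view of basic_user's installations.
sender_view = {"restore-a": 100, "restore-b": 200, "restore-c": 300, "restore-d": 400}
targets = select_targets(sender_view)

# A freshly restored device that the sender has not learned about yet.
new_device = "restore-e"
print(targets, can_decrypt(new_device, targets))
```

In this model both the oldest installation and the newly restored one are left out of the target list, which would match the "device not found" error in the log until the sender's view catches up.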

Not sure we will be addressing the issue at the protocol level: until we can rely on a decentralized form of storage (ipfs/swarm), there's bound to be a period of adjustment when the same account is recovered many times on the same day. This is generally not an issue for normal users, as recovering an account is a fairly rare occurrence (it does not happen daily).

For specifically fixing the e2e tests, a solution would be to generate a different keypair on each run rather than hardcoding it in the tests. It's a bit more complicated, but it will effectively solve this issue and allow us to run tests in parallel.
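
A minimal sketch of the "different keypair on each run" idea, under stated assumptions: `fresh_test_identity` is a hypothetical helper, not part of the existing e2e suite, and it only produces fresh random entropy plus a unique label. How the app would turn that into an account (for example by creating a new account through the UI on each run) is not shown here.

```python
import secrets


def fresh_test_identity(prefix="e2e"):
    """Hypothetical helper: unique label plus 128 bits of fresh entropy per run."""
    entropy = secrets.token_bytes(16)  # 16 bytes = 128 bits
    return {
        "name": f"{prefix}-{entropy[:4].hex()}",
        "entropy_hex": entropy.hex(),
    }


# Each test run gets its own identity instead of a shared hardcoded one.
first = fresh_test_identity()
second = fresh_test_identity()
print(first["name"], second["name"])
```

Because no two runs share an identity, repeated or parallel runs would not keep adding devices to the same account.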

cammellos on 3 Oct 2019

@cammellos I'll fix the tests in this case, if it is not a common issue and can't be reproduced normally.

churik on 3 Oct 2019

👍1

@cammellos
Now I can see this issue on a restored account that I don't use for e2e.
So yes, I restore it a lot, ~1 time per day, but that is not as often as in e2e.
And that bothers me, because I really can't say when it could start for other users.
Also, I used this account many times before, so obviously something happened in nightly 01/10/2019.

churik on 4 Oct 2019

1 time per day should not be enough to cause issues, I'll investigate, could you send me the geth.logs in the meantime of the user who did not receive the message?

cammellos on 4 Oct 2019

@cammellos sorry can't reproduce it again now - attached whole log of affected device.
Issue was somewhere between 14:40 - 15:10 (GMT+2)

geth.log

churik on 4 Oct 2019

relevant and related to #9857 (basically it is the same as I can understand)
Does it make sense to keep #9857 only @cammellos ?

churik on 26 Feb 2020

looks like we don't have this issue anymore, feel free to reopen if necessary.

cammellos on 10 Nov 2020

Well, we still have it: sometimes peers just can't be discovered by other peers (correct me if I'm wrong here) if the account was restored a lot of times.
So that means that sometimes a restored multiaccount doesn't get 1-1 messages or invites to group chats.
I still face that on my test account, and that's why for e2e we are always using fresh accounts.
But I'm not sure whether it is worth looking into, or even whether it is possible to fix.

churik on 10 Nov 2020

Yes, that's expected: it will always take some time for the algorithm to converge, due to the fact that we don't have decentralized storage. So I'd say it's a wont-fix for now; most users will not experience this issue, I take it. What do you think?

cammellos on 10 Nov 2020

👍1

agree

churik on 10 Nov 2020
