basic_user in e2e stops receiving messages from unknown users (who haven't added him as a contact)
the message is received
the message is not received
Introduced in nightly 01/10/2019
Test case is here
This issue could potentially be reproduced with other accounts too, so from my POV it needs to be addressed.
We use this account every day (restore it often) for testing purposes.
basic_user is able to receive messages from new users again.
basic_user on device 2 (it is public, seed phrase: tree weekend ceiling awkward universe pyramid glimpse raven pair lounge grant grief)
basic_user (didn't add user to contact list)
Full adb: no_message.log
Status.log
Video + logs:
cc @rachelhamlin
@churik A few questions:
Does it always fail? Or does it work sometimes and fail other times?
Can you replicate outside e2e? (I have tried and it works fine on my side)
Do you have geth.log for those?
My best guess is that, because we keep at most 3 devices synchronized and this account is recovered often, during these tests messages were probably sent to other devices, but only geth.log will tell.
@cammellos
Also, it wasn't reproducible before 01/10/2019; I had never seen this failure.
Yes, that's interesting. By the way, I am receiving your messages; I have one recovered as well :)
@cammellos reproduced again.
Geth.log for device_1 (basic_user, Android 8):
geth (3).log
Geth.log for device_2 (sender, iOS 11.4.1):
geth.log
Yes, it seems it has not targeted the correct devices:
failed to handle Encryption message: device not found
I can have a look to see if there's any particular change that might have made this more likely to happen (if tests are run many times or in parallel, though, it's bound to happen), but overall it would be better to always use a different account. That way tests could also be run in parallel (that's a bit tricky, but if you'd like I can have a look).
If you don't find an obvious reason, I can change this particular test (I didn't notice problems with the others).
So don't hesitate to ping me
Right now it is used in 2 chat tests; one of them, test_block_user_from_public_chat, is critical. You can see this failure here
Is this something we can automate to make sure no regressions happen in the future?
hm, it is automated.
It is an issue that I could find thanks to e2e testing. It is not reproducible with other accounts.
Perfect! Thank you for getting on this.
This is due to a known limitation (as far as I can tell; further investigation is needed, but I am fairly positive): if you restore an account on multiple devices, only 3 will be kept in sync, in order of last activity (there needs to be a limit, otherwise it could be maliciously exploited by an attacker).
Accounts used by e2e tests are recovered multiple times a day, which is likely to result in some of the devices being left out, meaning the message won't be received (eventually the list of devices converges).
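To picture the limitation, here is a toy model in Python, assuming the sender simply targets the 3 most recently active devices of the recipient's account. The `Device` class, the integer `last_active` timestamps and the helper names are hypothetical and not taken from status-go; the error string only mimics the geth.log line quoted earlier in this thread.

```python
from dataclasses import dataclass

MAX_SYNCED_DEVICES = 3  # device limit mentioned in this thread


@dataclass
class Device:
    device_id: str
    last_active: int  # hypothetical logical timestamp; higher = more recent


def devices_to_target(devices, limit=MAX_SYNCED_DEVICES):
    """Return the IDs of the `limit` most recently active devices."""
    ranked = sorted(devices, key=lambda d: d.last_active, reverse=True)
    return [d.device_id for d in ranked[:limit]]


def deliver(targets, receiving_device_id):
    """Receiver side: a message not encrypted for this device can't be decrypted."""
    if receiving_device_id not in targets:
        return "failed to handle Encryption message: device not found"
    return "message received"


# basic_user restored on device_1 earlier, then recovered by three later e2e runs.
devices = [
    Device("device_1", last_active=1),
    Device("e2e_run_a", last_active=2),
    Device("e2e_run_b", last_active=3),
    Device("e2e_run_c", last_active=4),
]
targets = devices_to_target(devices)
print(targets)  # ['e2e_run_c', 'e2e_run_b', 'e2e_run_a']
print(deliver(targets, "device_1"))  # device_1 was left out
```

If device_1 later becomes active again, its `last_active` rises, it re-enters the top 3 and another device drops out, which is one rough way to picture the device list eventually converging.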
I'm not sure we will address the issue at the protocol level: until we can rely on a decentralized form of storage (IPFS/Swarm), there's bound to be a period of adjustment when the same account is recovered many times on the same day. This is generally not an issue for normal users, as recovering an account is a fairly rare occurrence (it does not happen daily).
For fixing the e2e tests specifically, a solution would be to generate a different keypair on each run rather than hardcoding it in the tests. It's a bit more complicated, but it will effectively solve this issue and, in that respect, also allow us to run tests in parallel.
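A minimal sketch of the per-run keypair idea in Python, assuming the test harness can import an account from a raw secp256k1 private key (the curve used by Ethereum-style keys). The helper names are hypothetical, and how the key would actually be loaded into the app under test is left out, since the thread doesn't specify it.

```python
import secrets

# Order n of the secp256k1 group; a valid private key is any integer in [1, n - 1].
SECP256K1_N = int(
    "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141", 16
)


def fresh_private_key_hex() -> str:
    """Return a new random secp256k1 private key as a 64-char hex string."""
    key = secrets.randbelow(SECP256K1_N - 1) + 1  # uniform in [1, n - 1]
    return format(key, "064x")


def keys_for_run(test_names):
    """Hypothetical helper: give every test in one run its own keypair,
    so no account is shared between tests or between parallel runs."""
    return {name: fresh_private_key_hex() for name in test_names}


keys = keys_for_run(["test_block_user_from_public_chat", "test_hypothetical_chat"])
```

`secrets` is used rather than `random` because these are key bytes, and `randbelow(n - 1) + 1` keeps the value inside the valid range without modulo bias.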
@cammellos I'll fix the tests in this case, if it is not a common issue and can't be reproduced normally.
@cammellos
Now I can see this issue on a restored account that I don't use for e2e.
So yes, I restore it a lot (~1 time per day), but not as often as in e2e.
And that bothers me, because I really can't say when it could start happening for other users.
I also used this account many times before, so obviously something changed in nightly 01/10/2019.
Once per day should not be enough to cause issues; I'll investigate. In the meantime, could you send me the geth.logs of the user who did not receive the message?
@cammellos Sorry, I can't reproduce it again now; I've attached the whole log of the affected device.
Issue was somewhere between 14:40 - 15:10 (GMT+2)
Relevant and related to #9857 (as far as I understand, it is basically the same issue).
@cammellos, does it make sense to keep only #9857?
Looks like we don't have this issue anymore; feel free to reopen if necessary.
Well, we still have it: sometimes peers just can't be discovered by other peers (correct me if I'm wrong here) if the account was restored many times.
That means a restored multiaccount sometimes doesn't get 1-1 messages or invites to group chats.
I still run into that on my test account, and that's why we always use fresh accounts for e2e.
But I'm not sure whether it is worth looking into, or even whether it is possible to fix.
Yes, that's expected. It will always take some time for the algorithm to converge, because we don't have decentralized storage, so I'd say it's a won't-fix for now. I take it most users will not experience this issue; what do you think?
agree