In the past day, several users have reported the same pattern:
They go to the platform (returning users or new ones) and they see this:

The red button does not work.
A Ctrl+R fixes the display but the red button still does not work for them. This is the console of a user on the latest Chrome:

I'm putting s2 even if there is a workaround because I'm afraid we are losing new customers without knowing it (not everyone contacts us).
I have unblocked users so far in 2 ways:
bug-s2: a non-critical feature is broken, no workaround
UK have also had a report of a user not being able to sign up. Clicking the red button did nothing.
The user reported using the most up-to-date version of Firefox (they were tech-savvy enough to check their version).
Definitely an s2.
Ok, so we're checking one issue related to browser extensions injecting javascript in a suspicious way for two French users, but we have similar reports from UK and AUS as well.
I can't replicate this in any browsers myself.
This could be caused at any time by our third-party JavaScript dependencies failing to load for any reason, namely the Google Maps and Stripe JavaScript we load from external sources. If there's an issue there (poor network connection on the user side, some availability issue with connecting to those servers, etc.), then the issue described above will definitely happen: the rest of our JavaScript will break. I have seen this once or twice in the dev environment when my internet connection was cutting out.
did you notice the first js error is a stripe error Matt? Content-Security-Policy
It doesn't look like a network speed issue, does it?
I see some pages where people add a CSP meta to the html to allow for these cases (unsafe-inline).
I wonder why this is happening only for some people. Could it be the browser and its version?
on another line, one issue coming out of this could be "make the website work even if maps and stripe URLs are totally down". should be an easy one and increase reliability of the webapp I think.
Just adding comment here that we had reports of this in Aus too. I also think I experienced the problem of 'not working red button' on the edit cart page, I swear I had to click the continue button 3-4 times before it worked . . but it was late and I dismissed it
on another line, one issue coming out of this could be "make the website work even if maps and stripe URLs are totally down"
Nice idea!
on another line, one issue coming out of this could be "make the website work even if maps and stripe URLs are totally down"
Yeah this would be really great. Been on my mind for a while!
Given the state of the internet is this worth discussing as a preemptive priority?
If it's not too much work, that would be awesome. It's a really brittle weak point.
Shall we make a spike issue?
Content-Security-Policy... I see some pages where people add a CSP meta to the html to allow for these cases (unsafe-inline).
I did investigate that a bit. With Content-Security-Policy in relation to blocking scripts, the server itself has to pass the browser an optional whitelist, and we're not doing that (you can check the headers). We probably should add that whitelisting at some point, but I think it will require removing all inline scripts from every page on the site, and a _lot_ of careful testing as various things will break.
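To make the whitelisting idea concrete, here is a minimal sketch in plain Ruby that assembles a Content-Security-Policy header value from a hash of directives. The helper name `build_csp` is hypothetical, and the directive values (including the Stripe and Google Maps hosts) are assumptions for illustration, not our current configuration.

```ruby
# Hypothetical helper: turns a hash of CSP directives into a header value.
# The directive names are standard CSP; the listed sources are assumptions.
def build_csp(directives)
  directives.map { |name, sources| "#{name} #{sources.join(' ')}" }.join("; ")
end

EXAMPLE_POLICY = {
  "default-src" => ["'self'"],
  "script-src" => ["'self'", "https://js.stripe.com", "https://maps.googleapis.com"],
  "frame-ancestors" => ["'none'"]
}.freeze

CSP_HEADER = build_csp(EXAMPLE_POLICY)
puts CSP_HEADER
```

This sketch leaves out 'unsafe-inline' for script-src, so inline scripts would be blocked under it, which is why the inline scripts would need to be removed first.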
It would help to have more data on specific browser versions from these reports. If a new browser has started enforcing some brutal new script-blocking rules we may be in trouble, but I haven't found anything yet to indicate that that's the case...
I suspect it's more likely to be antivirus software or a plugin doing that.
Mine are all on latest Chrome version. I'm still awaiting feedback with removing some extensions.
What troubles me is that it also happens for customers who order regularly, like every week. Suddenly they have the problem...
Another report on Chrome latest version. The shopper was successful with IE.

awaiting info
A new bit of information from a longstanding Hub that has multiple customers experiencing this inability to log in (can't change password).
He reckons most of them are people 'coming out of the woodwork' as in they logged in a long time ago and are just reappearing now that they're worried about starving - and have decided a local food system is a good idea after all.
This made me wonder - could it be some 'not quite correct' thing in the data migration when we introduced email confirmation? I can't check this because the system sees them as confirmed so I can't resend a confirmation request. Just a pondering from his observation combined with mine
@kirstenalarsen Based on the fact that people can do this on one browser and experience the problem on another browser I think it is unlikely in any way related to email confirmations.
Did you find out which browsers did and did not work?
Seems to be like a problem with javascript failing to load.... which I am wondering if it is related to insane internet load at the moment..
I wonder if what you are experiencing @kirstenalarsen isn't another issue, because he is mentioning a blank page. In my cases the login button just does not work. You click on it and nothing happens.
Also, it is the same for the create new enterprise button: in these cases it is for sure not an old account :(
here is newest info
He is using Windows 10 and the latest Chrome. Also used Edge.
He originally tried to log in with his original user name and password [email protected]. Computer said no. He then reset the password and put in a new password with the same user name. Maybe it sort of let him log in, but this seems to be where he got into the loop thing in the shop (it kept asking him to log in, and he could not check out even as a guest).
So he tried to log in again tonight with his email and the new password. Computer said no. Then he logged in with his user name and old password - computer said yes.
Okay, I've got some potentially relevant data based on the "old users that haven't logged in for a long time" angle:
irb(main):006:0> Spree::User.where(spree_api_key: nil).count
(7.7ms) SELECT COUNT(*) FROM "spree_users" WHERE "spree_users"."spree_api_key" IS NULL
=> 3916
irb(main):007:0> Spree::User.where('spree_users.last_sign_in_at > ? AND spree_users.spree_api_key = ?', 3.years.ago, nil).count
(0.8ms) SELECT COUNT(*) FROM "spree_users" WHERE (spree_users.last_sign_in_at > '2017-03-29 11:43:13.906959' AND spree_users.spree_api_key = NULL)
=> 0
Recent users have API keys set by default, really old users don't...
what does that mean @Matt-Yorkley - could it be something easy to fix? Do I need to make this S1?
I'm making this S1. People being unable to login is not acceptable :(
on home page we have header:
"content-security-policy | frame-ancestors 'none'"
on shops page we have
"content-security-policy | frame-ancestors 'self'"
Because our CSP headers are slightly different in different pages (homepage vs shop page) it could be relevant in what page the user is when they try to login.
I am not sure yet what exactly this HTTP header is doing, but there's a cross here saying IE doesn't implement this security header (it ignores it):
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/frame-ancestors
and Lynne said: "The shopper was successful with IE."
The theory would be that something goes wrong with the CSP settings that breaks Stripe's JavaScript code, and that in turn breaks OFN's JavaScript code.
I'll investigate this tomorrow.
I'll be back on Tuesday, but I think the next steps are:
I've unassigned myself for now, so anyone can pick this up.
@Matt-Yorkley There is a little bug in your second query:
Spree::User.where('spree_users.last_sign_in_at > ? AND spree_users.spree_api_key = ?', 3.years.ago, nil).count
This will generate where spree_api_key = NULL, which is never true in SQL, so the query matches no rows.
Spree::User.where('spree_users.last_sign_in_at > ? AND spree_users.spree_api_key is null', 3.years.ago).count
=> 1619
Spree::User.where('spree_users.last_sign_in_at > ? AND spree_users.spree_api_key is not null', 3.years.ago).count
=> 2717
Spree::User.where('last_sign_in_at is not null and spree_users.spree_api_key is null').order(:last_sign_in_at).last.last_sign_in_at
=> Tue, 06 Aug 2019 18:37:01 AEST +10:00
Spree::User.where('spree_users.spree_api_key is null').order(:created_at).last.created_at
=> Thu, 27 Feb 2020 22:22:59 AEDT +11:00
I would conclude from this that newly created users don't have an API key, but that all users who logged in after August 2019 have one.
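The NULL pitfall in the queries above can be sketched without a database. The helper `sql_condition` below is hypothetical (it is not an ActiveRecord API) and uses naive quoting for illustration only; it just shows that a nil value has to become `IS NULL` rather than `= NULL`.

```ruby
# Hypothetical helper, not part of ActiveRecord: builds a SQL condition
# fragment for a column, switching to IS NULL when the value is nil.
def sql_condition(column, value)
  if value.nil?
    # "column = NULL" evaluates to unknown in SQL and matches no rows.
    "#{column} IS NULL"
  else
    # Naive single-quote escaping, for illustration only.
    "#{column} = '#{value.to_s.gsub("'", "''")}'"
  end
end

puts sql_condition("spree_users.spree_api_key", nil)
puts sql_condition("spree_users.spree_api_key", "abc")
```

In ActiveRecord itself, the hash form from the first query, `where(spree_api_key: nil)`, already produces `IS NULL`, as its logged SQL shows.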
I looked into the case @kirstenalarsen mentioned. I'm not sure if the user knew what they were doing. It's a very old account. Created in 2015 and a password reset link was also sent in 2015 but not used. The last old login was in 2017 and we introduced email validation in 2018.
Then they logged in twice yesterday but no password reset had been triggered. I don't know how a user could believe they had reset the password when it didn't actually happen. Maybe they have another email address and got confused?
All this is probably very different to the original reported problem of Stripe Javascript crashing our page. But if the page crashed and buttons didn't work then that would explain some confusion.
@luisramos0 Some facts for our investigation:
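The home page and the shop pages send content-security-policy: frame-ancestors 'none'. Only embedded shops alter the content-security-policy when embedded.
The error mentions script-src 'self', which is not in our code.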
content-security-policy: default-src 'self';
connect-src 'self' https://api.stripe.com https://errors.stripe.com;
script-src 'self';
style-src 'self' 'unsafe-inline'; frame-src 'self';
img-src 'self' https://q.stripe.com; font-src data: https:;
media-src 'none'; object-src 'self';
I conclude that something is violating Stripe's security policy. Matt's suggestion, it could be a plugin, sounds reasonable. An unrelated example of this kind of issue: https://github.com/reduxjs/redux-devtools/issues/380
It's even possible that these people have a malicious plugin installed and Stripe's security policy is just doing its thing.
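As a rough illustration of why a script injected by a plugin would be rejected under a policy like the one quoted above, here is a simplified sketch in plain Ruby. The helpers `parse_csp` and `script_allowed?`, the page origin and the plugin origin are all hypothetical; real browsers implement far more of the CSP matching rules (wildcards, nonces, hashes, scheme-only sources) than this does.

```ruby
# Parses a CSP header value into { directive => [sources] }.
def parse_csp(header)
  header.split(";").map(&:strip).reject(&:empty?).each_with_object({}) do |part, acc|
    name, *sources = part.split(/\s+/)
    acc[name] = sources
  end
end

# Simplified check: only understands 'self' and exact origins.
# Falls back to default-src when script-src is absent.
def script_allowed?(policy, script_origin, page_origin)
  sources = policy["script-src"] || policy["default-src"] || []
  sources.any? do |source|
    source == "'self'" ? script_origin == page_origin : source == script_origin
  end
end

# Excerpt of the policy quoted above.
POLICY = parse_csp("default-src 'self'; script-src 'self'; img-src 'self' https://q.stripe.com")

PAGE_ORIGIN = "https://js.stripe.com"        # assumed origin of the page enforcing the policy
PLUGIN_ORIGIN = "https://plugin.example.com" # hypothetical injected script origin

puts script_allowed?(POLICY, PAGE_ORIGIN, PAGE_ORIGIN)
puts script_allowed?(POLICY, PLUGIN_ORIGIN, PAGE_ORIGIN)
```

With script-src limited to 'self', only scripts from the page's own origin pass this check, so anything a plugin loads from a third-party domain fails it.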
I think we should set up a Bugsnag JS project. It's very easy. Create the project in Bugsnag and include the key like this in the layout:
<% if Rails.env.staging? || Rails.env.production? %>
<script
src="//d2wy8f7a9ursnm.cloudfront.net/bugsnag-2.min.js"
data-apikey="4b8......09d7b79ffd">
</script>
<% end %>
Rachel's screenshot mentions that Stripe is blocking a script from ruzozi.locixugoro.com. Searching for locixugoro brings up French sites about a suspected scam. Visiting locixugoro.com prompts you to install a browser plugin without any explanation. At least in this one case we can say that a browser plugin is trying to do something that looks very suspicious.
Working on making our site work even when Stripe fails sounds like a good idea in general. But it would be even better to detect that Stripe fails and show some kind of notification which prompts to switch browsers or uninstall plugins.
In any case, @RachL can you follow up with these people and make sure that they are safe? There are probably less secure payment gateways out there and we don't want this plugin to steal credit card details in other online shops.
Re. the password reset not being triggered: that would be consistent with what he said. He tried to change the password but it didn't work, and that's why his old password was still there: pushing the button didn't do anything. Same with the person who doesn't have a user: he couldn't sign up.
So the bugsnag thing you're suggesting would tell us when this happens so we can see how often it is? in which case YES PLEASE
I don't agree that this is user error, because some of the people reporting this are not tech-deficient. Tess is familiar with OFN and other platforms and she totally watched someone getting completely stuck.
The other reason I'm pretty sure about this: remember, @mkllnk, after the last deploy when I hit the panic button saying we had a bug in the deploy? That was because I couldn't log in. I had a similar experience to what people are describing, managed to get through it, and convinced myself it was my mistake. But now I think perhaps it wasn't.
The malicious plugin theory is interesting and hopeful - would also explain why they're ok in other browsers. Could we get some kind of warning that pops up when it happens?
Bugsnag js sounds good :+1:
I just realised I forgot to add you to the private Slack channel we had for this.
The first screenshot we had suggested malicious browser extensions, but I don't think that can explain all the cases.
Ok so for one of my users, deactivating the extension Fenetre Mailto helped her to be able to use Chrome again in order to login.
It is a plugin you can find on the chrome store, I have no idea what kind of validation the chrome store has: https://chrome.google.com/webstore/detail/fenetre-mailto/gepijnnkhnilemhhacebnhcndgogkamc?hl=en
It's possible the domain we found might be hosting a malicious version of that plugin, designed to look like it. I'd recommend avoiding it either way...
interesting Maikel:
"The home page and the shop pages send content-security-policy: frame-ancestors 'none'. Only embedded shops alter the content-security-policy when embedded."
Loading https://openfoodfrance.org/microbrasserie-de-la-roche-aigue/shop
I see "content-security-policy: frame-ancestors 'self'"
Anyway, I think the theory that a browser plugin can break stripe's or our own CSP is valid.
We should be able to let the user (or us through logs or bugsnag) know this is broken.
"add-ons and extensions in browsers cause CSP violations" and it looks like tools like report URI will let you know when/how/whom.
I don't think we should go for a tool like this now, but we can certainly try to add some logging.
If we agree we don't want to do JS logging to our server but instead directly to Bugsnag (I am not sure but you guys seem to agree :+1:), the task is to detect these cases and make sure a Bugsnag alert is generated.
For this we need to replicate this problem in some way, maybe by installing a browser plugin that violates Stripe's CSP and making sure the alert is sent to Bugsnag.
@Matt-Yorkley @luisramos0 There is another possible cause of failures I've been thinking about. Rachel and Kirsten mentioned that it worked after a reload of the page and then they forgot about it. Could it be that some pages or scripts are cached when they shouldn't be?
Rails has a nice way of putting a checksum in the asset file names and referencing that. So I'm not sure how this is possible but I remember that Luis mentioned something about new translations not being included at some point. Did that get solved? Different problem? Can you imagine a way that a browser would cache the home page which then references an old and non-existing js-file?
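For anyone unfamiliar with the checksum idea, here is a minimal sketch in plain Ruby. The helper `fingerprinted_name` and the choice of MD5 are assumptions for illustration; this is not the asset pipeline's actual implementation, only the general principle that the file name changes whenever the content changes.

```ruby
require "digest"

# Hypothetical helper: embeds a digest of the file content in the file name,
# so a changed file always gets a new name.
def fingerprinted_name(logical_name, content)
  ext = File.extname(logical_name)
  base = File.basename(logical_name, ext)
  "#{base}-#{Digest::MD5.hexdigest(content)}#{ext}"
end

old_name = fingerprinted_name("application.js", "console.log('v1');")
new_name = fingerprinted_name("application.js", "console.log('v2');")

puts old_name
puts new_name
puts old_name == new_name
```

Under this scheme, a cached copy of an HTML page would keep pointing at the old file name, which is the scenario the question above is asking about.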
@luisramos0 I'm really surprised about the header you are seeing. I did this:
curl -v https://openfoodfrance.org/microbrasserie-de-la-roche-aigue/shop > /dev/null
...
< content-security-policy: frame-ancestors 'none'
< cache-control: no-cache, no-store, max-age=0, must-revalidate
< pragma: no-cache
< expires: Fri, 01 Jan 1990 00:00:00 GMT
...
And the code looks like self is only set when EmbeddedPageService#embed! is called.
https://github.com/openfoodfoundation/openfoodnetwork/blob/940423acfc50600dd07e769533c7f5f0de252c07/app/services/embedded_page_service.rb#L51-L54
Luis' idea to use https://report-uri.com/ looks good. The free plan allows 10,000 reports. The busiest sites have less than 50,000 page views. So if less than 20% of customers have bad plugins, we could capture them all. We should create an account for each instance though to not reach the quota. Maybe let's start with one instance and see how many reports come through.
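The 20% figure follows directly from the two numbers above. A quick sketch of the arithmetic, assuming each affected page view produces exactly one report and that both figures cover the same period (neither assumption is stated in the plan details quoted here):

```ruby
FREE_PLAN_REPORTS = 10_000        # free plan quota mentioned above
BUSIEST_SITE_PAGE_VIEWS = 50_000  # upper bound for the busiest sites

# Largest share of page views that can each send one report within the quota.
MAX_REPORT_SHARE = FREE_PLAN_REPORTS.to_f / BUSIEST_SITE_PAGE_VIEWS

puts "Quota covers up to #{(MAX_REPORT_SHARE * 100).round}% of page views"
```

If a single broken page load triggers several violation reports, the share that fits in the quota would be correspondingly smaller.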
Another thought: fetching Javascript from external domains could be failing due to DNS resolution problems (which we know have been happening this week) as per this issue: https://github.com/openfoodfoundation/openfoodnetwork/issues/5111
The translations problem was #4328 and it was due to assets not being compiled if JS was not changed in that release, only translations. Assets clean fixed the problem: https://github.com/openfoodfoundation/ofn-install/pull/538/files
we have strong evidence that #5121 will fix a major JS error (when google maps JS code fails to load): some of the very first alerts on bugsnagJS had "google not defined".
I'd close this issue with #5121 and reopen if we have new reports after #5121 is live.
Okay, #5121 fixes the problem where all our Angular breaks if the google maps js fails to load.
It doesn't address the root cause though, which is that the Google Maps JS (and possibly the Stripe JS) are regularly failing to load. We haven't got to the bottom of why that is happening, and we will still have related issues if it continues (although they'll be much less severe after #5121).
Do we close this now or put prod test label and test after next deploy?
There's nothing we can verify after the release so not a prod-test.
I think we have fixed the S1 because the login issues will not happen again. We will continue having problems in maps if google maps fails.
I'd say we should close this for now and re-open (or even create separate issue as this one is already very long) if there are new reports after the next release.
If anyone wants to proceed differently with this issue please go ahead :+1:
@luisramos0 @Matt-Yorkley I have a new report of this issue from a customer using Chrome and Explorer. Both are browsers that he uses regularly. When using Firefox (which he had to install to test) everything works fine.
I will ask about extensions. Anything else I should look for? Can we find something in the logs now, or do we still have no more info for this type of error?
it's a long story here, what do you mean by "this issue"? Do they see "label_login"? What url are they using? Did they login before on that browser? when was last successful login? Can you try and get the content of their browser console like in the description of this issue?
@luisramos0
what do you mean by "this issue"?
the original issue we are commenting on here. In detail:
Do they see "label_login"?
yes. that's the screenshot I've got:

I'm waiting to see if a cache clear solved the translation problem.
What url are they using?
https://openfoodfrance.org/au-local/shop#/login
Did they login before on that browser? when was last successful login?
Yes Chrome is their default browser. They ordered in June with that browser, no problem. When they came back in August, they saw the problem.
Can you try and get the content of their browser console like in the description of this issue?
I gave an explanation of how to do it, but I doubt I will get any feedback... it's a customer who does not know how to use video conference tools, so I cannot do a screenshare with him to go step by step through how to do it.
Another one, Firefox this time... still trying to fetch info... super hard :(
