Hi,
It validates some incorrect emails as valid, like [email protected]
Change Devise.email_regexp
from
/\A[^@\s]+@([^@\s]+\.)+[^@\W]+\z/
to
/\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i
Refer to http://stackoverflow.com/questions/22993545/ruby-email-validation-with-regex
It can validate correctly.
Yes, you can search previous issues on the topic, but the regex is not meant to be a complete validation of the e-mail address, because that is much more complex than the regex you posted. For example, the part after the @ could be an IPv4 or IPv6 address. So we decided to do a minimal validation. You can change it if you want.
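For anyone comparing the two patterns from the original post, here is a minimal plain-Ruby sketch. The sample addresses are made up for illustration, and the constant and method names (`OLD_DEVISE_REGEXP`, `PROPOSED_REGEXP`, `compare`) are hypothetical helpers, not part of Devise.

```ruby
# Pattern quoted in the original post as the Devise default at the time.
OLD_DEVISE_REGEXP = /\A[^@\s]+@([^@\s]+\.)+[^@\W]+\z/
# Pattern proposed in the original post (from the linked Stack Overflow question).
PROPOSED_REGEXP = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

# Made-up sample addresses, chosen only to show where the patterns disagree.
SAMPLES = [
  "user@example.com",      # ordinary address
  "user@example..com",     # two dots in the domain (a typo)
  "user@sub.example1.com"  # digit in a middle domain label (allowed in hostnames)
].freeze

# Returns which of the two patterns accept the given address.
def compare(address)
  {
    old_devise: OLD_DEVISE_REGEXP.match?(address),
    proposed: PROPOSED_REGEXP.match?(address)
  }
end

SAMPLES.each do |address|
  puts "#{address.inspect}: #{compare(address)}"
end
```

In this sketch the old pattern lets the double-dot typo through, while the proposed pattern rejects the address with a digit in a middle domain label. That is one small instance of the point above that full validation is more complex than either regex.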
Ruby already has an email validation regex in URI::MailTo::EMAIL_REGEXP, and this is much stricter than the default in Devise, and also stricter than the various suggestions I've seen posted so far (on this thread, here and on #4268 and #5023).
URI::MailTo::EMAIL_REGEXP
=> /\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/
From the Ruby source, this Regex is taken from the WHATWG's _suggested_ regular expression for browsers to validate an email address as part of form validation.
So we patched this issue by adding:
# Configure the email validator to be stricter
config.email_regexp = URI::MailTo::EMAIL_REGEXP
to our devise.rb initializer.
@josevalim / @nashby / @ghost To me this feels like a rough edge on Devise, and one which every project/developer has to encounter and fix when their project grows large enough. Would you consider smoothing this rough edge, either by:
- changing the default regex in Devise itself, or
- adding a commented-out line to the devise.rb initialiser which is ready for other users to use out of the box, before they start persisting bad email addresses for the first time. Something like:
# Configure the email validator to be stricter than the current and relatively permissive default
# config.email_regexp = URI::MailTo::EMAIL_REGEXP
Let me know if you'll take either of these suggestions seriously - I can make a PR if that helps.
Thanks for bringing this up again @samjewell. Honestly I have changed this regex myself in some apps in the past because Devise's version became too simple to handle some common user typos.
I am hoping to work on a new major release at some point early this year, so that might be a good time to consider such a change in the Devise default. It'd also be possible to change this in the initialiser without causing any issue with existing apps; in other words, keeping the Devise default as is but changing how new apps are generated to the more strict version.
I'll reopen this for now to revisit when I have a chance to look into it in more detail / with more time.
Before switching to URI::MailTo::EMAIL_REGEXP we saw quite a large number of invalid emails in production, which resulted in unsuccessful email delivery attempts. There's no way around it, users put in the wrong data sometimes. Sometimes a lot. We're talking about accidentally putting two dots instead of one in the domain name, accidentally typing = instead of ".", backticks, and somehow we had some with no-break space (U+00A0, code point 160) characters, which I think could be explained by users simply copying/pasting addresses. URI::MailTo::EMAIL_REGEXP caught all of those cases.
This can all be validated on the client side as well, but when there's an API involved that's not enough, and IMO there is almost no benefit to keeping an incorrect e-mail validation just for simplicity's sake when it's no simpler than just using a valid regex. Yes it would warrant a major version bump, but is that such a bad thing? Semver exists so that breaking changes are explicit and known, not so that we can avoid making changes that would be breaking altogether.
Just my two cents (and huge thanks to everyone who provided solutions, it's appreciated!)
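The typo cases described in the comment above can be sketched in plain Ruby as follows. The addresses are made-up imitations of those typos, not real production data, and `DEVISE_REGEXP_FROM_THREAD`, `TYPOS` and `accepted_by` are hypothetical names. The "old" pattern is the one quoted at the top of this thread, which may differ from the default in the Devise version the commenter was running.

```ruby
require "uri"

# Pattern quoted at the top of this thread as the Devise default at the time.
DEVISE_REGEXP_FROM_THREAD = /\A[^@\s]+@([^@\s]+\.)+[^@\W]+\z/

# Made-up addresses imitating the kinds of typos described above.
TYPOS = {
  double_dot:     "user@example..com",      # two dots instead of one
  equals_for_dot: "user@mail=example.com",  # "=" typed instead of "."
  backtick:       "user@exam`ple.com",      # stray backtick in the domain
  leading_nbsp:   "\u00A0user@example.com"  # pasted no-break space (U+00A0)
}.freeze

# Returns the names of the typo samples that the given regexp accepts.
def accepted_by(regexp)
  TYPOS.select { |_name, address| regexp.match?(address) }.keys
end

puts "pattern from this thread accepts: #{accepted_by(DEVISE_REGEXP_FROM_THREAD).inspect}"
puts "URI::MailTo::EMAIL_REGEXP accepts: #{accepted_by(URI::MailTo::EMAIL_REGEXP).inspect}"
```

With these samples, the pattern quoted in this thread accepts all four addresses, while URI::MailTo::EMAIL_REGEXP accepts none of them. Note that Ruby's `\s` only covers ASCII whitespace, which is why the no-break space slips past the `[^@\s]` class.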