Devise: Devise.email_regexp is not correct

Created on 14 Nov 2015 · 4 Comments · Source: heartcombo/devise

Hi,

It accepts some invalid email addresses as valid, like [email protected]

Change Devise.email_regexp
from

/\A[^@\s]+@([^@\s]+\.)+[^@\W]+\z/

to

/\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

Refer to http://stackoverflow.com/questions/22993545/ruby-email-validation-with-regex

It can validate correctly.
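To make the difference concrete, here is a minimal sketch that runs both patterns from this issue against two sample addresses. The addresses are hypothetical and chosen only for illustration; the constant names are not part of Devise.

```ruby
# Both patterns are copied from the issue text above.
DEVISE_DEFAULT = /\A[^@\s]+@([^@\s]+\.)+[^@\W]+\z/
PROPOSED       = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

# Hypothetical sample addresses, for illustration only.
SAMPLES = [
  "user@example.com",  # well formed
  "user@example..com"  # doubled dot in the domain
].freeze

SAMPLES.each do |address|
  default_ok  = DEVISE_DEFAULT.match?(address)
  proposed_ok = PROPOSED.match?(address)
  puts "#{address}: default=#{default_ok} proposed=#{proposed_ok}"
end
# user@example.com: default=true proposed=true
# user@example..com: default=true proposed=false
```

The default accepts the doubled dot because `[^@\s]+` also matches dots, so `example..` satisfies the repeated group.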

All 4 comments

Yes, you can search previous issues on the topic, but the regex is not meant to be a complete validation of the e-mail address, because complete validation is much more complex than the regex you posted. For example, the part after the @ could be an IPv4 or IPv6 address. So we decided to do a minimal validation. You can change it if you want.
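As a rough illustration of why a stricter hand-written pattern is not automatically more correct, the sketch below feeds the regex proposed in the issue a few addresses that are valid under RFC 5322 syntax. The addresses are hypothetical and the constant names are only for this example.

```ruby
# The pattern proposed in the issue, copied verbatim.
PROPOSED_REGEXP = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

# Hypothetical but syntactically valid addresses.
VALID_ADDRESSES = [
  "o'brien@example.com",     # apostrophe in the local part
  "user@mail.example1.com",  # digit in a label after the first one
  "user@[192.0.2.1]"         # IPv4 address literal instead of a domain name
].freeze

VALID_ADDRESSES.each do |address|
  verdict = PROPOSED_REGEXP.match?(address) ? "accepted" : "rejected"
  puts "#{address}: #{verdict}"
end
# Every address above is rejected by the proposed pattern.
```

The pattern allows digits only in the first domain label and has no case for address literals, so each of these is turned away even though it is well formed.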

Ruby already has an email validation regex in URI::MailTo::EMAIL_REGEXP. It is much stricter than the default in Devise, and also stricter than the various suggestions I've seen posted so far (on this thread and on #4268 and #5023).

URI::MailTo::EMAIL_REGEXP
=> /\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

According to the Ruby source, this regex is taken from the WHATWG's _suggested_ regular expression for browsers to use when validating an email address as part of form validation.

So we patched this issue by adding:

  # Configure the email validator to be stricter
  config.email_regexp = URI::MailTo::EMAIL_REGEXP

to our devise.rb initializer.

@josevalim / @nashby / @ghost To me this feels like a rough edge on Devise, and one which every project/developer has to encounter and fix when their project grows large enough. Would you consider smoothing this rough edge, either by:

  • Making the default validation itself stricter? (I saw a comment somewhere that this change is not backwards compatible, but could we make a major-version change for this?)
  • Adding an additional comment to the devise.rb initialiser which is ready for other users to use out of the box, before they start persisting bad email addresses for the first time. Something like:
      # Configure the email validator to be stricter than the current and relatively permissive default
      # config.email_regexp = URI::MailTo::EMAIL_REGEXP

Let me know if you'll take either of these suggestions seriously - I can make a PR if that helps.

Thanks for bringing this up again @samjewell. Honestly I have changed this regex myself in some apps in the past because Devise's version became too simple to handle some common user typos.

I am hoping to work on a new major release at some point early this year, so that might be a good time to consider such a change in the Devise default. It'd also be possible to change this in the initialiser without causing any issue with existing apps; in other words, keeping the Devise default as is but changing how new apps are generated to the more strict version.

I'll reopen this for now to revisit when I have a chance to look into it in more detail / with more time.

Before switching to URI::MailTo::EMAIL_REGEXP we saw quite a large number of invalid emails in production, which resulted in unsuccessful email delivery attempts. There's no way around it, users put in the wrong data sometimes. Sometimes a lot. We're talking about accidentally putting two dots instead of one in the domain name, accidentally typing = instead of ".", backticks, and somehow we had some with no-break space (U+00A0, decimal 160) characters, which I think could be explained by users simply copying/pasting addresses.

URI::MailTo::EMAIL_REGEXP caught all of those cases.
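A quick way to check this kind of claim locally is sketched below. The addresses are hypothetical reconstructions of the typos described, not the actual production data, and the backtick is assumed to sit in the domain part.

```ruby
require "uri"

# Hypothetical addresses modelled on the typos described above.
TYPO_ADDRESSES = {
  "doubled dot in the domain" => "user@example..com",
  "= typed instead of ."      => "user@example=com",
  "backtick in the domain"    => "user@example`.com",
  "trailing no-break space"   => "user@example.com\u00A0"
}.freeze

TYPO_ADDRESSES.each do |label, address|
  verdict = URI::MailTo::EMAIL_REGEXP.match?(address) ? "accepted" : "rejected"
  puts "#{label}: #{verdict}"
end
# Each of the four typos is rejected.
```

One caveat: the pattern quoted earlier in this thread lists the backtick and `=` among the permitted local-part characters, so the same typos before the @ would likely still pass.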

This can all be validated on the client side as well, but when there's an API involved that's not enough, and IMO there is almost no benefit to keeping an incorrect e-mail validation just for simplicity's sake when it's no simpler than just using a valid regex. Yes, it would warrant a major version bump, but is that such a bad thing? Semver exists so that breaking changes are explicit and known, not so that we can avoid making breaking changes altogether.

Just my two cents (and huge thanks to everyone who provided solutions, it's appreciated!)
