Angular.js: URL validation does not recognize URLs with no protocol

Created on 10 Mar 2014 · 19 Comments · Source: angular/angular.js

http://w is a valid url, but www.google.com is not. This seems to be a bug.

Working fiddle here: http://jsfiddle.net/HB7LU/2390/

A better regex that recognizes both types of urls can be found here: http://stackoverflow.com/questions/833469/regular-expression-for-url

PRs plz! forms moderate confusing bug

jplaut

👍1

Most helpful comment

I don't think that localhost is a valid URL. localhost with a port number, which is how localhost is used 99% of the time (localhost:3000, for example), would be valid. I also think that google.com and www.google.com should be valid. Here's a regex that matches google.com, www.google.com, http://google.com, and localhost:3000, but not just a word like localhost.

^(http|https|ftp)?(://)?(www|ftp)?.?[a-z0-9-]+(.|:)([a-z0-9-]+)+([/?].*)?$

jplaut on 12 Mar 2014

👎8 👍3 😄2
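
A quick way to check the pattern above is to run it against the examples from the comment. The sketch below is not from the thread. It assumes JavaScript regex semantics, and the `escaped` variant is only a guess at the intent (literal dots). As posted, the unescaped `.` matches any character, so a bare `localhost` is accepted as well.

```javascript
// The pattern as posted: "." is unescaped, so it matches any single character.
const asPosted = /^(http|https|ftp)?(:\/\/)?(www|ftp)?.?[a-z0-9-]+(.|:)([a-z0-9-]+)+([\/?].*)?$/;

// Assumed intent (not confirmed by the author): the dots are literal.
const escaped = /^(http|https|ftp)?(:\/\/)?(www|ftp)?\.?[a-z0-9-]+(\.|:)([a-z0-9-]+)+([\/?].*)?$/;

const samples = ['google.com', 'www.google.com', 'http://google.com', 'localhost:3000', 'localhost'];

for (const s of samples) {
  // Prints the sample followed by the result of each variant.
  console.log(s, asPosted.test(s), escaped.test(s));
}
// Only the last sample differs: asPosted accepts 'localhost', escaped rejects it.
```

Both variants use a nested quantifier, `([a-z0-9-]+)+`, which can backtrack badly on long non-matching input, so neither is something to paste into production unchanged.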

All 19 comments

In discussion with @IgorMinar:

Does it make sense that the url is valid without a protocol? For instance, currently http://google.com is a valid url, while google.com is not. Similarly, http://localhost is a valid url, while localhost is not a valid url.

I believe that if localhost were a valid URL, then it would follow that foo is a valid URL too, and the URL validation would be kind of useless.

I think http://w is a valid url as it is in the same form as http://localhost.

auser on 12 Mar 2014

I'd prefer TLDs to be required, because in the real world all websites have a TLD, except for localhost and IP addresses. So maybe adopt the Django URL validator RegExp, which does just that and is already battle-tested?

blaise-io on 17 Mar 2014

No matter what you do, you're always going to find people who have a problem with it. This is why I prefer to go with the bare-bones validation provided by the RFC and recommended by the W3C. It can be extended if necessary using pattern validators, but otherwise works very well.

caitp on 17 Mar 2014

There is currently no easy way of extending the patterns provided by AngularJS.
(if you know a way, please answer this question :)

blaise-io on 17 Mar 2014

There certainly is: you can use the ng-pattern attribute to extend validation to include your custom requirements. (I don't do stackoverflow, though.)

caitp on 17 Mar 2014

I think having the user use ng-pattern is fine, even if it _feels_ less than ideal. I think we're safest if we stick to the RFC.

auser on 17 Mar 2014

@caitp I know that solution, and it's currently the "Angularest" way of solving the problem, but it's not ideal because it means I have to pollute every one of my URL inputs with a big ng-pattern. And ng-pattern is additive: it does not replace the default URL validation, so I would need hacks or big workarounds if I wanted to disable the default Angular URL validation AND keep my HTML semantic.

@auser I understand Angular wants to stick with the RFC, but it would be grrrreat if it could be extended or configured in an easier way than it is now.

blaise-io on 17 Mar 2014

@blaise-io What are you thinking that would look like?

@caitp +1

auser on 18 Mar 2014

@auser Ideally this is 1) configurable per module, with the defaults being as they are now, and 2) configurable per input directive.

I think a sensible approach is to allow configuration to replace the defaults in *InputType so that I could replace (for example) urlInputType with my own function. Something like

someInjectedObject.setInputType('url', function(scope, element, attr, ctrl, $sniffer, $browser) {
    // My custom input type handling
});

someInjectedObject could be the current formDirectiveFactory, converted to, or wrapped in a provider, configurable in myApp.config() like other configurations. (Alternatively, NgModelController could inject from a configurable provider). This would implement configuration per module (1).

someInjectedObject could also be configured per url directive, using another directive on the input field that requires ngModel, on which setInputType could be called. This would implement configuring a single input directive (2).

blaise-io on 18 Mar 2014

Why not just create your custom validator and use it as <input type="text" my-url>?

That way you get all the flexibility you need without any verbosity. I think we should stick to the RFC because of the danger of false positives.

As we make modularization a core part of angular (in v2), there will be no difference in effort between using ngUrl and myUrl.

It's important to realize that angular can't solve everyone's issues by default, but it must be extensible so that all use-cases can be covered if needed.

IgorMinar on 18 Mar 2014

👍1
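
A custom validator along those lines might look roughly like the sketch below. It is not code from the thread: the `MY_URL` pattern, the `isMyUrl` helper and the `myApp` module name are made up for illustration, and `$validators` assumes AngularJS 1.3 or later (it did not exist in the 1.2 releases current when this was discussed).

```javascript
// Hypothetical pattern: optional http(s) scheme, a dotted host name,
// optional port, optional path/query/fragment.
const MY_URL = /^(https?:\/\/)?[a-z0-9-]+(\.[a-z0-9-]+)+(:\d+)?([\/?#].*)?$/i;

// Pure helper so the rule can be tested without AngularJS.
function isMyUrl(value) {
  return typeof value === 'string' && MY_URL.test(value);
}

// Directive factory for <input type="text" ng-model="site" my-url>.
function myUrlDirective() {
  return {
    restrict: 'A',
    require: 'ngModel',
    link: function (scope, element, attrs, ngModel) {
      // Empty values are left to ng-required; everything else must match MY_URL.
      ngModel.$validators.myUrl = function (modelValue, viewValue) {
        return ngModel.$isEmpty(viewValue) || isMyUrl(viewValue);
      };
    }
  };
}

// Register only when AngularJS is loaded; 'myApp' is a placeholder for an existing module.
if (typeof angular !== 'undefined') {
  angular.module('myApp').directive('myUrl', myUrlDirective);
}
```

With this pattern www.google.com and http://google.com pass while a bare localhost does not, because at least one dot is required in the host.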

@IgorMinar Custom directives are not ideal and promote bad semantics.

my-url, as opposed to type=url, does not benefit from browsers implementing useful features on top of HTML5 input types. For example, iPhone tries to capitalize the first character in type=text, but not in type=url, and when using type=email, some browsers suggest values from your address book and won't suggest entries from your history that don't match the input type.

But AngularJS already hijacked these types as directives, without offering a way to extend or replace that default behavior.

blaise-io on 19 Mar 2014

That's not really true. Angular lets you decorate directives, so you could decorate it to provide a custom handler for the url or email types.

caitp on 19 Mar 2014
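
One way to keep type=url and still change the rule is sketched below. Note that this is not a decorator: it swaps in a related technique, overwriting the built-in `url` validator from a small helper directive, in the style of the "modifying built-in validators" example in the AngularJS forms guide. It assumes AngularJS 1.3 or later; `LENIENT_URL`, `overwriteUrl` and `myApp` are hypothetical names.

```javascript
// Hypothetical, more lenient rule: the scheme is optional, a dot in the host is required.
const LENIENT_URL = /^(https?:\/\/)?[a-z0-9-]+(\.[a-z0-9-]+)+(:\d+)?([\/?#].*)?$/i;

// Directive factory for <input type="url" ng-model="site" overwrite-url>.
function overwriteUrlDirective() {
  return {
    require: '?ngModel',
    link: function (scope, element, attrs, ctrl) {
      // Act only if ngModel is present and AngularJS has already added its url validator.
      if (ctrl && ctrl.$validators.url) {
        // Replace the default url validator for this input only.
        ctrl.$validators.url = function (modelValue) {
          return ctrl.$isEmpty(modelValue) || LENIENT_URL.test(modelValue);
        };
      }
    }
  };
}

// Register only when AngularJS is loaded; 'myApp' is a placeholder for an existing module.
if (typeof angular !== 'undefined') {
  angular.module('myApp').directive('overwriteUrl', overwriteUrlDirective);
}
```

The input stays type=url, so browser features tied to the type keep working, at the cost of one extra attribute per field.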

Thanks caitp, I'll look into that.

blaise-io on 20 Mar 2014

This should be closed as "won't fix".

SekibOmazic on 28 Mar 2014

The URL validation appears to still be broken.

It thinks that http:google.com is valid without the //.

Try it on the demo at the bottom of the documentation page.
https://docs.angularjs.org/api/ng/input/input%5Burl%5D

(Running AngularJS v1.4.7 btw)

kosso on 24 Nov 2015

The relevant URL_REGEXP has been recently updated (see ffb6b2f).

Although surely not perfect, it tries to mimic the way browsers (especially Chromium) do things.
See #11381 for more context.

That said, I couldn't find anything related to the number of slashes in the spec (with a quick look), but if it is appropriate for Chromium and Mozilla, I guess it is more than appropriate for our needs 😃

(If you can provide some source of info according to which http:google.com is not a valid URL, I'd be happy to look into it.)

gkalpak on 24 Nov 2015
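
The browser behavior mentioned above can be observed with the WHATWG `URL` constructor, which is available in modern browsers and in Node.js. This is a sketch of what the parser does, not of AngularJS's URL_REGEXP; the `parseOrNull` helper is a made-up name.

```javascript
// Returns the normalized href, or null when the parser rejects the input.
function parseOrNull(input) {
  try {
    return new URL(input).href;
  } catch (err) {
    return null;
  }
}

console.log(parseOrNull('http:google.com'));   // 'http://google.com/'
console.log(parseOrNull('http://google.com')); // 'http://google.com/'
console.log(parseOrNull('www.google.com'));    // null (no scheme and no base URL)
```

So a parser that follows browsers treats the missing slashes as recoverable, while input with no scheme at all is rejected unless a base URL is supplied.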

Hi, I got a URL validation error with 'www.xyz.com' and Google led me here. I skimmed the lengthy discussion and could not find a good answer, only the arrogant words "This should be closed as "won't fix"". WTF, is this even a technical question? How many of us do you think will type "https://www.google.com" instead of "www.google.com" when browsing? Did you ever hear someone say 'visit us at HTTPS//xxxx' instead of 'visit us at www.xxx' on TV or radio? Frustrating...

zipper01 on 31 Mar 2019

It is an old discussion with lots of opinions, but what is lacking is rampant RFC-referencing, so here you go:

https://tools.ietf.org/html/rfc3986#section-3

@gkalpak http:google.com is not a URI because the authority component must be prefixed by a //, so http://google.com is a valid URI.

The scheme component of a URI is required, so www.example.com is a valid domain name, but not a URI.

URI-references are either URIs or relative-refs. E.g. you can use relative-refs in the context of a browser where default schemes and authorities are available (i.e. the URI the page was loaded from).

https://tools.ietf.org/html/rfc3986#section-4

E.g. //foo.example.com/bar is a network-path relative reference; the browser will default to the protocol used to load the page. This is used for sites that support both HTTP and HTTPS (hopefully trending to none at this point :-).

E.g. /bar is an absolute-path relative-ref; the browser will default to the protocol and authority used to load the page.

However, www.example.com, localhost and localhost:3000 are neither a URI nor a relative-ref. They lack the prefix/suffix required to identify whether a part is a scheme (:), authority (//), path (/), or fragment (#). You can't assume it is an authority. That would be a heuristic behavior you need to build in.

The RFC does allow for this sort of heuristic resolution in the context of human interfaces (like a dialogue box) with "URI suffix references", which are basically ambiguous relative-refs from which you can infer real relative-refs, and from there real URIs. This is to cover the real-world syntax used by humans.

https://tools.ietf.org/html/rfc3986#section-4.5

URI suffix references would include www.example.com, www.example.com/bar, localhost and localhost:3000.

So I think the discussion is between some people who want isUriSuffixReference() semantics (aka isHumanUri()) and those who think it should be implementing isUri() semantics. But they are two different things.
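
The two semantics can be sketched as follows. The component-splitting regex is the one given in RFC 3986, Appendix B; everything else is an assumption for illustration. `isUri` follows the strict reading argued in this comment (a scheme plus a "//"-prefixed authority), and `isUriSuffixReference` with its `HOST_PORT` pattern is a hypothetical heuristic, not something the RFC defines.

```javascript
// Component splitter from RFC 3986, Appendix B.
const URI_REFERENCE = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function splitReference(input) {
  // Every part of the pattern is optional, so exec() always returns a match.
  const m = URI_REFERENCE.exec(input);
  return { scheme: m[2], authority: m[4], path: m[5], query: m[7], fragment: m[9] };
}

// Strict reading used in this comment: scheme plus non-empty "//" authority.
function isUri(input) {
  const { scheme, authority } = splitReference(input);
  return scheme !== undefined && authority !== undefined && authority !== '';
}

// Hypothetical heuristic: host[:port] with an optional path, and no "//" anywhere.
const HOST_PORT = /^[a-z0-9-]+(\.[a-z0-9-]+)*(:\d+)?([\/?#].*)?$/i;

function isUriSuffixReference(input) {
  return !isUri(input) && !input.includes('//') && HOST_PORT.test(input);
}

console.log(isUri('http://google.com'));               // true
console.log(isUri('www.example.com'));                 // false
console.log(isUriSuffixReference('www.example.com'));  // true
console.log(isUriSuffixReference('localhost:3000'));   // true
console.log(splitReference('localhost:3000').scheme);  // 'localhost'
```

The last line shows why a heuristic is needed: the generic splitter reads localhost:3000 as scheme localhost with path 3000, so only a human-oriented rule can decide that an authority was meant.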