Runtime: [Uri] Uri.IsWellFormedUriString() returns false for a URL which is correct

Created on 11 May 2017 · 19 Comments · Source: dotnet/runtime

I have a C# (.NET Core 1.1) app that needs to check whether a URL is valid. Uri.IsWellFormedUriString() works well in general, but it returns false for the URL below, which looks perfectly valid to me.

Uri.IsWellFormedUriString("http://www.test.com/search/Le+Venezuela+b%C3%A9n%C3%A9ficie+d%27importantes+ressources+naturelles+%3A+p%C3%A9trole%2C+gaz%2C+mines", UriKind.Absolute)

I passed the very same URL to the PHP call below, which reports it as correctly formatted:
filter_var($url, FILTER_VALIDATE_URL)

According to RFC 3986, this URL appears to be valid. Am I missing something here?
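
For anyone who wants to sanity-check the RFC 3986 claim independently of System.Uri, here is a minimal sketch. It only verifies the character repertoire (unreserved characters, reserved characters, and well-formed `%XX` escapes); it does not validate the scheme, authority, or any other part of the URI grammar. The names `Rfc3986Sketch` and `UsesOnlyUriCharacters` are hypothetical, not part of any library.

```c#
using System;

public static class Rfc3986Sketch
{
    // RFC 3986 section 2.3 unreserved punctuation and section 2.2 reserved characters.
    private const string UnreservedPunctuation = "-._~";
    private const string Reserved = ":/?#[]@!$&'()*+,;=";

    // Returns true when every character is an ASCII letter or digit, an
    // unreserved or reserved character, or part of a "%XX" escape.
    // This checks the character repertoire only, not the full URI grammar.
    public static bool UsesOnlyUriCharacters(string candidate)
    {
        for (int i = 0; i < candidate.Length; i++)
        {
            char c = candidate[i];
            if (c == '%')
            {
                // A percent sign must be followed by exactly two hex digits.
                if (i + 2 >= candidate.Length
                    || !Uri.IsHexDigit(candidate[i + 1])
                    || !Uri.IsHexDigit(candidate[i + 2]))
                {
                    return false;
                }
                i += 2;
                continue;
            }

            bool isAsciiLetterOrDigit =
                (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            if (!isAsciiLetterOrDigit
                && UnreservedPunctuation.IndexOf(c) < 0
                && Reserved.IndexOf(c) < 0)
            {
                return false;
            }
        }
        return true;
    }
}
```

The URL from the question passes this check, which is consistent with the reading that it is syntactically fine at the character level.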

area-System.Net bug up-for-grabs

All 19 comments

Do you know what the behavior is on .NET Framework? In general, .NET Core behaves like .NET Framework.

@davidsh Indeed, I just checked and I get the same behavior on .NET Framework 4.5.2.

However, this doesn't explain why the function returns false for this URL.

Thx for confirming .NET Framework behavior. This will have to be investigated to see why this is returning false.

I just found whenever a '%' appears in a url, Uri.IsWellFormedUriString() will return false. Hope this can be a starting point to investigate this issue.

This still reproduces if you shorten the URI to "http://www.test.com/%C3%A9%2C".

As far as I can tell, this happens because the URI contains both the character "é" (%C3%A9) and encoded comma (%2C), which causes the internal _flags to contain E_PathNotCanonical, but not PathIriCanonical, which in turn means false is returned here.

If you don't encode the comma (i.e. "http://www.test.com/%C3%A9," and "http://www.test.com/search/Le+Venezuela+b%C3%A9n%C3%A9ficie+d%27importantes+ressources+naturelles+%3A+p%C3%A9trole,+gaz,+mines"), then it returns true.
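
To compare the two forms side by side on your own runtime, a small sketch along these lines may help. `CommaRepro`, `WithLiteralCommas` and `Describe` are hypothetical helper names. Note that a comma is a reserved character in RFC 3986, so `%2C` and `,` are not strictly interchangeable; the rewrite below is only a repro aid, not a general fix.

```c#
using System;

public static class CommaRepro
{
    // Hypothetical helper: rewrites percent-encoded commas as literal commas.
    public static string WithLiteralCommas(string url)
    {
        return url.Replace("%2C", ",").Replace("%2c", ",");
    }

    // Formats one result line; the boolean is whatever the current runtime reports.
    public static string Describe(string url)
    {
        return Uri.IsWellFormedUriString(url, UriKind.Absolute) + "  " + url;
    }
}
```

Printing `CommaRepro.Describe(url)` and `CommaRepro.Describe(CommaRepro.WithLiteralCommas(url))` for the shortened URI above shows whether your runtime still treats the two forms differently.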

I have no idea if this behavior is correct.

As @svick said, I managed to work around this issue by decoding the URL first:
string decodedUrl = HttpUtility.UrlDecode(url);
Uri.IsWellFormedUriString(decodedUrl, UriKind.RelativeOrAbsolute);

My best guess here is that somewhere in the code we are checking the string for encoded non-reserved characters, and that check incorrectly considers commas to be unreserved.

This should be a fairly simple issue to address for someone that wants to learn more about URI, so I'll mark this as up for grabs. If it lasts too long without getting picked up, I'll go ahead and fix it.

I'm also seeing this behaviour in a .NET Framework 4.5 project.

@hades200082 we are not tracking .NET Framework bugs in CoreFX repo.
Just to set expectations: The bar for .NET Framework fixes is high to preserve compatibility. If there are multiple customers hitting it badly and there is no reasonable workaround and the fix is low-risk (sadly, any Uri changes tend to introduce new regressions quite often), it may have a chance to get fixed in future .NET Framework. Let us know if that is the case.

In .NET Core 2.1 I am also encountering what looks like the same bug, or a very similar one.

```c#
var uri = @"https://maps.googleapis.com/maps/api/geocode/json?address=%2C%2CMontr%C3%A9al%2CQuebec%2CCanada&sensor=false";

Uri.IsWellFormedUriString(uri, UriKind.Absolute); // returns false, although the URI above is valid
```

However, if I leave the URI unencoded it passes the IsWellFormedUriString check:

```c#
var uri = @"https://maps.googleapis.com/maps/api/geocode/json?address=,,Montréal,Quebec,Canada&sensor=false";

Uri.IsWellFormedUriString(uri, UriKind.Absolute); // returns true
```

@nicholasb90 can you please create minimal repro? (as in "shortest problematic Uri possible")
If everyone does that, it will be much easier to judge what is duplicate of what ...

cc @wtgodbe

@karelz

I suspect the issue is related to combining percent-encoded characters that need a single escape with characters that need several escapes. For example, 学 encodes to %E5%AD%A6 while [ encodes to %5B.

Here are some examples:
```c#
using System;
using Xunit;

public class UriTests
{
    [Fact] // Fails
    public void IsWellFormedUriString_ReturnTrue_GivenEncodedQueryStringWithCommaAndAccentCharacter()
    {
        var uri = @"http://g.c/j?a=%2C%C3%A9"; // encoded characters in query: ,é

        Assert.True(Uri.IsWellFormedUriString(uri, UriKind.Absolute));
    }

    [Fact] // Passes
    public void IsWellFormedUriString_ReturnTrue_GivenEncodedQueryStringWithComma()
    {
        var uri = @"http://g.c/j?a=%2C"; // encoded characters in query: ,

        Assert.True(Uri.IsWellFormedUriString(uri, UriKind.Absolute));
    }

    [Fact] // Passes
    public void IsWellFormedUriString_ReturnTrue_GivenEncodedQueryStringWithAccentCharacter()
    {
        var uri = @"http://g.c/j?a=%C3%A9"; // encoded characters in query: é

        Assert.True(Uri.IsWellFormedUriString(uri, UriKind.Absolute));
    }

    [Fact] // Fails
    public void IsWellFormedUriString_ReturnTrue_GivenEncodedQueryStringWithOpenBracketAndDoubleByteCharacter()
    {
        var uri = @"http://g.c/j?a=%E5%AD%A6%5B"; // encoded characters in query: 学[

        Assert.True(Uri.IsWellFormedUriString(uri, UriKind.Absolute));
    }

    [Fact] // Passes
    public void IsWellFormedUriString_ReturnTrue_GivenEncodedQueryStringWithOpenBracket()
    {
        var uri = @"http://g.c/j?a=%5B"; // encoded characters in query: [

        Assert.True(Uri.IsWellFormedUriString(uri, UriKind.Absolute));
    }

    [Fact] // Passes
    public void IsWellFormedUriString_ReturnTrue_GivenEncodedQueryStringWithDoubleByteCharacter()
    {
        var uri = @"http://g.c/j?a=%E5%AD%A6"; // encoded characters in query: 学
        Assert.True(Uri.IsWellFormedUriString(uri, UriKind.Absolute));
    }
}
```
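
To make the "one escape versus several escapes" distinction concrete, here is a small sketch that percent-encodes text as UTF-8 and counts the escapes. `EncodingWidths`, `EscapeCount` and `PercentEncodeAll` are hypothetical names; the encoder deliberately escapes every byte, which is not what a real URI encoder would do for unreserved characters.

```c#
using System;
using System.Text;

public static class EncodingWidths
{
    // Number of "%XX" triplets needed to percent-encode the text as UTF-8.
    public static int EscapeCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Percent-encodes every UTF-8 byte of the text, regardless of character class.
    public static string PercentEncodeAll(string text)
    {
        var builder = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            builder.Append('%').Append(b.ToString("X2"));
        }
        return builder.ToString();
    }
}
```

With this, `,` and `[` each take one escape (`%2C`, `%5B`), `é` takes two (`%C3%A9`) and `学` takes three (`%E5%AD%A6`). In the tests above, the failing cases are exactly the ones that mix a one-escape character with a multi-escape one.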

Try enabling IDN and IRI parsing in your App.config by adding this to the configuration section, to ensure correct handling of international character sets:

<uri>
<idn enabled="All"/>
<iriParsing enabled="true"/>
</uri>

After doing this, create a decoded version of your URL like this, to avoid complications between encoded and decoded URLs:

string decodedURL = HttpUtility.UrlDecode(yourURLString);

Now you can check like this:

if (Uri.IsWellFormedUriString(yourURLString, UriKind.Absolute) || Uri.IsWellFormedUriString(decodedURL , UriKind.Absolute))

Maybe this is not a perfect solution, but it is the closest I could get to making this work as reliably as possible.

Btw, I'm using .NET Framework 4.5.2, but I guess it should also work with lower versions.
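
The two-step check above could be wrapped in a helper roughly like this. It is only a sketch: `UrlCheck` and `IsWellFormedEitherWay` are hypothetical names, and it swaps in `System.Net.WebUtility.UrlDecode` for `HttpUtility.UrlDecode` so that it compiles without a reference to System.Web.

```c#
using System;
using System.Net;

public static class UrlCheck
{
    // Accepts the URL if either the raw or the percent-decoded form
    // is reported as well formed by the current runtime.
    public static bool IsWellFormedEitherWay(string url)
    {
        if (Uri.IsWellFormedUriString(url, UriKind.Absolute))
        {
            return true;
        }

        // Assumption: decoding the whole string is acceptable for a validity check.
        string decoded = WebUtility.UrlDecode(url);
        return Uri.IsWellFormedUriString(decoded, UriKind.Absolute);
    }
}
```

One caveat: `UrlDecode` also turns `+` into a space, so for URLs that use `+` the decoded form may be rejected for a different reason.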

I just ran into this one. You probably have plenty of examples, but just to further confirm @nicholasb90 's hypothesis:

```c#
Assert.True(Uri.IsWellFormedUriString("http://myhost.com/%26", UriKind.Absolute)); // pass
Assert.True(Uri.IsWellFormedUriString("http://myhost.com/%C3%A9", UriKind.Absolute)); // pass
Assert.True(Uri.IsWellFormedUriString("http://myhost.com/%26%C3%A9", UriKind.Absolute)); // fail
```

Is calling Uri.UnescapeDataString on the string before testing it a recommended workaround? It makes my example pass, but I'm not sure whether there are pitfalls.
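
One pitfall worth checking is that unescaping the whole string also decodes reserved delimiters, which can change how the URL is split into components. A minimal sketch, with hypothetical names `UnescapePitfall`, `DecodeEverything` and `FragmentOf`:

```c#
using System;

public static class UnescapePitfall
{
    // Decodes every escape, including reserved delimiters such as %26 (&) and %23 (#).
    public static string DecodeEverything(string url)
    {
        return Uri.UnescapeDataString(url);
    }

    // Returns the fragment component as parsed by System.Uri.
    public static string FragmentOf(string url)
    {
        return new Uri(url).Fragment;
    }
}
```

For example, `DecodeEverything("http://myhost.com/a%23b")` yields `http://myhost.com/a#b`. The decoded string now has a fragment `#b`, while the original had none. So the string being validated is no longer the same URL, even if the check passes.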

Triage: This will be a breaking change; we will have to document it at minimum.

I've had similar issues. In my case, IsWellFormedUriString returned false when the URL contained %2D instead of the hyphen character (-).
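
The hyphen is an unreserved character, and RFC 3986 section 2.3 treats a percent-encoded unreserved character as equivalent to the literal one. A possible pre-processing step is therefore to decode only those escapes before validating. The sketch below uses hypothetical names (`UnreservedNormalizer`, `DecodeUnreservedEscapes`) and a made-up example.com URL, since the failing URL was not posted.

```c#
using System;
using System.Text;

public static class UnreservedNormalizer
{
    // RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~".
    private static bool IsUnreserved(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Replaces "%XX" with the literal character only when XX is an unreserved
    // ASCII character; every other escape is left untouched.
    public static string DecodeUnreservedEscapes(string url)
    {
        var result = new StringBuilder(url.Length);
        for (int i = 0; i < url.Length; i++)
        {
            if (url[i] == '%' && i + 2 < url.Length
                && Uri.IsHexDigit(url[i + 1]) && Uri.IsHexDigit(url[i + 2]))
            {
                char decoded = (char)Convert.ToInt32(url.Substring(i + 1, 2), 16);
                if (IsUnreserved(decoded))
                {
                    result.Append(decoded);
                    i += 2;
                    continue;
                }
            }
            result.Append(url[i]);
        }
        return result.ToString();
    }
}
```

`DecodeUnreservedEscapes("http://example.com/a%2Db")` gives `http://example.com/a-b`, while escapes such as `%2C` or `%C3%A9` stay as they are. Whether this makes IsWellFormedUriString accept a given URL depends on the runtime.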

@FaizulHussain can you please update your reply with the code (e.g. like in https://github.com/dotnet/corefx/issues/19630#issuecomment-529069574)? It will be harder to miss in future.

I just ran into this issue at work and can add that combining non-ASCII characters such as Å or ตั with any of the RFC 3986 section 2.2 reserved characters fails: ! * ' ( ) ; : @ & = + $ , / ? # [ ]
E.g. Å*
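
To check this systematically, one could generate a candidate URL per reserved character and print what the runtime reports. This is only a sketch: `ReservedMatrix`, `BuildCandidates` and `Report` are hypothetical names, and the host name is a placeholder.

```c#
using System;
using System.Collections.Generic;

public static class ReservedMatrix
{
    // RFC 3986 section 2.2 reserved characters, as listed in the comment above.
    public const string Reserved = "!*'();:@&=+$,/?#[]";

    // Builds one candidate URL per reserved character: percent-encoded "Å"
    // (%C3%85) followed by the percent-encoded reserved character.
    public static List<string> BuildCandidates()
    {
        var candidates = new List<string>();
        foreach (char c in Reserved)
        {
            candidates.Add("http://myhost.com/%C3%85%" + ((int)c).ToString("X2"));
        }
        return candidates;
    }

    // Pairs each candidate with whatever the current runtime reports for it.
    public static IEnumerable<string> Report()
    {
        foreach (string url in BuildCandidates())
        {
            yield return Uri.IsWellFormedUriString(url, UriKind.Absolute) + "  " + url;
        }
    }
}
```

Writing each line of `ReservedMatrix.Report()` to the console gives 18 results, one per reserved character, which should make it easier to compare behavior across runtime versions.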
