Caddy: Multiple slashes become one slash

Created on 16 Dec 2016 · 31Comments · Source: caddyserver/caddy

1. What version of Caddy are you running (`caddy -version`)?

Caddy (untracked dev build)
From go get github.com/mholt/caddy/caddy.
On 5:30 PM (UTC) 16 December 2016.

2. What are you trying to do?

Confirm bug exists even on master branch of Caddy after discovering it on Caddy 0.9.3.

3. What is your entire Caddyfile?

localhost {
  log
}

4. How did you run Caddy (give the full command and describe the execution environment)?

On macOS 10.12.2, create the above Caddyfile in $GOPATH/bin, run ./caddy after cd $GOPATH/bin in the terminal, and go to http://localhost:2015/path/https://www.google.com in the browser.

5. What did you expect to see?

I expect to see GET /path/https://www.google.com HTTP/1.1 in $GOPATH/bin/access.log.

6. What did you see instead (give full error messages and/or log)?

I see GET /path/https:/www.google.com HTTP/1.1 in $GOPATH/bin/access.log.

7. How can someone who is starting from scratch reproduce this behavior as minimally as possible?

Go through the above.

discussion

Source

joelchen

👍1

Most helpful comment

Anyway, please make a pull request with the test and that function in place (just put it in the httpserver package, not a separate one). I'll be happy to do a review. :)

PR #1317 is ready for review. I have excluded benchmarking functions, but can be added if you want to keep them in the repo. I have added necessary inline comments in the code to make it easier to maintain the code in future.

ibnesayeed on 31 Dec 2016

👍4

All 31 comments

Have you tried escaping the URL ?

abiosoft on 16 Dec 2016

@abiosoft Yes, I have tried it from XMLHttpRequest. Python's SimpleHTTPServer works without escaping.

joelchen on 16 Dec 2016

This is intentional; we call path.Clean() to help prevent directory traversal attacks. It removes extra forward slashes. What did the escaped request look like exactly?

mholt on 17 Dec 2016

@mholt I have tried permutations of the following and they all result in one slash:

http://localhost:2015/path%2Fhttps%3A%2F%2Fwww.google.com
http://localhost:2015/path/https%3A%2F%2Fwww.google.com
http://localhost:2015/path/https:%2F%2Fwww.google.com
http://localhost:2015/path/https:\/\/www.google.com

I have also tried a recent version of nginx (1.11.6), and it does not have this issue. Can you disable path.Clean() by default unless turned on by some configuration in Caddyfile, or remove it and use other ways to prevent directory traversal attacks? Thank you.

joelchen on 17 Dec 2016

Can you disable path.Clean() by default

Sorry, no.

What we might be able to do is look into what exactly is causing even the query-encoded slashes to be replaced and see if we can somehow treat those as literals instead of path separators.

mholt on 17 Dec 2016

@mholt May I know your reason? Since directory traversal vulnerability has been addressed by nginx since 0.8.17 (http://nginx.org/en/security_advisories.html) without query encoding/decoding, why Caddy cannot do the same?

joelchen on 17 Dec 2016

This might be related to https://github.com/mholt/caddy/issues/643, specifically https://github.com/mholt/caddy/commit/f31875dfdeb73db9dddf6a2a175a6b0b349d9b28. Perhaps there is a better way to approach this?

mholt on 17 Dec 2016

@mholt Here is what I just found out while getting my application to work in nginx. If proxy_pass is used with URI (http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass), the equivalent of proxy + without in Caddy, same issue occurs where slashes will be merged. To remedy, merge_slashes off; has to be put into the server directive of its configuration file (http://nginx.org/en/docs/http/ngx_http_core_module.html#merge_slashes).

joelchen on 17 Dec 2016

I think that we can introduce merge_slashes option (default: true). WDYT @mholt ?

wendigo on 18 Dec 2016

Hmm, why would someone ever want the slashes to not be merged? foo//something/ looks like a mistake.

mholt on 23 Dec 2016

Sorry, no.

What we might be able to do is look into what exactly is causing even the query-encoded slashes to be replaced and see if we can somehow treat those as literals instead of path separators.

I was seriously thinking of using Caddy for all of my projects going forward, but this forced path cleanup alone will be a show stopper for me. The whole web archiving community, including big players like Internet Archive, use plain vanilla query URLs (without any URL encoding) for the sake of convenience (as the primary lookup key for web archives is URI itself). Here is an example:

http://web.archive.org/web/*/http://example.com/index.html

This means we cannot use Caddy as a reverse proxy server for many tools including:

These are just a few of many tools that serve responses based on plain URLs in their path. Not supporting it is like failing to serve an entire ecosystem. I encountered this same issue while using Go's standard net/http library for writing MemGator. I asked about some workarounds at StackOverflow.

While I understand that cleaning up /./ and /../ is important to prevent directory traversal attack, I think removing multiple successive slashes is not preventing from any security risks that I am aware of (I may be wrong here). Hence, I would really encourage and request to provide an option to turn the double slash cleanup off (it doesn't have to be on by default, but should be possible when needed). When using Caddy as a reverse proxy or loadbalancer, backend servers can take care of those attacks.

ibnesayeed on 25 Dec 2016

👍1

@ibnesayeed Hmm, interesting use case.

When someone has some time, we should find out where the slash collapsing behavior is coming from -- maybe verify that it is in path.Clean() -- and then see if we can work around that somehow.

Does nginx and others manipulate the URL at all by collapsing slashes? I'm talking about in the general case, not necessarily when proxying or anything specific (since I do know that it can collapse slashes when proxying since paths have to be combined).

mholt on 25 Dec 2016

When someone has some time, we should find out where the slash collapsing behavior is coming from -- maybe verify that it is in path.Clean() -- and then see if we can work around that somehow.

Yes, it is indeed happening in path.Clean():

Replace multiple slashes with a single slash.
Eliminate each . path name element (the current directory).
Eliminate each inner .. path name element (the parent directory)
along with the non-.. element that precedes it.
Eliminate .. elements that begin a rooted path:
that is, replace "/.." by "/" at the beginning of a path..

ibnesayeed on 25 Dec 2016

I would further note that Caddy also uses filepath.Clean method. Clean method defined in both path and filepath packages are very similar though. Only difference in the two seems to be the VolumeName. The Clean method of filepath package adds platform specific volume name such as C: on windows. I am surprised that the logic is duplicated in the standard library and they are not reusing already defined method to implement the other.

ibnesayeed on 25 Dec 2016

I have put together all the necessary bits that is used in the path cleaning on the playground to test and make changes.

ibnesayeed on 25 Dec 2016

Going forward, here are three possible approaches that I can think of:

provide a config option to disable path cleaning completely
implement a duplicate path cleaning method locally and conditionally (using a config option) ignore the slash collapsing
implement a proxy path cleaning method and whitelist certain situations by replacing occurrences of :// in the path to something like PROTOCOL_SEPARATOR/ then call the usual path.Clean() method and then replace the temporary mask back to it's original form.

The latter approach seems more reasonable to me as it is easier to implement and it does not relax the security guards in place. In our practical applications I only see those protocol prefixes having double slashes that needs to be preserved, so there is no point in whitelisting every other such occurrence. If the term PROTOCOL_SEPARATOR/ has the chance of collision then we can randomly generate a long string as mask each time we call path.Clean().

ibnesayeed on 25 Dec 2016

Merry Christmas @ibnesayeed :) Thanks for your feedback on this.

I generally try to avoid hard-coding edge cases like that into Caddy; perhaps we can write a wrapper function over path.Clean() that can replace // with some random string, do the path.Clean(), then replace the random strings again. This way, for any case where we don't want slash collapsing, we can call the wrapper function instead.

What use is there for path collapsing outside of proxying where we merge two paths?

mholt on 26 Dec 2016

Merry Christmas @ibnesayeed :)

Thanks @mholt and wishing you a very happy Xmas and new year.

I generally try to avoid hard-coding edge cases like that into Caddy; perhaps we can write a wrapper function over path.Clean() that can replace // with some random string, do the path.Clean(), then replace the random strings again. This way, for any case where we don't want slash collapsing, we can call the wrapper function instead.

This exactly what I meant by the "proxy method" in the third point. Further, in the post-note I mentioned that a random string can be used instead of a fixed one for masking double slashes.

What use is there for path collapsing outside of proxying where we merge two paths?

I have only seen legitimate double slashes being practically used in paths when a URL is part of the path and that too only near the protocol part. That's why I suggested that if we only mask the occurrences of :// then we will be sure that http://, https://, and ftp:// etc. will be preserved if they appear in the path.

The use cases I mentioned in the web archiving ecosystem is actually not the standard approach, instead it is for the sake of convenience. Most of the web archiving related services actually work based on URIs as parameters, hence, to avoid issues where client failed to encode the URI, they just started supporting non-encoded URIs in the path. For example, if you want to setup a service to provide thumbnails of any URI, you could have a service endpoint that responds to /thumbnails/{URI}. One would expect that {URI} would be url-encoded, but if there are no other arguments required then the service can relax the need of URL encoding and extracts anything that falls after /thumbnails/ as the URI to process and generate thumbnail of. This approach has some downsides, such as inability to pass query parameters, because they are now part of the supplied {URI}, so additional parameters either need to be provided by special request headers or adjusting the endpoint path scheme itself. For example if the desired dimension of the thumbnail is to be sullied by the client then the service endpoint could be something like /thumbnails/{width}/{height}/{URI}. The clear advantage of this non-standard approach is the ease that allows anyone to change manually edit desired URL in browser's address bar or curl command without the hassle of url-encoding. Good or bad, this is what is widely in practice in many such cases.

ibnesayeed on 26 Dec 2016

I wrote a rough code to illustrate how can it be done. Please find it in action with various test cases on the Go Playground.

package main

import (
    "fmt"
    "path"
    "strings"
    "time"
)

func cleanMaskedPath(p, mask string) string {
    if mask != "" && strings.Index(p, mask) >= 0 {
        t := fmt.Sprintf("%d", time.Now().UnixNano())
        p = strings.Replace(p, mask, t, -1)
        p = path.Clean(p)
        return strings.Replace(p, t, mask, -1)
    }
    return path.Clean(p)
}

func main() {
    p := "/a/b/../././/c/http://example.com/foo//bar/../blah"
    fmt.Println(cleanMaskedPath(p, "://"))
    //=> "/a/c/http://example.com/foo/blah"
}

Basically replacing all the occurrences of path.Clean(p) in the code with cleanMaskedPath(p, "://") will do the trick. If more than one masks are to be supported, the code needs to be adjusted a bit, but the idea will be the same.

ibnesayeed on 26 Dec 2016

Here is a more generic implementation with variable length masking (try on playground):

package main

import (
    "fmt"
    "math/rand"
    "path"
    "strings"
)

func randomString(n int) string {
    const letters = "abcdefghijklmnopqrstuvwxyz"
    b := make([]byte, n)
    for i := range b {
        b[i] = letters[rand.Int63()%int64(len(letters))]
    }
    return string(b)
}

func cleanMaskedPath(p string, mask ...string) string {
    t := ""
    mm := make(map[string]string)
    for _, m := range mask {
        if strings.Index(p, m) >= 0 {
            t = randomString(10)
            mm[m] = t
            p = strings.Replace(p, m, t, -1)
        }
    }
    p = path.Clean(p)
    for m, t := range mm {
        p = strings.Replace(p, t, m, -1)
    }
    return p
}

func main() {
    p := "/a/b/../././/c/http://example.com/foo//bar/../blah"

    // No masking
    fmt.Println(cleanMaskedPath(p))
    //=> "/a/c/http:/example.com/foo/blah"

    // Mask only protocol separators
    fmt.Println(cleanMaskedPath(p, "://"))
    //=> "/a/c/http://example.com/foo/blah"

    // Mask all repeated slashes
    fmt.Println(cleanMaskedPath(p, "//"))
    //=> "/a/.//c/http://example.com/blah"

    // Mask all parent directories
    fmt.Println(cleanMaskedPath(p, "/.."))
    //=> "/a/b/../c/http:/example.com/foo/bar/../blah"

    // Mask protocol separators and parent directories
    fmt.Println(cleanMaskedPath(p, "://", "/.."))
    //=> "/a/b/../c/http://example.com/foo/bar/../blah"

    // Mask everything
    fmt.Println(cleanMaskedPath(p, "//", "/..", "/."))
    //=> "/a/b/../././/c/http://example.com/foo//bar/../blah"
}

ibnesayeed on 26 Dec 2016

I am not very familiar with the Caddy code base. If someone can tell me where this should logically go then I can make a PR. Also, if it is desired to make the masking configurable, then we will need another intermediate function that calls cleanMaskedPath() with appropriate masks rather than directly adding masks everywhere in the code.

ibnesayeed on 26 Dec 2016

@ibnesayeed

This exactly what I meant by the "proxy method" in the third point. Further, in the post-note I mentioned that a random string can be used instead of a fixed one for masking double slashes.
...
That's why I suggested that if we only mask the occurrences of :// then we will be sure that http://, https://, and ftp:// etc. will be preserved if they appear in the path.

I'm suggesting that instead of masking :// in the path, we mask only // -- why limit it to just ://, if we now assume that collapsing extra path separators is an error? But I can see in your examples you do both. That's fine, but we need to decide what the default behavior will be.

Meanwhile, the sample implementations you've posted look pretty good. You've raised some really interesting questions I need more time to think about -- and of course the feedback of others is welcome here too -- such as:

1) Which masking method is faster, and is there a better/faster way? Some benchmarks would be good here.
2) Should we collapse all "//" in the path or just "://" (I'm leaning towards "//" and if it's not safe to do that by default, then it needs to be configurable, but I see no reason to single out "://" specifically.)
3) Where does this go in the code base? (I imagine it'll be in server.go around here.)

mholt on 28 Dec 2016

Which masking method is faster, and is there a better/faster way? Some benchmarks would be good here.

I don't see an opportunity of benchmarking here, because so far there is only one implementation. The first code that I wrote was nothing, but a proof of concept to illustrate what I meant (and perhaps you to meant the same thing). Later I just generalized that so that variable number of masks can be applied (zero or more). I tried to put some thoughts already to optimize the code. For example, I could have avoided if strings.Index(p, m) >= 0 {...} check and directly apply replacements for all the masks then apply unmasking for all after path.Clean() call (masks that wont match, wont be replaced and there will be no harm). However, I realized that in general cases, there will be very few requests that would need any replacement, so checking the presence would save us from generating a random string for replacement if the mask is not present in the path. Additionally, mm (mask map) is only populated with the masks that have any matches, this way I am reducing the number of loop when unmasking (if no matches found for any masks supplied, the unmasking loop will be skipped completely).

We can certainly improve the performance of random replacement string generator code (that was refactored in the latter implementation). Additionally, we can talk about the optimal size of the mask replacement string. Keeping it too short will cause collisions (both in the existing original path components as well as other mask replacements, if there were more than one masks applied at once). Keeping it too long will take longer to generate the strings. One clever way to keep the strings short while avoiding potential collisions would be to prefix the replacement string with fairly unique string that may not be very common in paths, such as two underscores __{RAND_STR}.

ibnesayeed on 28 Dec 2016

Should we collapse all "//" in the path or just "://" (I'm leaning towards "//" and if it's not safe to do that by default, then it needs to be configurable, but I see no reason to single out "://" specifically.)

My vote for default masking would go for the protocol masking (://) only. Because collapsing multiple slashes in paths is absolutely fine and it is part of the canonicalization that does not change the essence of the path. Paths //foo//bar, /foo/////bar, /foo/./bar, ///foo/././bar and //foo/.//./bar all mean the same thing, /foo/bar. However, due to the commonly misused style of passing URIs without encoding in the path, that specific occurrence of double slash after the protocol name is what needs be preserved from collapsing. I have not seen any other cases in practice (there might be some) where multiple occurrences slashes in the path are meaningful. Additionally, this very specific masking (only ://) will help preserving performance as it will cause less replacements (both masking and unmasking), hence less attempts to generate random string and less chances of collisions.

ibnesayeed on 28 Dec 2016

👍1

If it is supposed to be configurable then we can have named cases of path masking that can then be translated internally how it is actually converted to an argument.

func cleanMaskedPath(p string, mask ...string) string {
    /* Implemented before */
}

func cleanPath(p string) string {
    masks := []string
    switch config["path_cleaning_mode"] {
    case "preserve_protocol":
        masks[0] = "://"
    case "preserve_slashes":
        masks[0] = "//"
    case "preserve_dots":
        masks[0] = "/.."
        masks[1] = "/."
    case "preserve_all":
        masks[0] = "//"
        masks[1] = "/.."
        masks[2] = "/."
    case "clean_all":
        // Do nothing
    default:
        masks[0] = "://"
    }
    return cleanMaskedPath(p, masks...)
}

This is a pseudo code that illustrates how we can have an intermediate function cleanPath() that can read the config and set the masking accordingly, then calls the cleanMaskedPath() with those masks.

However, if one config takes care of all the "issues", then we don't have to introduce unnecessary configuration options just for the sake of flexibility, unless there is some compelling reason. In my opinion, currently only broken case is the one reported in the first post of this issue and that can be fixed by masking ://.

We can still keep the chain of intermediate functions, so that in future if more masking is desired, we can take care of it. If certain situations in future warrant a configuration option then we can easily implement that using this method. Otherwise introducing more configs degrade performance, inflate the code base, add the burden of maintaining documentation, answer more support questions, and add complexity for users.

For now, we can just go with something like this:

func cleanMaskedPath(p string, mask ...string) string {
    /* Implemented before */
}

func cleanPath(p string) string {
    /* No unnecessary config lookup and case checking */
    return cleanMaskedPath(p, "://")
}

And then replace all the existing calls to path.Clean() with cleanPath().

ibnesayeed on 28 Dec 2016

I didn't mean to criticize your implementation -- I think it's pretty good. As this is a critical path in Caddy I want to ensure it's as fast as possible (within reason).

One clever way to keep the strings short while avoiding potential collisions would be to prefix the replacement string with fairly unique string that may not be very common in paths, such as two underscores __{RAND_STR}.

Yes, I was going to suggest this same thing, or at least, something like "__caddy_{RAND_STR}__" as a mask.

Want to submit a pull request? It'd be good to start getting the code under review. :+1:

Paths //foo//bar, /foo/////bar, /foo/./bar, ///foo/././bar and //foo/.//./bar all mean the same thing, /foo/bar

Until it doesn't, to somebody's weird edge case, like yours with :// :wink:

I definitely agree with you for keeping it simple. Make a pull request when you have a chance and we'll get on it!

mholt on 28 Dec 2016

I didn't mean to criticize your implementation -- I think it's pretty good.

On the contrary, I think it's good to criticize implementations for the sake of improvements.

As this is a critical path in Caddy I want to ensure it's as fast as possible (within reason).

Oh, I thought you were talking about some comparative benchmarking. I will try to grab some data from some logs to see how much extra overhead it causes. Although, I think there will be many factors that will affect timings, such as how many instances in in the test set would require masking or what's the average length of paths in the test set.

Yes, I was going to suggest this same thing, or at least, something like "_caddy{RAND_STR}__" as a mask.

Perfect!

Want to submit a pull request? It'd be good to start getting the code under review. :+1:

I will try for that. However, I am still not sure why there are many places where path.Clean() is called directly, not just the server.go line you reference earlier.

Until it doesn't, to somebody's weird edge case, like yours with :// :wink:

If it ain't broke, don't fix it!

ibnesayeed on 28 Dec 2016

Where does this go in the code base? (I imagine it'll be in server.go around here.)

Following are the places where path.Clean() is called:

ibnesayeed on 28 Dec 2016

👍1

So, I created a package with test cases and benchmarks. My Go skills are limited, so I might have done things in a not very elegant way.

// pathcleaner.go

package pathcleaner

import (
    "math/rand"
    "path"
    "strings"
    "time"
)

// Most efficient method for random string generation: http://stackoverflow.com/a/31832326
const stringLenght = 10
const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
const (
    letterIdxBits = 6
    letterIdxMask = 1<<letterIdxBits - 1
    letterIdxMax  = 63 / letterIdxBits
)

var src = rand.NewSource(time.Now().UnixNano())

func generateRandomString() string {
    b := make([]byte, stringLenght)
    for i, cache, remain := stringLenght-1, src.Int63(), letterIdxMax; i >= 0; {
        if remain == 0 {
            cache, remain = src.Int63(), letterIdxMax
        }
        if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
            b[i] = letterBytes[idx]
            i--
        }
        cache >>= letterIdxBits
        remain--
    }
    return string(b)
}

func CleanMaskedPath(p string, mask ...string) string {
    t := ""
    mm := make(map[string]string)
    for _, m := range mask {
        if strings.Index(p, m) >= 0 {
            t = "/_caddy" + generateRandomString() + "__"
            mm[m] = t
            p = strings.Replace(p, m, t, -1)
        }
    }
    p = path.Clean(p)
    for m, t := range mm {
        p = strings.Replace(p, t, m, -1)
    }
    return p
}

func CleanPath(p string) string {
    return CleanMaskedPath(p, "://")
}

// pathcleaner_test.go

package pathcleaner

import (
    "path"
    "testing"
)

var paths = map[string]map[string]string{
    "/../a/b/../././/c": {
        "preserve_all": "/../a/b/../././/c",
        "preserve_protocol": "/a/c",
        "preserve_slashes": "/a//c",
        "preserve_dots": "/../a/b/../././c",
        "clean_all": "/a/c",
    },
    "/path/https://www.google.com": {
        "preserve_all": "/path/https://www.google.com",
        "preserve_protocol": "/path/https://www.google.com",
        "preserve_slashes": "/path/https://www.google.com",
        "preserve_dots": "/path/https:/www.google.com",
        "clean_all": "/path/https:/www.google.com",
    },
    "/a/b/../././/c/http://example.com/foo//bar/../blah": {
        "preserve_all": "/a/b/../././/c/http://example.com/foo//bar/../blah",
        "preserve_protocol": "/a/c/http://example.com/foo/blah",
        "preserve_slashes": "/a//c/http://example.com/foo/blah",
        "preserve_dots": "/a/b/../././c/http:/example.com/foo/bar/../blah",
        "clean_all": "/a/c/http:/example.com/foo/blah",
    },
}

func assertEqual(t *testing.T, exp, rcv string) {
    if exp != rcv {
        t.Errorf("\tExpected: %s\n\t\t\tRecieved: %s", exp, rcv)
    }
}

func maskedTestRunner(t *testing.T, variation string, mask ...string) {
    for p, v := range paths {
        assertEqual(t, v[variation], CleanMaskedPath(p, mask...))
    }
}

func TestPathClean(t *testing.T) {
    for p, v := range paths {
        assertEqual(t, v["clean_all"], path.Clean(p))
    }
}

func TestPreserveAll(t *testing.T) {
    maskedTestRunner(t, "preserve_all", "//", "/..", "/.")
}

func TestPreserveProtocol(t *testing.T) {
    maskedTestRunner(t, "preserve_protocol", "://")
}

func TestPreserveSlashes(t *testing.T) {
    maskedTestRunner(t, "preserve_slashes", "//")
}

func TestPreserveDots(t *testing.T) {
    maskedTestRunner(t, "preserve_dots", "/..", "/.")
}

func TestCleanAll(t *testing.T) {
    maskedTestRunner(t, "clean_all")
}

func TestDefaultMask(t *testing.T) {
    for p, v := range paths {
        assertEqual(t, v["preserve_protocol"], CleanPath(p))
    }
}

func maskedBenchmarkRunner(b *testing.B, mask ...string) {
    for n := 0; n < b.N; n++ {
        for p, _ := range paths {
            CleanMaskedPath(p, mask...)
        }
    }
}


func BenchmarkPathClean(b *testing.B) {
    for n := 0; n < b.N; n++ {
        for p, _ := range paths {
            path.Clean(p)
        }
    }
}

func BenchmarkPreserveAll(b *testing.B) {
    maskedBenchmarkRunner(b, "//", "/..", "/.")
}

func BenchmarkPreserveProtocol(b *testing.B) {
    maskedBenchmarkRunner(b, "://")
}

func BenchmarkPreserveSlashes(b *testing.B) {
    maskedBenchmarkRunner(b, "//")
}

func BenchmarkPreserveDots(b *testing.B) {
    maskedBenchmarkRunner(b, "/..", "/.")
}

func BenchmarkCleanAll(b *testing.B) {
    maskedBenchmarkRunner(b)
}

func BenchmarkDefaultMask(b *testing.B) {
    for n := 0; n < b.N; n++ {
        for p, _ := range paths {
            CleanPath(p)
        }
    }
}

$ go test -bench=. -v
=== RUN   TestPathClean
--- PASS: TestPathClean (0.00s)
=== RUN   TestPreserveAll
--- PASS: TestPreserveAll (0.00s)
=== RUN   TestPreserveProtocol
--- PASS: TestPreserveProtocol (0.00s)
=== RUN   TestPreserveSlashes
--- PASS: TestPreserveSlashes (0.00s)
=== RUN   TestPreserveDots
--- PASS: TestPreserveDots (0.00s)
=== RUN   TestCleanAll
--- PASS: TestCleanAll (0.00s)
=== RUN   TestDefaultMask
--- PASS: TestDefaultMask (0.00s)
PASS
BenchmarkPathClean-8         2000000           791 ns/op
BenchmarkPreserveAll-8        200000         10311 ns/op
BenchmarkPreserveProtocol-8   500000          3070 ns/op
BenchmarkPreserveSlashes-8    300000          4307 ns/op
BenchmarkPreserveDots-8       200000          6191 ns/op
BenchmarkCleanAll-8          2000000           998 ns/op
BenchmarkDefaultMask-8        500000          3132 ns/op
ok      pathcleaner 13.361s

Currently the tests and benchmarks were run on three different input paths, which in no way is the representative of a real web server hits. However, it gives the generic idea of what overhead this masking would cause. To evaluate various factors; 1) we can change the value of the stringLenght constant and see how it impacts the overhead, 2) run benchmarks on each of the three (or more) paths separately and see the effect of presence or absence of the masked substring in the path, and 3) simulate the benchmark on some real data from more than one web server logs (including those that use URIs in the path).

ibnesayeed on 29 Dec 2016

👍2

That's pretty good @ibnesayeed! Nice work, especially since you say you are "limited" in your Go skills.

Since the _vast_ majority of URLs won't have any replacements at all, I suspect there's practically no performance impact, and those that are impacted have only a minimal effect.

Would you make a pull request with using this new function in server.go? I see that path.Clean() is called in several other spots -- thanks for finding them -- I think that most of those, or at least some of them, are probably not needed now that we do it down at server.go for every request. Maybe path.Clean() is needed if the URL path is being changed by a middleware, but path.Join should take care of things and do it properly.

Anyway, please make a pull request with the test and that function in place (just put it in the httpserver package, not a separate one). I'll be happy to do a review. :)

(Edit: Also I was shown a similar issue in another repo: https://github.com/jehiah/urlnorm/issues/9)

mholt on 31 Dec 2016

Anyway, please make a pull request with the test and that function in place (just put it in the httpserver package, not a separate one). I'll be happy to do a review. :)

ibnesayeed on 31 Dec 2016

👍4

Was this page helpful?

0 / 5 - 0 ratings