@marwan-at-work I'm moving the discussion from the bottom of the closed issue https://github.com/gomods/athens/issues/263 here.
I guess the same is also valid for the S3 implementation.
I think I'm on the 'removal' team. If we have mocks to test a different implementation, what are we really testing?
I would like to keep them. At least for unit tests of other components (e.g fetcher?).
with E2e i would run against actual instance.
like we should not write unit test to verify GCP/S3/Cosmos works, we believe they do, we should test whether or not our queries and API is correct
The fetcher (and other components using storage) could use the in-memory implementation for tests but I think there is still some logic in the storage implementations which could be unit-tested. We could definitely add gcp to the common storage test suite though.
@michalpristas unit testing the fetcher is done separately no? Plus, if we want e2e tests, we can just spin up the mem or fs storage.Backend implementation for e2e as @marpio suggested.
What trips me up is the fact that we're mocking GCP. It's like writing two separate implementations, and saying the second implementation proves the first one is working 😄 (which is obviously not true)
Unit testing specific GCP things (not in common storage_tests) is totally fine (even though we're currently not doing that right now). But we don't have to mock anything for it, we can just run those tests against the real GCP because we really have to anyway.
For me, it is a wording. What I understand by unit tests is a set of tests testing component by itself which can be run everywhere not requiring an internet connection... When I decide to run unit tests on a train on my way home I should be able to, so I can check I did not cause any obvious regression.
So if we remove mocks, let's not call these test Unit tests but let's make it a part of E2E. This means we should run E2E for each supported storage.
Not sure if this makes sense I writing again from a beach.
@michalpristas I agree on the unit test part. So let's expand on that: say you'd like to unit-test getting a module from GCP, if you mock "getting a module from GCP", you've basically completely rewrote that function for the test. So you're no longer unit testing it "getting a module from GCP", you're just testing "getting a module from a mock".
And sorry if I didn't make sense but you definitely do not have to run E2e just to test one storage. You can totally test just one storage as long as you have that storage up and running in your dev environment. Which you will, if that's what you were working on. And so you can totally unit test it. We can just have if noCredentials {t.Skip()} for those who want to run tests locally without having to spin up dependencies.
@marwan-at-work your point is 100% valid, but not describing my scenario in mind. For me, mock helps with following.
v1.0, v2.0.My ambition is not testing the storage implementation per se but testing our wrapper and logic on top of that. Ensuring that contract agreed between storages is kept intact with my changes.
If you want mocks away let's get rid of them, these are just use cases I find mocks useful at.
@michalpristas Your points makes sense to me if the wrapper did need to do any of the above. Can you give an example of where in the GCP storage wrapper is there logic that should be tested with a mock bucket? Because right now we're testing the mock as if it's the real thing which is the whole point of this discussion.
I'm fine with keeping them or getting rid of them. I think the important part of this discussion issue is to agree on the overall architecture of the storage packages and how we organize/test them in the future.
So your use case is about testing the wrappers. Here are some of the GCP wrappers:
https://github.com/gomods/athens/blob/master/pkg/storage/gcp/deleter.go
https://github.com/gomods/athens/blob/master/pkg/storage/gcp/saver.go
https://github.com/gomods/athens/blob/master/pkg/storage/gcp/lister.go
You can see that those wrappers are doing nothing except delegate a call to .bucket which is under bucket_cloud.go (or bucket_mock.go) So there's nothing worth testing or mocking because they're not doing any logic that should be tested. Even if they do, i.e. the extractVersions function , they can be tested without a mock, or they should be tested with the real implementation.
Here’s why they should be tested with the real implementation, because having a mock inside a non-mock implementation can dangerously lead to false positives:
newBucketMock() — then the tests are fired. g.store.Delete()s.bucket.Exists() to check if the bucket does exist. bucket field, we’re gonna go to bucket_mock.go:L94 to check if a module exists or not. As for your four points above:
The wrapper that should really be concerned with that is the download.Protocol that takes a storage and does business logic based on what the storage gave it (i.e. if not found, go to upstream and do cache fill) -- we should 100% test this scenario on the download.Protocol level by passing it a mock backend.Storage and not and mock GCP Bucket.
If you look at the current wrapper (*gcp.Storage), there seems to be no special response from the underlying storage (as shown by the files above). So all that the wrapper is doing is just delegating the call.
None of the wrappers above have any specific logic for timeouts. To test a timeout, test the "real wrapper" which is the download.Protocol. The download.Protocol is the one in charge of dealing with timeouts because it knows what to do: for example if the storage times out, maybe screw the storage and just go get the package from upstream and don't block the user. We can pass a mock storage to download.Protocol and make it timeout. NOT a gcp storage with a mock bucket. But a mock storage :)
PS. Let me be my own devil's advocate and give you scenarios where mocks make sense:
A. We should test, for example, that if storage.Save fails, that the underlying implementation needs to retry 3 times.
In this case, we still don't need to mock a gcp bucket. We can just create a wrapper such as this:
func WithRetry(b storage.Backend) storage.Backend {
return &withRetry{b}
}
This way, we can actually test WithRetry by giving it a mock storage and not a mock gcp bucket
that can and should be tested for all storages and so it handled on the generic side.
This can also be tested on the generic storage.Backend interface for all implementations, and not a specific implementation
@michalpristas So sorry for the lengthy response, but since this is a "discussion" type of issue, thought I'd dig in deeper with some code examples and scenarios. Let me know if any of the above is not clear as I am happy to keep discussing and do more thought exchange :) 💯 🤓
@marwan-at-work @michalpristas I was just looking at the tests and thought I'll revive this issue so we can close it eventually :-)
I think you're right on this one @marwan-at-work . Most of the time we just call the mock functions directly. There isn't much logic here (and if it's there it is encapsulated in here:
https://github.com/gomods/athens/blob/83281a80fb529fdf87cf6a790dffbe8c9a5b466d/pkg/storage/module/upload.go#L21:6
and this can be tested on it's own).
The only thing, I think is nice with mock is that we can check if we Close() everything properly:
https://github.com/gomods/athens/blob/83281a80fb529fdf87cf6a790dffbe8c9a5b466d/pkg/storage/gcp/gcp_test.go#L52-L55
I guess this wouldn't be as easy without a mock. Any ideas how to replace this?
Same idea here in my opinion -- testing a closed mock, is not really testing a closed gcp bucket.
Most helpful comment
@marwan-at-work your point is 100% valid, but not describing my scenario in mind. For me, mock helps with following.
v1.0,v2.0.My ambition is not testing the storage implementation per se but testing our wrapper and logic on top of that. Ensuring that contract agreed between storages is kept intact with my changes.
If you want mocks away let's get rid of them, these are just use cases I find mocks useful at.