When deploying Ark as described in the documentation (including the role bindings), this error comes up in the logs of the Ark server.
Even after creating the service account used by the server and deleting and re-adding the server, the error persists.
I worked with @PCatinean and we were able to get this working. We aren't sure what the problem was, but once we followed the documentation steps exactly, it started working.
This particular error is quite confusing, however, as it's not entirely clear what is wrong (it's clear that a permission is missing, but there's no mention of why or how to fix).
What's going on in this particular case is that during startup, Ark queries GCE to make sure that the project you specified here in the Ark config is a valid GCE project and that you have permission to access it:
persistentVolumeProvider:
  name: gcp
  config:
    project: your-project-name
We added this logic a while ago to try to make Ark "fail fast" in the event of a misconfiguration. I still think this is worthwhile, although if it's tripping up too many people we could discuss if it's worth changing the logic here or removing it.
@skriss I do think we should call errors.Wrap and include a message that we couldn't retrieve the specified project, and suggest that the user confirm that the project is correct in the config, the GCE service account has compute.storageAdmin, and that the cloud-credentials secret is correct.
My recommendation would be to fail gently. compute.projects.get is not essential for normal functionality of Ark. We can log a warning that we were unable to get the list of projects to verify the project exists, but continue on our merry way.
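For comparison, a minimal sketch of the "fail gently" variant, assuming a hypothetical `lookupProject` helper in place of the real GCE call (neither function name comes from the Ark codebase). The check logs a warning and reports whether verification succeeded, but never stops startup:

```go
package main

import (
	"errors"
	"log"
)

// lookupProject is a hypothetical stand-in for the startup call that
// needs permission to read the project. It fails for an empty name
// purely for illustration.
func lookupProject(project string) error {
	if project == "" {
		return errors.New("permission denied")
	}
	return nil
}

// checkProject logs a warning if the project cannot be verified and
// returns whether verification succeeded. It never returns an error,
// so the caller always continues starting up.
func checkProject(project string) bool {
	if err := lookupProject(project); err != nil {
		log.Printf("warning: unable to verify GCE project %q, continuing anyway: %v", project, err)
		return false
	}
	return true
}

func main() {
	verified := checkProject("")
	log.Printf("server starting (project verified: %t)", verified)
}
```

With this shape, a genuinely wrong project would only surface later, when the first block-store operation fails.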
SGTM
I can take a stab at this! 😄
Honestly I would probably just remove the code that attempts to get the project, since if we're just going to warn and continue if it fails, there's not much value in having it. Any later block-store operations would error if the project does not exist. We've also modified this code since the issue was reported to get the project ID from the credentials file rather than having it be user-specified, which further diminishes the value of this check. @rosskukulinski @ncdc you OK with that?
If we go ahead with removing the code, then we should also remove the compute.projects.get permission from the IAM role we recommend at https://github.com/heptio/ark/blob/master/docs/gcp-config.md
Sgtm
Cool with me, thanks for checking @skriss!
Missing permission is resourcemanager.projects.get.
@skriss, I believe this is the logic that needs to be removed:
https://github.com/heptio/ark/blob/master/pkg/cloudprovider/gcp/block_store.go#L67-L71