GitHub is currently experiencing an outage. Kops uses raw.githubusercontent.com to fetch the channels files.
During this outage, kops fails with:
kops update cluster
Using cluster from kubectl context: <my-cluster>
I0402 13:25:47.824966 5970 context.go:249] hit maximum retries 5 with error unexpected response code "500 Internal Server Error" for "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": 500: Internal Server Error
error reading channel "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": unexpected response code "500 Internal Server Error" for "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": 500: Internal Server Error
We should have kops handle this more gracefully, continuing on whenever that's possible. For example, update cluster should still work and simply not report any channel updates, but create cluster may not be able to, since it might rely on the AMI or kubernetes version info from the channel.
I would go a step further and say that Kops should not have a hard dependency on GitHub at all. We run Kops as part of an automated service and it's alarming to me that our production environment is regularly pulling down data from GitHub.
Could the channel content be hosted in the state bucket and/or the Kops deployment bucket? (By deployment bucket I mean the bucket where Kops pulls nodeup/protokube from. I'm not sure what the Kops team calls that bucket internally.)
Ran into this in a prod environment. GitHub seemed terribly slow and kept failing in my browser too, and the kops update cluster command kept failing to get the channels file. Is there a way to override the URL right now, before we have a better fix for this?
These days GitHub has quite a few issues, and this causes small "outages" because one can't upgrade/provision clusters while it's down. Anything we can help with to get this issue moving?
Will add this on the discussion list for the next office hours and see what would be the best thing to do. :)
We could store the latest downloaded channel information in the S3 state bucket. That would give actions like update cluster a "last known" copy of the channel information to fall back on.
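A rough sketch of what I have in mind (nothing here matches actual kops internals; loadChannel, fetch, and the local cache path standing in for the state bucket are all made up for illustration): try the upstream URL first, stash a copy on success, and fall back to the last stashed copy if the fetch fails.

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

// fetch downloads a URL and returns the body, failing on non-200 responses.
func fetch(client *http.Client, url string) ([]byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected response code %q for %q", resp.Status, url)
	}
	return io.ReadAll(resp.Body)
}

// loadChannel tries the upstream channel URL first and keeps a copy at
// cachePath (a local-file stand-in for the S3 state bucket). When the
// fetch fails, it returns the last known copy instead of erroring out.
func loadChannel(url, cachePath string) ([]byte, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	data, err := fetch(client, url)
	if err == nil {
		// Best-effort cache write; a real implementation would write to the state store.
		_ = os.MkdirAll(filepath.Dir(cachePath), 0o755)
		_ = os.WriteFile(cachePath, data, 0o644)
		return data, nil
	}
	cached, cacheErr := os.ReadFile(cachePath)
	if cacheErr != nil {
		return nil, fmt.Errorf("channel fetch failed (%v) and no cached copy available: %w", err, cacheErr)
	}
	return cached, nil
}

func main() {
	data, err := loadChannel(
		"https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable",
		"/tmp/kops-channel-cache/stable",
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("loaded %d bytes of channel data\n", len(data))
}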
This is a really bad, sad issue, and I'm hitting it:
kops create cluster \
  --name=... \
  --state=s3://$KOPS_BUCKET \
  .... \
  --zones=... \
  --node-count=$NODE_COUNT \
  --node-size=$NODE_SIZE \
  --master-size=$NODE_MASTER_SIZE
I0423 16:08:00.851481 3141 context.go:231] hit maximum retries 5 with error unexpected response code "500 Internal Server Error" for "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": 500: Internal Server Error
error reading channel "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": unexpected response code "500 Internal Server Error" for "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": 500: Internal Server Error
I cannot deploy my environments now... I suppose I am not the only one stuck now.
In fact, I do not understand why kops needs those channels' contents at runtime. Echoing the idea from @geekofalltrades: could this be packaged into the release build itself, so there are no external dependencies at all?
UPDATE: If a hard dependency is needed (GitHub or S3), it would be nice to provide kops with several fallbacks to avoid a single point of failure.
In theory, some if not all of that could be skipped if someone manually sets the following:
If those are set manually, I really see no reason for the channel check to be mandatory; maybe it could even be skipped completely.
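For instance, something along these lines; --kubernetes-version and --image are existing kops create cluster flags, but whether supplying them is actually enough today to make the channel lookup optional is an assumption on my part:
kops create cluster \
  --name=... \
  --state=s3://$KOPS_BUCKET \
  --kubernetes-version=... \
  --image=... \
  --zones=...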
To give some background, the channels files serve a few purposes:
Some of this information is only relevant during certain kops commands, such as create cluster, update cluster, and upgrade cluster. Even in those cases, sometimes the information is required and sometimes it is only a "nice to have".
The intent is for this information to be decoupled from kops binaries and releases. We can update the channels file without needing to release a new kops version. Hopefully this explains why it can't live in the cluster's state store.
A while back Kops started hosting nodeup binaries in multiple locations for redundancy; you can see that in your userdata, for example. I would propose that we do the same with channels: kops could look for the channels files in multiple locations. We'll have to build out the CI tooling to update the channels file in every location whenever it changes on the master branch, but I think that is reasonable.
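To make the idea concrete, here is a minimal sketch of that lookup; the mirror URLs are placeholders (only the first one is the real current location), and none of this reflects actual kops code:

package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// Hypothetical mirror list; only the first URL is the real current location.
var channelLocations = []string{
	"https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable",
	"https://example-kops-mirror.s3.amazonaws.com/channels/stable",
	"https://example-kops-mirror.storage.googleapis.com/channels/stable",
}

// loadChannelFromMirrors tries each location in order and returns the first
// successful response, so a GitHub outage alone does not block kops commands.
func loadChannelFromMirrors(locations []string) ([]byte, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	var lastErr error
	for _, url := range locations {
		resp, err := client.Get(url)
		if err != nil {
			lastErr = err
			continue
		}
		if resp.StatusCode != http.StatusOK {
			resp.Body.Close()
			lastErr = fmt.Errorf("unexpected response code %q for %q", resp.Status, url)
			continue
		}
		data, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			lastErr = err
			continue
		}
		return data, nil
	}
	return nil, fmt.Errorf("all channel locations failed, last error: %v", lastErr)
}

func main() {
	data, err := loadChannelFromMirrors(channelLocations)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("loaded %d bytes of channel data\n", len(data))
}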
I've added this to tomorrow's office hours agenda so hopefully we can find a reasonable way forward.
@rifelpet, Thanks for the explanation and good idea.
Following your proposition, I would have kops look for the channel in this order:
This would ensure that kops keeps running whatever happens to the channel file!
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten