Operator-sdk: New document for describing best practices for troubleshooting reconcile loops

Created on 16 Apr 2020 · 7 Comments · Source: operator-framework/operator-sdk

Feature Request

Is your feature request related to a problem? Please describe.
While discussing https://github.com/operator-framework/operator-sdk/issues/2795, the idea came up that we should have a documentation page of best practices for troubleshooting reconcile loops. The document would start with an aggregated list of prerequisite reading for operator creators, such as the controller-runtime docs, the OpenShift Blog on operator best practices, the Operator SDK getting started user guide, etc. It would then give best practices for troubleshooting reconcile loops and point out common mistakes to avoid. It is not intended to be an exhaustive list of every mistake that could be made; the goal is to provide enough context and detail for operator developers to troubleshoot their own reconcile loops and avoid well-known pitfalls. Ideally, this document will capture information that is currently treated as tribal knowledge and aggregate useful references that are spread across multiple locations.

Describe the solution you'd like
A new page in the documentation that meets the goals and objectives described in the above paragraph.

I am willing to take the lead on this, but I will need guidance from the more senior members of this project to capture all the useful troubleshooting techniques that the team is aware of.

Labels: help wanted, kind/documentation, lifecycle/frozen, priority/important-longterm

All 7 comments

/milestone v1.0.0

This is a high-level docs change since it deals with "how Kubernetes works". Once we have a clearer picture of our doc organization and maturity, we will revisit this issue. Definitely important for v1.0.

In addition: a top-level annotated bibliography (links to talks, blog posts, etc.) of operator best practices on the SDK website.

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

This probably won't make it into v1.0, but is still important.

/priority important-longterm

/remove-lifecycle stale

/lifecycle stale

/lifecycle frozen
