Not all of our conditions follow our condition conventions.
We also don't propagate conditions in a consistent (or even correct) way: https://github.com/knative/serving/issues/4937
I'm opening this to track some work around wrangling our conditions.
In general, everything should look something like this:

Here's (a bit simplified) Service as an example. Service is pretty close to correct, but it's not unconditionally bumping ObservedGeneration or updating its status with reconciliation errors (bolded things aren't correct (yet)):

We'll also need to rework the Revision and KPA and SKS conditions to fit better into this world, since they're all over the place.
ObservedGeneration/area API
Thanks for the writeup @jonjohnsonjr. That makes a lot of sense to me!
@JRBANCEL pointed out some more inconsistencies in our lifecycle Mark* methods:
$ rg -e 'func .* Mark[^\(]+' -o --no-filename | sort | uniq
func (cs *CertificateStatus) MarkNotReady
func (cs *CertificateStatus) MarkReady
func (cs *CertificateStatus) MarkResourceNotOwned
func (cs *CertificateStatus) MarkUnknown
func (cs *ConfigurationStatus) MarkLatestCreatedFailed
func (cs *ConfigurationStatus) MarkLatestReadyDeleted
func (cs *ConfigurationStatus) MarkResourceNotConvertible
func (cs *ConfigurationStatus) MarkRevisionCreationFailed
func (is *IngressStatus) MarkLoadBalancerPending
func (is *IngressStatus) MarkLoadBalancerReady
func (is *IngressStatus) MarkNetworkConfigured
func (is *IngressStatus) MarkResourceNotOwned
func (pas *PodAutoscalerStatus) MarkActivating
func (pas *PodAutoscalerStatus) MarkActive
func (pas *PodAutoscalerStatus) MarkInactive
func (pas *PodAutoscalerStatus) MarkResourceFailedCreation
func (pas *PodAutoscalerStatus) MarkResourceNotOwned
func (rs *RevisionStatus) MarkActivating
func (rs *RevisionStatus) MarkActive
func (rs *RevisionStatus) MarkContainerExiting
func (rs *RevisionStatus) MarkContainerHealthy
func (rs *RevisionStatus) MarkContainerMissing
func (rs *RevisionStatus) MarkDeploying
func (rs *RevisionStatus) MarkInactive
func (rs *RevisionStatus) MarkProgressDeadlineExceeded
func (rs *RevisionStatus) MarkResourceNotConvertible
func (rs *RevisionStatus) MarkResourceNotOwned
func (rs *RevisionStatus) MarkResourcesAvailable
func (rs *RevisionStatus) MarkResourcesUnavailable
func (rs *RouteStatus) MarkCertificateNotOwned
func (rs *RouteStatus) MarkCertificateNotReady
func (rs *RouteStatus) MarkCertificateProvisionFailed
func (rs *RouteStatus) MarkCertificateReady
func (rs *RouteStatus) MarkConfigurationFailed
func (rs *RouteStatus) MarkConfigurationNotReady
func (rs *RouteStatus) MarkIngressNotConfigured
func (rs *RouteStatus) MarkMissingTrafficTarget
func (rs *RouteStatus) MarkResourceNotConvertible
func (rs *RouteStatus) MarkRevisionFailed
func (rs *RouteStatus) MarkRevisionNotReady
func (rs *RouteStatus) MarkServiceNotOwned
func (rs *RouteStatus) MarkTrafficAssigned
func (rs *RouteStatus) MarkUnknownTrafficError
func (ss *ServiceStatus) MarkConfigurationNotOwned
func (ss *ServiceStatus) MarkConfigurationNotReconciled
func (ss *ServiceStatus) MarkResourceNotConvertible
func (ss *ServiceStatus) MarkRevisionNameTaken
func (ss *ServiceStatus) MarkRouteNotOwned
func (ss *ServiceStatus) MarkRouteNotReconciled
func (ss *ServiceStatus) MarkRouteNotYetReady
func (sss *ServerlessServiceStatus) MarkActivatorEndpointsPopulated
func (sss *ServerlessServiceStatus) MarkActivatorEndpointsRemoved
func (sss *ServerlessServiceStatus) MarkEndpointsNotOwned
func (sss *ServerlessServiceStatus) MarkEndpointsNotReady
func (sss *ServerlessServiceStatus) MarkEndpointsReady
The naming is inconsistent: NotReady means Unknown, other times NotReady means failed. It's not clear from the method name what state you're actually setting.
For simple methods that just set the state I'd propose following this pattern, since it's most prevalent:
MarkFooReady -> Ready=True
MarkFooNotReady -> Ready=Unknown
MarkFooFailed -> Ready=False
There are a lot of convenience methods to make setting certain statuses more legible, we should probably leave those alone for now.
MarkFooReady -> Ready=True
MarkFooNotReady -> Ready=Unknown
MarkFooFailed -> Ready=False
We also need MarkFooNotOwned for the case where Foo resource is not owned by the parent resource, and MarkFooNotReconciled for the case where generation != observedGeneration. We probably want to add these two functions into our standard.
@dgerd opened an issue and a proposal (#5780) on how to deal with condition types and marking resource status. PR #5804 already implements that proposal for revision lifecycle and I plan to continue the refactoring for other resource status markers as well.
also this https://github.com/knative/serving/pull/5804#issuecomment-542805906 for cross referencing.
As part of #4937 I took a look at our current reconciler loops and put together a few state machines to help visualize the potential condition sets you can end up with. A few thoughts about them:
controllers should report the most recent generation seen in status when they post status. From our API specification we state: The latest metadata.generation that the reconciler has observed. If聽observedGeneration聽is updated,聽conditions聽MUST be updated with current status in the same transaction. If we make all reconciler exits post status this would allow us to just bump observedGeneration from the controller framework at the beginning of the reconcile loop. The current state of the world is:I haven put together a Revision reconciler state machine yet, but here are the Service, Route, and Config ones.



Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.
Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.
/lifecycle stale
Stale issues rot after 30 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.
Rotten issues close after an additional 30 days of inactivity.
If this issue is safe to close now please do so by adding the comment /close.
Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.
/lifecycle rotten
Rotten issues close after 30 days of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.
Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.
/close
@knative-housekeeping-robot: Closing this issue.
In response to this:
Rotten issues close after 30 days of inactivity.
Reopen the issue with/reopen.
Mark the issue as fresh by adding the comment/remove-lifecycle rotten.Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten
/reopen
/lifecycle-frozen
@vagababov: Reopened this issue.
In response to this:
/remove-lifecycle rotten
/reopen
/lifecycle-frozen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cc @dprotaso
related doc: https://docs.google.com/document/d/1tDPP43xSMIFdnw0D8msAXY9P6fEG5x9CG3kRhjpJh-4/edit#heading=h.dxlthq7ppqhh
/cc @whaught
Weston, is this covered by the stuff you've been doing?
/cc @whaught
Weston, is this covered by the stuff you've been doing?
I've dealt with some of this - ObservedGeneration is now automatically moved (even on failure) and we reset to unknown on new generations. There's probably more refactoring to do to ensure consistent naming and we do have some oddnesses around non-terminal conditions (ideally InitializeConditions would init everything, but conditionset only contains terminal conditions so we omit the rest).
Markus makes a good point about the way the 'Active' condition interacts between Revision and PA resources too.
This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.
/remove-lifecycle stale
Most helpful comment
As part of #4937 I took a look at our current reconciler loops and put together a few state machines to help visualize the potential condition sets you can end up with. A few thoughts about them:
controllers should report the most recent generation seen in status when they post status. From our API specification we state:The latest metadata.generation that the reconciler has observed. If聽observedGeneration聽is updated,聽conditions聽MUST be updated with current status in the same transaction.If we make all reconciler exits post status this would allow us to just bump observedGeneration from the controller framework at the beginning of the reconcile loop. The current state of the world is:I haven put together a Revision reconciler state machine yet, but here are the Service, Route, and Config ones.
KService Reconciler
Route Reconciler
Configuration Reconciler