In #1166 and #1167, sigs.k8s.io/yaml replaced gopkg.in/yaml.v2. Motivation:
This was motivated by trying to keep all dependencies within sig/k8s owned projects.
sigs.k8s.io/yaml is a fork of github.com/ghodss/yaml, not of gopkg.in/yaml.v2, so I would like to revisit this change and decide if we should go back to gopkg.in/yaml.v2 or stay with the current sigs.k8s.io/yaml.
The main change between github.com/ghodss/yaml (and therefore sigs.k8s.io/yaml) and gopkg.in/yaml.v2 is that marshalling and unmarshalling is done in two steps, using the JSON representation as an intermediate. I haven't tested but doing it in two steps probably incurs in some performance reduction. I don't think this is a big issue though as we are talking about configuration of an scaffolding project, a non-critical part of a non-ciritical process.
Why do it in two steps? Avoid code duplication if you want to support both YAML and JSON at the same time. As Sam Ghods (owner) explains in his blog, supporting both YAML and JSON requires to duplicate the code in MarshalJSON and MarshalYAML, UnmashalJSON and UnmarshalYAML, are the tags json:"..." as yaml:"...". This again doesn't apply, as we do not support both at the same time, we only support YAML, so we actually have to use the MarshalJSON and UnmarshalJSON methods and the json tags despite storing it in YAML with yaml.Marshal and yaml.Unmarshal[Strict].
It specifically has a paragraph near the end where he mentions our use case: configuration files.
Note that there are a few other uses for YAML beyond converting them to and from Go structs. gosexy/yaml, for example, uses YAML to store and access configuration files. In those cases, this may not be the right approach, and you may want the full power of YAML.
So the thing is, our usecase doesn't have the issue solved by github.com/ghodss/yaml where both JSON and YAML need to be handled so the only advantage of using the kubernetes-sigs owned library I see is that is a kubernetes-sigs-owned fork. The drawbacks would be the mentioned performance, that any feature that lands in gopkg.in/yaml (I think they are close to a v3) will be delayed until sig.k8s.io/yaml updates its dependencies, and the reduced clarity as json tags and MarshalJSON/UnmarshalJSON methods need to be used instead of their YAML equivalents. Thoughts?
@camilamacedo86 @estroz @DirectXMan12 @christopherhein @mengqiy
@Adirio I'm not hard pressed either way, if using gopkg.in/yaml.v2 directly is going to provide some benefits we can revert that. Maybe it would be useful to file an issue on sigs.k8s.io/yaml to discuss this and understand more about why we choose to use ghodss/yaml as a base vs gopkg.in/yaml.v2 since sigs.k8s.io/yaml is used throughout kubernetes/kubernetes in the apiserver, scheduler, kubeadm, etc.
Maybe it would be useful to file an issue on
sigs.k8s.io/yamlto discuss this and understand more about why we choose to use ghodss/yaml as a base vs gopkg.in/yaml.v2 since sigs.k8s.io/yaml is used throughout kubernetes/kubernetes in the apiserver, scheduler, kubeadm, etc.
Kubernetes does have JSON representations of the files, we only have YAML representations. I think thats the main difference.
Its is a subject that shows that will fit better discuss in the controller/kubebuilder meeting.
Just as a datapoint, they don't produce the same output. yaml.Marshal with sigs.k8s.io produces
...
managedFields:
- apiVersion: foo.example.com/v1alpha1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:kubectl.kubernetes.io/last-applied-configuration: {}
f:spec:
.: {}
...
go.pkg.in/yaml.v2 produces:
...
managedfields:
- manager: kubectl-client-side-apply
operation: Update
apiversion: ...
time: "2021-01-13T09:51:19-05:00"
fieldstype: FieldsV1
fieldsv1:
raw:
- 123
- 34
- 102
...
I think it unlikely that we'll _never_ deal with k8s types, so my preference would be to stick with sigs.k8s.io libraries.
What are you parsing in your example? We use yaml to persist configuration files. Are you storing it in the plugins section or is it just a random example?
@justinsb
Ah - sorry I forgot to specify @Adirio . It's an instance of the go type for a CRD (generated by kubebuilder actually!), and - digging in - the difference seems to come from this field in the metadata: https://github.com/kubernetes/apimachinery/blob/1d8c923392f087e29f1e5bb563e073b833adc454/pkg/apis/meta/v1/types.go#L1185
In this case, I think it's because FieldsV1 implements json.Marshaler: https://github.com/kubernetes/apimachinery/blob/10b38829b6211c0d672e14a126479221312d3368/pkg/apis/meta/v1/helpers.go#L277
Edit: But also note the fieldstype vs fieldsType - I assume that's because of yaml vs json tags (?)
MarshalJSON needs to be converted to MarshalYAML and json tags to yaml tags.
That should fix those 2 things.
Discussed in today's meeting. As we are not 100% sure that the configuration information will never get parsed to JSON (for example to pass it to out-of-tree plugins in phase 2) the performance increase that swapping this to gopkg.in/yaml doesn't feel enough to justify it. If any other reason to change it raises in a future we may want to revisit this decission.
/close
@Adirio: Closing this issue.
In response to this:
Discussed in today's meeting. As we are not 100% sure that the configuration information will never get parsed to JSON (for example to pass it to out-of-tree plugins in phase 2) the performance increase that swapping this to
gopkg.in/yamldoesn't feel enough to justify it. If any other reason to change it raises in a future we may want to revisit this decission./close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Most helpful comment
Just as a datapoint, they don't produce the same output. yaml.Marshal with sigs.k8s.io produces
go.pkg.in/yaml.v2 produces:
I think it unlikely that we'll _never_ deal with k8s types, so my preference would be to stick with sigs.k8s.io libraries.