Use composition instead of inheritance and mixins.
A typical way of using this would go something like the following:
In such a world, the trainer can be instantiated like so:
trainer_instance = Trainer(dataset_reader_instance, model_instance, training_loop_manager, callback_instances_list, platform_specific_kwargs ...)
Here, everything non-essential, including logging, early stopping, etc., goes into the callbacks. Every callback, while registering itself with the trainer, tells the trainer which attributes it needs on the trainer in order to function. Once all the callbacks have registered themselves, the trainer runs a sanity check: warn if multiple callbacks want to access the same attribute, error out if two callbacks have asked for exclusive write access to an attribute, and so on.
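A minimal sketch of how that registration handshake could look. All the names here (`Callback`, `required_attributes`, `exclusive_write_attributes`, `_sanity_check`) are hypothetical, not part of any existing API:

```python
import warnings

class Callback:
    required_attributes: set = set()          # trainer attributes this callback reads
    exclusive_write_attributes: set = set()   # trainer attributes it wants to own


class Trainer:
    def __init__(self, callbacks):
        self.callbacks = list(callbacks)
        self._sanity_check()

    def _sanity_check(self):
        readers, writers = {}, {}
        for cb in self.callbacks:
            for attr in cb.required_attributes:
                readers.setdefault(attr, []).append(type(cb).__name__)
            for attr in cb.exclusive_write_attributes:
                if attr in writers:
                    raise ValueError(
                        f"'{attr}' requested for exclusive write by both "
                        f"{writers[attr]} and {type(cb).__name__}"
                    )
                writers[attr] = type(cb).__name__
        for attr, names in readers.items():
            if len(names) > 1:
                warnings.warn(f"'{attr}' is accessed by multiple callbacks: {names}")
```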
Additional benefit:
Now, the issue would be that the user has to instantiate these different instances in their final training script. We can automate that using a config manager like in AllenNLP and provide the user with standard CLI train, test, and predict commands. This way, if a user wants to train a model on their own dataset, they only need to define a reader class for it in a separate file, modify one line in an existing JSON or YAML config file, and run the CLI command train new_config.yaml
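Roughly, the config-driven instantiation could work like the sketch below. The registry, the `register_reader` decorator, and the config keys are illustrative assumptions, not an existing AllenNLP or Lightning API:

```python
import yaml

DATASET_READERS = {}

def register_reader(name):
    # Hypothetical registry so readers can be referred to by name in the config.
    def wrap(cls):
        DATASET_READERS[name] = cls
        return cls
    return wrap

@register_reader("my_csv_reader")   # the user's new reader, defined in its own file
class MyCsvReader:
    def __init__(self, path):
        self.path = path

def build_reader(reader_config):
    # reader_config is e.g. {"type": "my_csv_reader", "path": "data/train.csv"}
    cls = DATASET_READERS[reader_config.pop("type")]
    return cls(**reader_config)

with open("new_config.yaml") as f:
    config = yaml.safe_load(f)
reader = build_reader(config["dataset_reader"])
```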
Concern:
If the worry is that the user needs to learn about all the callbacks to even start training a model, then that can be addressed by providing a sample/default config file which includes the configuration for all important callbacks. So in essence, if the user wants to use the default training setup, they just need to copy the default training config file and change the model and dataset configuration.
Code that uses mixins is hard to read, debug, extend, and test when the mixins are not completely decoupled. Moreover, composition, coupled with an automatic way of instantiating classes, would make for a very clean and fast user interface.
Currently, the same as what is provided above in the Feature Proposal. Will update as/if things become more concrete.
Decouple the mixins, define their responsibilities clearly, and provide detailed "for developers" documentation describing these.
Hi! Thanks for your contribution! Great first issue!
> If the worry is that the user needs to learn about all the callbacks to even start training a model, then that can be addressed by providing a sample/default config file which includes the configuration for all important callbacks. So in essence, if the user wants to use the default training setup, they just need to copy the default training config file and change the model and dataset configuration.
I strongly disagree with this... having to read docs to understand something means we were lazy with the API design and made something not intuitive... this goes against the core principle in lightning.
I'm happy to entertain rewriting the internals of Lightning to make it more contributor-friendly, but not the external API. I would look at all the past frameworks as lessons learned and not as things done "correctly" -- especially anything in the TensorFlow and Keras family, which have (in my opinion) really poorly designed APIs.
My two cents is that the external API of PL has already proven it's a good API and should not be modified (and I'm not even talking about the burden of breaking backward compatibility...).
As for the internal API, I always found the mixin pattern to be super confusing when reading PL code. That being said, I don't see a cleaner way to do it. @dhruvdcoder, if you can come up with a better and cleaner pattern than the current mixin one, feel free!
I understand your concern regarding changing the external API. I believe that composition can replace mixins and inheritance by putting all the object-creation logic inside the trainer's constructor (this compromises the reusability of the smaller components, but let's ignore that for a moment). The difficult part is dividing the responsibilities among the smaller objects. However, one good thing here is that the existing mixins already give us some idea of the logical boundaries that should exist among the smaller objects. We could start by converting these mixins into independent classes that are instantiated and set as attributes inside the trainer, with the rule that the trainer's self should not be passed to these smaller objects -- they should be able to function independently.
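Here is a rough sketch of the direction I have in mind. All class and attribute names are hypothetical, and the real mixins obviously carry far more state than this:

```python
# Hypothetical sketch: each former mixin becomes a small, independent class
# that never receives the trainer's `self`.
class CheckpointManager:
    def __init__(self, dirpath):
        self.dirpath = dirpath

    def save(self, model, step):
        ...  # serialize `model` under self.dirpath for `step`


class TrainingLoop:
    def __init__(self, max_epochs):
        self.max_epochs = max_epochs

    def run(self, model, dataloader, checkpointer):
        for epoch in range(self.max_epochs):
            for batch in dataloader:
                ...  # forward / backward / optimizer step
            checkpointer.save(model, epoch)


class Trainer:
    def __init__(self, max_epochs=10, checkpoint_dir="checkpoints"):
        # Composition: the trainer owns small, independently testable objects
        # instead of inheriting their methods from mixins.
        self.loop = TrainingLoop(max_epochs)
        self.checkpointer = CheckpointManager(checkpoint_dir)

    def fit(self, model, dataloader):
        self.loop.run(model, dataloader, self.checkpointer)
```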
Thoughts?
not sure. we need to think about this carefully. many of us have thought about the current structure very hard. maybe suggest a code snippet of how you envision this working?
ie: do a real example.
maybe classes to encapsulate each set of functionality isn't a bad idea, but we need to think very hard about this because there's a chance we can also make the framework more flexible in the process
I will try to give it a shot once I have some time. Hopefully after a couple of weeks. Feel free to close the issue for now if there isn't anything else to discuss.
> As for the internal API, I always found the mixin pattern to be super confusing when reading PL code.
Mixins are a little difficult to read at times, but I'm not sure if composition would be that much better. It's not too bad at all if you use grep -nr to navigate and find where things are being declared and called. I've personally found Keras and TF to be impossible to understand, both in terms of their codebases and, in some ways, in terms of their external API.
@dhruvdcoder I've written something similar which used composition rather than mixins. One key problem was taking advantage of object-oriented programming and inheritance. Mixins actually give you a really nice way to structure the code using the inheritance of objects, whereas it becomes messier when you use composition. I think mixins also have the advantage of lower technical debt, as compositions can easily get very complex and out of hand as things get bigger (which I think is the problem Keras and TF ran into).
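For contrast with the composition sketch above, here is a generic illustration of the mixin style being discussed. This is not actual Lightning code; all names are made up:

```python
# Each concern lives in its own mixin, and the trainer inherits them all,
# so their methods share `self` (and its attributes) directly.
class LoggingMixin:
    def log_metrics(self, metrics):
        print(metrics)


class CheckpointMixin:
    def save_checkpoint(self, path):
        ...  # serialize `self.model` to `path`


class Trainer(LoggingMixin, CheckpointMixin):
    def __init__(self, model):
        self.model = model

    def fit(self, dataloader):
        for batch in dataloader:
            ...  # training step
            self.log_metrics({"loss": 0.0})
        self.save_checkpoint("last.ckpt")
```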
@dhruvdcoder how is it going, still interested in this PR?
@Borda I went back to using AllenNLP for my current projects and am unable to work on this currently. Sorry about that. Please feel free to close this issue.