Modify BERT models (src/transformers/modeling_bert.py) to conform to TorchScript requirements, so they can be jit.script()-ed, not just jit.trace()-ed (as is currently the only supported option)
Note: I have a working version implementing this, which I would like to contribute.
See below.
A scriptable model would allow for variable-length input, offering big speedup gains and simplification (no need to create different models for different input lengths).
In addition, it would avoid other potential pitfalls with tracing (e.g., code paths that are input dependent and not covered by the tracing example input).
Related issues:
https://github.com/huggingface/transformers/issues/2417
https://github.com/huggingface/transformers/issues/1204
possibly also
https://github.com/huggingface/transformers/issues/1477
https://github.com/huggingface/transformers/issues/902
I have a working PR that modifies all the models in src/transformers/modeling_bert.py and makes them TorchScript-able. I have not tested it on other models that use BERT components (e.g., albert), but it should be possible to expand the capability to those, as well.
However, it would require some significant work to make it ready for submission: besides formatting, documentation, testing etc., my current version changes the method signatures, and I would need to avoid that to maintain backward-compatibility.
Before putting in that work, I'd like to make sure that such a PR is something you'd be interested in and would be willing to merge in, assuming it meets the requirements.
Hi! This is interesting. Could you resume what are the changes that would be needed in order to have our models scriptable?
Sure, mostly my changes fall into these categories:
Most of these are pretty small changes and do not affect the logic. #4 and #1c can be tricky, and #5 might be an issue with recent changes made here: https://github.com/huggingface/transformers/pull/4874
Hi @sbrody18,
Thanks for opening this issue and taking the time to dive into our TorchScript support.
Regarding _A scriptable model would allow for variable-length input, offering big speedup gains and simplification_:
Do you have some numbers to compare against the current transformers library? We ran some TorchScript tests and the differences where not that huge at that time, may be this has changed since? I (and probably others) would be very interested in knowing more on this aspect.
Regarding the list of changes you suggested:
I'm currently not really in favour of such changes as they are almost changing all the way the library is designed and would have an impact on all the models. Some of them might be further discussed if there are real performance benefits.
Hi @mfuntowicz,
My co-workers and I have run the experiments that show that inference time scales more-or-less linearly with the input size (also supported in the linked article below).
Assuming you are trying to run in C++ (which is the reason to use TorchScript), the current solution, using trace() means that you can only use fixed length input - you have to set a large value for max_length to support your longest expected input, and zero-pad all input to the max-length.
That means if your max_length is 1000 tokens and your average length is 20 tokens, your inference is taking 50x longer than it should.
You can see an example of how big a difference this makes, here, under 'Scenario #3: Smaller Inputs (Dynamic Shapes)'.
I'm guessing the tests you ran were focused specifically on the technical behavior of the models on a fixed input set and didn't take into account the max-length issue. Also, this is only an issue if you need to use TorchScript in order to run in C++.
Re. the change to design, my intention is to keep the model changes to a minimum (e.g., adding type hints and asserts does not change the design at all) and make sure they are fully backwards compatible. There would still be some changes required, but I don't think they are drastic.
As I said in the original post, I have a PR where I did a lot of the work, and I'd be happy to work with someone to figure out how to get it to a state where it can be merged.
@sbrody18 do you mind sharing your fork ?
Yes, I can do so, but it may have to wait a week or two - things are busy at the moment.
I am very interested in this work as well. Our team would like to be able to use TorchScript so we can train without depending on Python. If there's any way I can be of help, I would gladly offer some time here!
Sorry for the delay. I hope to have a reasonable PR later this week.
My change is available at https://github.com/sbrody18/transformers/tree/scripting
Note that it is based off of a commit from earlier this month:
https://github.com/huggingface/transformers/compare/ef0e9d806c51059b07b98cb0279a20d3ba3cbc1d...sbrody18:scripting
Since then there have been changes made to the BertModel interface adding a return_tuple argument and changing the return type of the forward method, and this would require more effort to resolve.
I listed the principles I used in https://github.com/huggingface/transformers/issues/5067#issuecomment-644989375. The original components tended to return different sized tuples, depending on arguments, which is problematic for TorchScript. When a component BertX required an interface change to be scriptable, I made a BertScriptableX version with the modifications, and had the BertX component inherit from it and just modify the output so it is compatible with the original API.
I made scriptable versions of BertModel and all the BertFor\
Note that my change disables the gradient_checkpoint path in the encoder. I think this can be resolved, but I didn't have the time to work on it.
@sgugger @joeddav: see comment above for preliminary PR.
Probably too big and complicated to try to merge as is, but would be happy to work with someone to break things down into reasonable chunks.
Thanks for all the work. Looking at this and our recent changes in the model API (in particular the return_dict argument) I think we probably won't be able to have the models be fully compatible with TorchScript. What is possible however would be to have a second version of the models that don't have the option of return_dict (we can also remove output_hiddens/output_attentions if it makes life easier) and would be fully scriptable.
Since you already started with some components in a different class, I think we should have two models (let's say BertModel and ScriptableBertModel) with the same named parameters so you can seemlessly save/load from one to the other (a workflow would then be to experiment with BertModel, save the fine-tuned model and then go to ScriptableBertModel for inference for instance).
Then I'm not sure what's easiest:
I think we should focus on having a proof of concept on one model before moving forward with others.
That makes sense to me. It will probably result in some amount of code duplication, and we'd need to make sure we keep the named parameters in sync, but probably easier to maintain.
So would you suggest the ScriptableBertModel is a separate file?
Not necessarily a separate file, I guess it depends on the amount of code to rewrite. I think we can worry about this in a second stage, once we have a good poc.
@sgugger Please see POC implementation in PR above.
@sbrody18 in the original PR https://github.com/huggingface/transformers/pull/6846 you created for this issue, you mentioned you saw a large perf increase with dynamic sequences. What did you use as a test to make that determination?
@kevinstephano - see discussion and conclussions here
We saw a large perfomance increase with an older version of PyTorch, where traced models required the input to be the same length as the one used for tracing, making it necessary to pad short sequences at inference, and adding a lot of unnecessary computation overhead.
With recent versions of PyTorch (>=1.3, I think), this is no longer the case.
Most helpful comment
@sgugger Please see POC implementation in PR above.