Implementing the Transformer model as an optional replacement for the CNN
I am not entirely sure this makes sense, but the Transformer model seems very promising across the NLP space, so I was wondering whether it would be worth implementing in spaCy, if that's even possible?
There is an ongoing effort to implement this for spaCy.
You can have a look in this branch of my forked repo of thinc.
The implementation is not ready yet, but there has been significant progress, so hopefully in the next days it will be 100% ready and merged into the master branch.
If I've learned one thing, it's definitely this: don't say things like "hopefully in the next days" on the enhancement threads ;).
The performance of the transformer model is very interesting. I must admit I dismissed the "Attention Is All You Need" paper when I first read it. I figured, sure, this is another way you can compute approximately the same sort of thing as an LSTM or CNN, but it wasn't clear that it had any real advantage. Obviously I was wrong, and these models have continued to do very well.
It's still totally unclear whether the transformer will improve anything in spaCy. We at least want to conduct the experiments, though. But even if it works, I expect the models will simply get a bit more accurate and possibly a little faster.
Here's my forecast: 60% chance the transformer is no better than the current CNN; 20% chance small improvement; 15% chance solid improvement; 5% chance big improvement. I'd define error reductions of 0-5% as small, 5-10% as solid, and 10%+ as big.
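To make those buckets concrete, here is a minimal sketch of how one might compute an error reduction and label it. It assumes "error reduction" means relative error reduction measured from accuracy, and that the bucket boundaries are half-open (exactly 5% counts as solid, exactly 10% as big); neither detail is spelled out above. The function names `relative_error_reduction` and `bucket` are made up for illustration.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's errors that the new model removes.

    Assumes accuracies are in [0, 1] and error = 1 - accuracy.
    """
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    if baseline_err <= 0.0:
        raise ValueError("baseline has no errors left to reduce")
    return (baseline_err - new_err) / baseline_err


def bucket(reduction: float) -> str:
    """Map a relative error reduction onto the forecast's categories.

    Boundary handling (half-open intervals) is an assumption.
    """
    if reduction <= 0.0:
        return "no better"
    if reduction < 0.05:
        return "small"
    if reduction < 0.10:
        return "solid"
    return "big"


if __name__ == "__main__":
    # Hypothetical numbers, not measured results: 90.0% -> 90.7% accuracy
    # removes about 7% of the baseline's errors.
    r = relative_error_reduction(0.900, 0.907)
    print(f"{r:.1%} error reduction: {bucket(r)}")
```

Note that under this reading, a jump from 90% to 92% accuracy would already be a 20% error reduction, well into "big" territory.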
This is interesting. It would be convenient to be able to test ELMo and BERT on problems directly via spaCy.
https://jalammar.github.io/illustrated-bert/
Is there any progress on the Transformer implementation?
@mr-bjerre See https://github.com/explosion/spacy-pytorch-transformers 🎉
You guys!! C,") I salute you.