We're working on new benchmarks for v3, but in the meantime, the nice thing about everyone evaluating on the same datasets is that not everyone needs to rerun every experiment.
Stanza's experiments are well-conducted overall, and their conclusion that Stanza is noticeably more accurate than spaCy is true (as of spaCy v2). There are some experiments where spaCy's score should be a few percentage points higher, e.g. by using pretrained word vectors as features, and/or by using the `spacy pretrain` command for language-model-style initialization. But those improvements definitely wouldn't be enough for spaCy v2 to catch up and achieve similar accuracy scores to Stanza.
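For reference, a rough sketch of that v2 workflow; all paths here (`vectors.txt.gz`, `raw_texts.jsonl`, the training files) are placeholders, not anything we ship:

```python
# The v2 CLI steps that add pre-trained vectors as features and
# initialize the tok2vec layer from `spacy pretrain` weights:
#
#   python -m spacy init-model en ./vectors_model --vectors-loc vectors.txt.gz
#   python -m spacy pretrain raw_texts.jsonl ./vectors_model ./pretrain_out
#   python -m spacy train en ./trained train.json dev.json \
#       --vectors ./vectors_model \
#       --init-tok2vec ./pretrain_out/model999.bin  # last pretrain epoch (placeholder name)
#
import spacy

nlp = spacy.load("./trained/model-best")  # load the resulting pipeline
doc = nlp("This is a placeholder sentence.")
print([(t.text, t.pos_, t.dep_) for t in doc])
```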
spaCy v3 makes it much easier to configure the models, and its machine learning API is finally well documented with the release of Thinc earlier this year. I expect to release a suite of transformer-based models that achieve accuracy close to the current state of the art, perhaps even with our noses just slightly in front. (If so I'm sure someone else will publish a more accurate model shortly after.)
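If you want a feel for the new Thinc API, here's a tiny self-contained sketch. The layer sizes and data are arbitrary placeholders, not spaCy's actual model configuration:

```python
import numpy
from thinc.api import Adam, Relu, Softmax, chain

# Compose a small feed-forward network from combinator layers.
model = chain(Relu(nO=64, dropout=0.2), Relu(nO=64), Softmax(nO=10))

X = numpy.zeros((8, 20), dtype="f")   # batch of 8 dummy feature vectors
Y = numpy.zeros((8, 10), dtype="f")   # dummy one-hot targets
model.initialize(X=X, Y=Y)            # infer the unset input dimensions

Yh, backprop = model.begin_update(X)  # forward pass, returns gradient callback
backprop(Yh - Y)                      # backward pass with the output gradient
model.finish_update(Adam(0.001))      # apply the accumulated gradients
```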
In the meantime, you can use the spacy-stanza plugin if you'd like to use Stanza with spaCy's API. Or you can just use Stanza directly --- it's up to you.
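Usage looks roughly like this (per the spacy-stanza README as of v0.2.x, which targets spaCy v2):

```python
# Stanza runs the actual annotation; spaCy wraps the result in its
# familiar Doc/Token API.
import stanza
from spacy_stanza import StanzaLanguage

stanza.download("en")        # fetch the Stanza model once
snlp = stanza.Pipeline(lang="en")
nlp = StanzaLanguage(snlp)   # expose the Stanza pipeline behind spaCy's API

doc = nlp("Barack Obama was born in Hawaii.")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)
```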
Actually, for Russian, using pre-trained vectors, spaCy gave me the same POS tagging and dependency parsing quality as Stanza, but spaCy is 3x faster on CPU.
(And BERT-like models are much slower, but give even higher quality.)
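For anyone who wants to run that kind of comparison themselves, a rough timing sketch; `./ru_model` is a placeholder for a custom-trained Russian spaCy pipeline, and the corpus is dummy data:

```python
import time

import spacy
import stanza

texts = ["Это тестовое предложение."] * 1000  # placeholder Russian corpus

nlp_spacy = spacy.load("./ru_model")  # hypothetical custom model with vectors
nlp_stanza = stanza.Pipeline(
    lang="ru", processors="tokenize,pos,lemma,depparse", use_gpu=False
)

start = time.perf_counter()
for doc in nlp_spacy.pipe(texts):     # stream texts through spaCy
    pass
print("spaCy: ", time.perf_counter() - start)

start = time.perf_counter()
for text in texts:                    # Stanza processes one document at a time
    nlp_stanza(text)
print("Stanza:", time.perf_counter() - start)
```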
This issue has been automatically closed because it was answered and there was no follow-up discussion.