I raised this issue first on the Prodigy support forum here, but it's actually a spaCy issue.
I have been using Prodigy to train a 'textcat' model like so:
```
python -m prodigy train textcat my_annotations en_vectors_web_lg --output ./my_model
```
and I noticed that the baseline score varies hugely between runs (0.2-0.55). This is even more puzzling given that fix_random_seed(0) is called at the beginning of training.
I tracked these variations down to the model output. Here is a minimal example to recreate the behaviour:
```
import spacy

component = 'textcat'
pipe_cfg = {"exclusive_classes": False}

for i in range(5):
    spacy.util.fix_random_seed(0)
    nlp = spacy.load('en_vectors_web_lg')

    example = ("Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.",
               {'cats': {'Label1': 1.0, 'Label2': 0.0, 'Label3': 0.0}})

    # Set up component pipe
    nlp.add_pipe(nlp.create_pipe(component, config=pipe_cfg), last=True)
    pipe = nlp.get_pipe(component)
    for label in set(example[1]['cats']):
        pipe.add_label(label)

    # Set up training and optimiser
    optimizer = nlp.begin_training(component_cfg={component: pipe_cfg})

    # Run one document through the textcat NN for scoring
    print(f"Scoring '{example[0]}'")
    print(f"Result: {pipe.model([nlp.make_doc(example[0])])}")
```
Calling fix_random_seed should produce the same output, given a fixed seed and no weight updates, as far as I understand. It does in the linear model, but not in the CNN model, if I read the architecture of the model correctly here:
https://github.com/explosion/spaCy/blob/908dea39399bbc0c966c131796f339af5de54140/spacy/_ml.py#L708
So the output from the first half of the first layer stays the same on each iteration, but the second half does not.
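To make the comparison explicit, the same check can be boiled down to a single boolean; this is just a sketch using the same API as the snippet above, with `seeded_scores` as a made-up helper name:
```
import numpy
import spacy

def seeded_scores(text):
    # Rebuild the pipeline from scratch with a fixed seed and score one doc;
    # no weight updates happen, so repeated calls should return the same array.
    spacy.util.fix_random_seed(0)
    nlp = spacy.load('en_vectors_web_lg')
    nlp.add_pipe(nlp.create_pipe('textcat', config={"exclusive_classes": False}), last=True)
    pipe = nlp.get_pipe('textcat')
    for label in ('Label1', 'Label2', 'Label3'):
        pipe.add_label(label)
    nlp.begin_training()
    return pipe.model([nlp.make_doc(text)])

text = "Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g."
# Should print True with deterministic seeding; on my install it prints False.
print(numpy.allclose(seeded_scores(text), seeded_scores(text)))
```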
As I said on the Prodigy forum, thanks for the report! If you have time to try it on v2.3 once it's out, let us know how you go.
Still no joy after the update to spaCy v2.3, I'm afraid.
The output still varies between runs:
```
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: [[0.49230167 0.74453074 0.7368677 ]]
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: [[0.80102295 0.80705464 0.22152871]]
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: [[0.5665726 0.5354606 0.15627414]]
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: [[0.05486786 0.22895002 0.74283147]]
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: [[0.360634 0.48460913 0.5300093 ]]
```
Thanks for checking! We'll look into this.
Hi @michel-ds, we found the problem and resolved it in PR #5735 - I added your specific test to the test suite and it runs now without error: https://github.com/explosion/spaCy/blob/develop/spacy/tests/regression/test_issue5551.py
This will be fixed from spaCy 3.0 onwards.
Hi @svlandeg,
I can confirm that I am getting identical numbers with the develop branch version of spaCy:
```
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x1149c64d0>)
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x1127b1c20>)
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x1149ddf80>)
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x113bf3560>)
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x1127b8a70>)
```
I had to use a blank model (`nlp = spacy.blank("en")`) in the code snippet above, but I hope that didn't invalidate the results of my test.
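For reference, the only change I made to the snippet above was the line that creates the pipeline, roughly:
```
import spacy

# Instead of loading the pretrained vectors package:
# nlp = spacy.load('en_vectors_web_lg')
nlp = spacy.blank("en")  # blank English pipeline, no vectors
```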
Thanks for fixing! Looking forward to version 3.0.
Awesome, thanks for checking!
This is getting ahead of ourselves a little (we're still working on proper documentation), but in v3 you'll be able to specify your configuration in full, instead of just the few parameters you could define previously.
Right now, if you pass in an empty config, it takes the default one, which is defined here: https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/textcat_defaults.cfg. This is equivalent to the settings you passed in before. You can also find the BOW and CNN settings for textcat in that same folder, if you're interested.
You can also see some more examples, and how to use these configs, in this test: https://github.com/explosion/spaCy/blob/develop/spacy/tests/pipeline/test_textcat.py#L135
```
pipe_config = {"model": textcat_config}
textcat = nlp.create_pipe("textcat", config=pipe_config)
```
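where `textcat_config` could, for instance, look something like the sketch below (using the BOW architecture as an example; the registered names and default values may still change while develop is in flux, so treat it as illustrative rather than definitive):
```
# Illustrative only: selects the bag-of-words textcat architecture from the
# registry and spells out its parameters in full.
textcat_config = {
    "@architectures": "spacy.TextCatBOW.v1",
    "exclusive_classes": False,
    "ngram_size": 1,
    "no_output_layer": False,
}
```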