I raised this issue first on the Prodigy support forum here, but it's actually a spaCy issue.
I have been using Prodigy to train a 'textcat' model like so:
```
python -m prodigy train textcat my_annotations en_vectors_web_lg --output ./my_model
```
and I noticed that the baseline score varies hugely between runs (0.2-0.55). This is even more puzzling given that fix_random_seed(0) is called at the beginning of training.
I tracked these variations down to the model output. Here is a minimal example to recreate the behaviour:
```
import spacy

component = 'textcat'
pipe_cfg = {"exclusive_classes": False}

for i in range(5):
    spacy.util.fix_random_seed(0)
    nlp = spacy.load('en_vectors_web_lg')

    example = ("Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.",
               {'cats': {'Label1': 1.0, 'Label2': 0.0, 'Label3': 0.0}})

    # Set up component pipe
    nlp.add_pipe(nlp.create_pipe(component, config=pipe_cfg), last=True)
    pipe = nlp.get_pipe(component)
    for label in set(example[1]['cats']):
        pipe.add_label(label)

    # Set up training and optimiser
    optimizer = nlp.begin_training(component_cfg={component: pipe_cfg})

    # Run one document through the textcat NN for scoring
    print(f"Scoring '{example[0]}'")
    print(f"Result: {pipe.model([nlp.make_doc(example[0])])}")
```
Calling fix_random_seed should produce the same output, given a fixed seed and no weight updates, as far as I understand. It does in the linear model, but not in the CNN model, if I read the architecture of the model correctly here:
https://github.com/explosion/spaCy/blob/908dea39399bbc0c966c131796f339af5de54140/spacy/_ml.py#L708
So the output from the first half of the first layer stays the same on each iteration, but the second half does not.
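To make the comparison explicit, the same check can be boiled down to a single boolean; this is just a sketch using the same API as the snippet above, with `seeded_scores` as a made-up helper name:
```
import numpy
import spacy

def seeded_scores(text):
    # Rebuild the pipeline from scratch with a fixed seed and score one doc;
    # no weight updates happen, so repeated calls should return the same array.
    spacy.util.fix_random_seed(0)
    nlp = spacy.load('en_vectors_web_lg')
    nlp.add_pipe(nlp.create_pipe('textcat', config={"exclusive_classes": False}), last=True)
    pipe = nlp.get_pipe('textcat')
    for label in ('Label1', 'Label2', 'Label3'):
        pipe.add_label(label)
    nlp.begin_training()
    return pipe.model([nlp.make_doc(text)])

text = "Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g."
# Should print True with deterministic seeding; on my install it prints False.
print(numpy.allclose(seeded_scores(text), seeded_scores(text)))
```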
As I said on the Prodigy forum, thanks for the report! If you have time to try it on v2.3 once it's out, let us know how you go.
Still no joy after the update to spaCy v2.3, I'm afraid.
The output still varies between runs:
```
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: [[0.49230167 0.74453074 0.7368677 ]]
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: [[0.80102295 0.80705464 0.22152871]]
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: [[0.5665726 0.5354606 0.15627414]]
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: [[0.05486786 0.22895002 0.74283147]]
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: [[0.360634 0.48460913 0.5300093 ]]
```
Thanks for checking! We'll look into this.
Hi @michel-ds, we found the problem and resolved it in PR #5735 - I added your specific test to the test suite and it runs now without error: https://github.com/explosion/spaCy/blob/develop/spacy/tests/regression/test_issue5551.py
This will be fixed from spaCy 3.0 onwards.
Hi @svlandeg,
I can confirm that I am getting identical numbers with the develop branch version of spaCy:
```
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x1149c64d0>)
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x1127b1c20>)
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x1149ddf80>)
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x113bf3560>)
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x1127b8a70>)
```
I had to use a blank model (`nlp = spacy.blank("en")`) in the code snippet above, but I hope that didn't invalidate the results of my test.
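For reference, the only change I made to the snippet above was the line that creates the pipeline, roughly:
```
import spacy

# Instead of loading the pretrained vectors package:
# nlp = spacy.load('en_vectors_web_lg')
nlp = spacy.blank("en")  # blank English pipeline, no vectors
```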
Thanks for fixing! Looking forward to version 3.0.
Awesome, thanks for checking!
This is getting ahead of ourselves a little (we're still working on proper documentation), but in v3 you'll be able to specify your configuration in full, instead of just the few parameters you could define previously.
Right now, if you pass in an empty config, it takes the default one, which is defined here: https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/textcat_defaults.cfg. This is equivalent to the settings you passed in before. You can also find the BOW and CNN settings for textcat in that same folder, if you're interested.
You can also see some more examples, and how to use these configs, in this test: https://github.com/explosion/spaCy/blob/develop/spacy/tests/pipeline/test_textcat.py#L135
```
pipe_config = {"model": textcat_config}
textcat = nlp.create_pipe("textcat", config=pipe_config)
```
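where `textcat_config` could, for instance, look something like the sketch below (using the BOW architecture as an example; the registered names and default values may still change while develop is in flux, so treat it as illustrative rather than definitive):
```
# Illustrative only: selects the bag-of-words textcat architecture from the
# registry and spells out its parameters in full.
textcat_config = {
    "@architectures": "spacy.TextCatBOW.v1",
    "exclusive_classes": False,
    "ngram_size": 1,
    "no_output_layer": False,
}
```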