I am trying to train a Swedish model with tagging, parsing and embedded word vectors for similarity scoring. Training data comes from https://github.com/UniversalDependencies/UD_Swedish-Talbanken and word vectors are trained using gensim.
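For anyone reproducing this: the vectors file passed to `init-model` is a plain word2vec-style text file. A minimal sketch of that format (the tokens and values below are made up; in practice gensim's `KeyedVectors.save_word2vec_format(..., binary=False)` writes this file for you):

```python
# Sketch of the word2vec text format that `spacy init-model -v` accepts:
# first line is "<num_vectors> <dimensions>", then one token per line
# followed by its vector values, space-separated.
vectors = {
    "hus": [0.1, 0.2, 0.3],
    "bil": [0.4, 0.5, 0.6],
}

dim = len(next(iter(vectors.values())))
with open("word2vec.txt", "w", encoding="utf-8") as f:
    f.write(f"{len(vectors)} {dim}\n")
    for token, values in vectors.items():
        f.write(token + " " + " ".join(f"{v:.6f}" for v in values) + "\n")
```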
Training a small model with pretraining but without embedded vectors works from the command line. Likewise, a large model with embedded vectors but without pretraining also trains fine. The problem arises when trying to train a large model with both embedded vectors and pretraining.
As I understand it "spacy pretrain" uses the command --use-vectors argument is used if you want the word model to include features from the word vectors.
However, although pretrain works fine with the --use-vectors command the "spacy train" command fails.
python -m spacy init-model sv ./init_models/word2vec -v ./vectors/word2vec.txt
python -m spacy pretrain './corpus/sents_webbnyheter2013.jsonl' './init_models/word2vec' './pretrained/ud_w2v' --use-vectors -i 100
python -m spacy train sv models_temp ./corpus/ud_swedish_talbanken_json_sent10/sv_talbanken-ud-train.json ./corpus/ud_swedish_talbanken_json_sent10/sv_talbanken-ud-dev.json -p 'tagger,parser' -t2v ./pretrained/ud_w2v/model99.bin -g 0 -n 55
Training pipeline: ['tagger', 'parser']
Starting with blank model 'sv'
Counting training words (limit=0)
Loaded pretrained tok2vec for: ['tagger', 'parser']
Itn Tag Loss Tag % Dep Loss UAS LAS Token % CPU WPS GPU WPS
✔ Saved model to output directory
models_temp/model-final
⠙ Creating best model...
Traceback (most recent call last):
File "/home/gustav/anaconda3/lib/python3.7/site-packages/spacy/cli/train.py", line 365, in train
losses=losses,
File "/home/gustav/anaconda3/lib/python3.7/site-packages/spacy/language.py", line 516, in update
proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
File "nn_parser.pyx", line 424, in spacy.syntax.nn_parser.Parser.update
File "_parser_model.pyx", line 214, in spacy.syntax._parser_model.ParserModel.begin_update
File "_parser_model.pyx", line 262, in spacy.syntax._parser_model.ParserStepModel.__init__
File "/home/gustav/anaconda3/lib/python3.7/site-packages/thinc/neural/_classes/feed_forward.py", line 46, in begin_update
X, inc_layer_grad = layer.begin_update(X, drop=drop)
File "/home/gustav/anaconda3/lib/python3.7/site-packages/thinc/api.py", line 295, in begin_update
X, bp_layer = layer.begin_update(layer.ops.flatten(seqs_in, pad=pad), drop=drop)
File "/home/gustav/anaconda3/lib/python3.7/site-packages/thinc/neural/_classes/feed_forward.py", line 46, in begin_update
X, inc_layer_grad = layer.begin_update(X, drop=drop)
File "/home/gustav/anaconda3/lib/python3.7/site-packages/thinc/api.py", line 379, in uniqued_fwd
Y_uniq, bp_Y_uniq = layer.begin_update(X_uniq, drop=drop)
File "/home/gustav/anaconda3/lib/python3.7/site-packages/thinc/neural/_classes/feed_forward.py", line 46, in begin_update
X, inc_layer_grad = layer.begin_update(X, drop=drop)
File "/home/gustav/anaconda3/lib/python3.7/site-packages/thinc/neural/_classes/layernorm.py", line 60, in begin_update
X, backprop_child = self.child.begin_update(X, drop=0.0)
File "/home/gustav/anaconda3/lib/python3.7/site-packages/thinc/neural/_classes/maxout.py", line 76, in begin_update
output__boc = self.ops.gemm(X__bi, W, trans2=True)
File "ops.pyx", line 860, in thinc.neural.ops.CupyOps.gemm
File "/home/gustav/anaconda3/lib/python3.7/site-packages/cupy/linalg/product.py", line 35, in dot
return a.dot(b, out)
File "cupy/core/core.pyx", line 1306, in cupy.core.core.ndarray.dot
File "cupy/core/core.pyx", line 1940, in cupy.core.core.dot
ValueError: Axis dimension mismatch
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/gustav/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/gustav/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/gustav/anaconda3/lib/python3.7/site-packages/spacy/__main__.py", line 35, in
plac.call(commands[command], sys.argv[1:])
File "/home/gustav/anaconda3/lib/python3.7/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/home/gustav/anaconda3/lib/python3.7/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/home/gustav/anaconda3/lib/python3.7/site-packages/spacy/cli/train.py", line 486, in train
best_model_path = _collate_best_model(meta, output_path, nlp.pipe_names)
File "/home/gustav/anaconda3/lib/python3.7/site-packages/spacy/cli/train.py", line 554, in _collate_best_model
path2str(best_component_src / component), path2str(best_dest / component)
TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
Hi @gustavengstrom , this functionality is still a bit experimental so you could definitely be hitting a non-happy path that hasn't been tested before. I hope we can find a way to fix it though!
So just to summarize: you're storing the pretrained models in ./pretrained/ud_w2v, taking the last one (99) and using that for training the tagger and the parser, right?
Are you sure all settings are the same between pretraining and training?
Is it an option for you to update spaCy to the latest version (2.2.1) and check whether the error is still there? Please note though that upgrading will also require updating/retraining your models, cf. https://github.com/explosion/spaCy/releases/tag/v2.2.0 .
What happens in the code is that the train command reads the file from pretraining and uses it as the weights for the tok2vec layer of both the tagger and the parser. From your error message (only the first traceback block is relevant), it looks like the dimensions of the layers in the parser's neural net are not matching up.
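Schematically (this is not spaCy's actual code, just an illustration of the failure mode, with made-up widths): the maxout layer in the traceback does a matrix multiply between its input and its weights, and the inner dimensions must agree. If the pretrained tok2vec was built to expect a wider input (because `--use-vectors` added vector features) than the tok2vec the training run constructs, the multiply fails with exactly this kind of error:

```python
import numpy as np

# Hypothetical widths: the tok2vec output the parser produces at train
# time vs. the input width the pretrained maxout weights expect.
tok2vec_output = np.zeros((10, 96))   # training run built without vector features
W_pretrained = np.zeros((128, 300))   # pretrained with --use-vectors: wider input

try:
    tok2vec_output.dot(W_pretrained.T)  # inner dims 96 vs 300 do not align
except ValueError as e:
    print("dimension mismatch:", e)
```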
Thanks! I tried with the updated version as well. The settings should be the same, since I am not specifying any model-specific attributes in either pretrain or train. I agree that it looks like a dimension mismatch... I will try to post a minimal working example ASAP so that the error can be reproduced...
That would be very helpful!
@svlandeg I have created a git repository that reproduces the error. Should be reproducible if you simply clone the repository and run the commands in the readme file. This seems like a bug to me... I have reproduced the error both on my ubuntu server and my macbook.
@svlandeg Have you had a chance to look at this problem? Alternatively is there a way to feed in the word vectors post training?
@gustavengstrom : apologies for the late follow-up.
Looking at your commands, I noticed you specify --use-vectors during pretrain, so you'll also have to pass -v to train. Otherwise the train script will not accommodate the pretrained vectors in its Tok2Vec component, and there will indeed be a dimension mismatch when loading your pretrained model from file. See also my comment here.
Thanks! This worked. For reference, the final train command was amended as follows:
python -m spacy train sv models_temp sv_talbanken-ud-train.json sv_talbanken-ud-dev.json -p 'tagger,parser' -t2v ./pretrained/model9.bin -v init_models -n 10