Hello there,
I've been repurposing the Sequence Tagger model. Basically, it involves stacking a few layers on top of the NER model: passing the labeled sentences to the next module using the _obtain_labels() method. I would then calculate the loss at the top and do a backward pass, which should propagate the errors through the whole ensemble, including the transition matrix of the CRF part, if I'm understanding it correctly.
However, _viterbi_decode(), the method that predicts the labels, uses NumPy ndarrays and operations instead of PyTorch tensors, thereby detaching them from the computation graph. So it looks like autograd won't be able to calculate the gradient for the transition matrix.
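To illustrate the concern, here is a minimal standalone sketch (not Flair code) showing how a round-trip through NumPy severs a tensor from the autograd graph:

```python
import torch

# A learnable transition matrix, as in a CRF layer.
transitions = torch.randn(3, 3, requires_grad=True)

# Staying in PyTorch keeps the computation graph intact:
score_pt = transitions.sum()
score_pt.backward()
assert transitions.grad is not None  # gradient flows back to `transitions`

# Converting to NumPy (as a NumPy-based decoder effectively does) detaches:
as_np = transitions.detach().numpy()
score_np = torch.tensor(as_np.sum())  # a fresh leaf tensor with no history
assert not score_np.requires_grad     # autograd cannot reach `transitions`
```

Any loss built on top of `score_np` would leave the transition matrix untrained.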
I noticed there's a method for calculating loss in the original model, and this one doesn't use _viterbi_decode() but instead _forward_alg(), which does use PyTorch tensors.
So my questions are:
- Could I simply use _viterbi_decode() and then use a plain loss function like log-likelihood?
- Why doesn't the loss calculation use _viterbi_decode()?

Thanks for your time!
Regards,
Santiago.
PS: I'm new to PyTorch...
Hello @s-glitch,
the loss of a CRF-NN is not only _forward_alg() but (as shown here) the difference between _forward_alg() and _score_sentence().
_score_sentence() calculates the energy of the labeled sentence, while _forward_alg() calculates the total energy over all possible labelings.
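As a toy sketch of that loss (assumed shapes and stand-in function names, not Flair's actual API), the CRF negative log-likelihood is the log partition function minus the score of the gold labeling, and gradients reach the transition matrix because everything stays in PyTorch:

```python
import torch

# emissions: (seq_len, num_tags) emission scores from the network
# transitions: (num_tags, num_tags) learned CRF transition matrix
# tags: (seq_len,) gold label indices

def score_sentence(emissions, transitions, tags):
    # Energy of one particular labeling.
    score = emissions[0, tags[0]]
    for t in range(1, len(tags)):
        score = score + transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return score

def forward_alg(emissions, transitions):
    # Log-sum-exp over all labelings (log partition function).
    alpha = emissions[0]
    for t in range(1, emissions.size(0)):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[t]
    return torch.logsumexp(alpha, dim=0)

emissions = torch.randn(4, 3, requires_grad=True)
transitions = torch.randn(3, 3, requires_grad=True)
tags = torch.tensor([0, 2, 1, 1])

# Negative log-likelihood: log Z(x) - score(x, y)
loss = forward_alg(emissions, transitions) - score_sentence(emissions, transitions, tags)
loss.backward()  # gradients reach the transition matrix
```

Since the gold labeling is one term inside the log-sum-exp, this loss is always non-negative.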
On the other hand, _viterbi_decode() calculates the most likely sequence given the weights and input features.
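For completeness, a toy NumPy sketch of that Viterbi decoding (assumed shapes, not Flair's implementation):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    # emissions: (seq_len, num_tags), transitions: (num_tags, num_tags)
    # Dynamic programming: track the best score ending in each tag,
    # plus backpointers to recover the highest-scoring path.
    seq_len, num_tags = emissions.shape
    score = emissions[0]
    backpointers = []
    for t in range(1, seq_len):
        # cand[i, j] = best score ending in tag i, then transitioning to tag j
        cand = score[:, None] + transitions + emissions[t][None, :]
        backpointers.append(cand.argmax(axis=0))
        score = cand.max(axis=0)
    # Walk the backpointers from the best final tag.
    path = [int(score.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return list(reversed(path))

emissions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
transitions = np.zeros((2, 2))
print(viterbi_decode(emissions, transitions))  # → [0, 1, 0]
```

Note the argmax/max at each step: decoding picks a single path rather than summing over all of them, which is why it is used for prediction, not for the loss.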
Yes, in theory you could simply use _viterbi_decode() and cross_entropy to get the same result; however, _viterbi_decode() simply takes longer to compute. It's much simpler to calculate only the total loss instead of individual losses for each token.
_viterbi_decode() uses NumPy instead of PyTorch because that algorithm is not parallelizable and runs faster on the CPU than on the (usually preferred) GPU. There is a really interesting blog post about the optimisation progress: https://towardsdatascience.com/why-we-switched-from-spacy-to-flair-to-anonymize-french-legal-cases-e7588566825f
Hello @helpmefindaname
I see. That is just what I needed to know. It was helpful. Thank you!
Have a nice day :)