Describe the bug
Loading a TextClassifier model crashes when the model was trained on albert-base-v2.
To Reproduce
Train a TextClassifier with albert-base-v2. Save the model. Load the saved model (e.g. on another machine).

Expected behavior
The model should load successfully.
Environment (please complete the following information):
Additional context
There is a workaround that involves monkey-patching a bit of code like this:

```python
import logging

import transformers

logger = logging.getLogger(__name__)

# Path to the SentencePiece vocab file in the local cache
vocab_file = transformers.tokenization_albert.AlbertTokenizer.from_pretrained("albert-base-v2").vocab_file

def _setstate(self, d):  # method to patch in
    self.__dict__ = d
    try:
        import sentencepiece as spm
    except ImportError:
        logger.warning(
            "You need to install SentencePiece to use AlbertTokenizer: https://github.com/google/sentencepiece "
            "pip install sentencepiece"
        )
        raise
    self.sp_model = spm.SentencePieceProcessor()
    self.sp_model.Load(vocab_file)

# Actual patching being done here
transformers.tokenization_albert.AlbertTokenizer.__setstate__ = _setstate
```
Having to do this every time is painful. Maybe we can implement a better way of handling this issue.
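The patch works because pickle calls `__setstate__` while unpickling, which gives the tokenizer a chance to rebuild its unpicklable SentencePiece processor. A minimal stdlib sketch of the same pattern, using a hypothetical toy class (not the real transformers code):

```python
import pickle

class Tokenizer:
    """Toy stand-in for a tokenizer whose heavy state can't be pickled."""

    def __init__(self, vocab_file):
        self.vocab_file = vocab_file
        self.sp_model = object()  # stands in for SentencePieceProcessor

    def __getstate__(self):
        # Drop the unpicklable processor; keep only the vocab path.
        state = self.__dict__.copy()
        del state["sp_model"]
        return state

    def __setstate__(self, d):
        # Called on unpickling: restore attributes, then rebuild
        # the processor from the stored path.
        self.__dict__ = d
        self.sp_model = object()  # a real impl would Load(self.vocab_file)

tok = Tokenizer("/tmp/spiece.model")
restored = pickle.loads(pickle.dumps(tok))
print(restored.vocab_file)          # → /tmp/spiece.model
print(hasattr(restored, "sp_model"))  # → True
```

This is why patching `__setstate__` on the class is enough: every tokenizer deserialized afterwards goes through the patched method.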
Thanks for reporting this - @whoisjones can you take a look?
I'll take a look and comment here @alanakbik @mittalsuraj18
@mittalsuraj18 the issue lies in the huggingface lib; a similar issue was opened last week for MarianMT. SentencePiece saves its files in the cache, so they can't be found on another machine. Please open a corresponding issue in huggingface, following the linked one.
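To illustrate the diagnosis: if a serialized object stores an absolute path into the local download cache, deserializing it on a machine where that path doesn't exist fails. A sketch with a hypothetical toy class and paths (not the real transformers code):

```python
import os
import pickle
import tempfile

class Tokenizer:
    """Toy stand-in that remembers only a path into the local cache."""

    def __init__(self, vocab_file):
        self.vocab_file = vocab_file

    def load_model(self):
        # Fails if the cached file doesn't exist on this machine.
        if not os.path.exists(self.vocab_file):
            raise FileNotFoundError(self.vocab_file)
        return True

# "Machine A": the vocab file exists in the local cache.
cache = tempfile.mkdtemp()
path = os.path.join(cache, "spiece.model")
open(path, "w").close()
blob = pickle.dumps(Tokenizer(path))  # pickle stores the absolute cache path

# "Machine B": simulate a different machine by removing the cached file.
os.remove(path)
restored = pickle.loads(blob)
try:
    restored.load_model()
    print("loaded")
except FileNotFoundError:
    print("missing vocab file")  # → missing vocab file
```

Saving the vocab file alongside the model (or re-resolving it from the model name at load time) avoids the machine-specific path.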