I am using BertEmbeddings and I am getting this error:
RuntimeError: index out of range at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191
I have provided train, test and dev .txt files in this format: word word ... word <label>. To do so, I changed data_fetcher.py to take a whole sentence and assign it a single label, rather than assigning a label to every word in the sentence.
However, I am able to start training with other embeddings such as FlairEmbeddings.
Hi @nareshmungpara, could you paste a full minimal code example with one sentence to reproduce the error?
Hi, I have the same error. Below my minimal code example:
import torch
from flair.embeddings import StackedEmbeddings
from flair.data import Sentence
from flair.embeddings import BertEmbeddings, DocumentPoolEmbeddings
embeddings = DocumentPoolEmbeddings([BertEmbeddings('bert-base-multilingual-cased')])
text = "Litwo! Ojczyzno moja! Ty jesteś jak zdrowie. Ile cię trzeba było widać. Zwrócona na przeciwnej zajadłość dowiodę, że zamczysko wzięliśmy w okolicy. i tam do łona a resztę rozdzielono między wierzycieli. Zamku żaden wziąść nie może. Widać, że serce mu słowo ciocia koło uch brzęczało ciągle Sędziemu tłumaczył dlaczego urządzenie pańskie przeinaczył we brzozowym gaju stał dwór szlachecki, z liczby kopic, co wyszła. jeszcze skinieniem głowy potakiwał. Sędzia go grzecznie, na wieczerzę. on ekwipaż parskali ze cztery. Tymczasem na wybór wziął czerstwość i knieje więc szanują przyjaciół jak długo uczyć, ażeby pan Wojski z kołka zdjęty do nas wytuza. U nas starych więcej książkowej nauki. Ale stryj na utrzymanie. Lecz mniej pilni. Tadeusz Telimenie, Asesor zaś Gotem. Dość, że ważny i wkrótce wielki post - nowe wiary, prawa, toalety. Miała nad umysłami wielką moc ta chwała należy chartu Sokołowi. Pytano zdania bo tak i stoi wypisany każdy mimowolnie porządku pilnował. Bo nie zawadzi. Bliskość piwnic wygodna służącej czeladzi. Tak każe przyzwoitość). nikt tam ma jutro sam markiz przybrał tytuł markiza. Jakoż, kiedy karę na nim spostrzegł się, że nam, że odgłos trąbki i po kryjomu. Chłopiec, co dzień powszedni. Nóżek, choć suknia krótka, oko pańskie przeinaczył we śnie. Podróżny zląkł się, spójrzał, lecz nim odszedł, wyskoczył na stosach Moskali siekąc wrogów, a drugą do usług publicznych sposobił z odmienną modą, pod lasem zwaliska. Po drodze Woźny po gromie: w które na które na nim i silni do nas wytuza. U nas powrócisz cudem Gdy w bitwie, gdzie chce, wchodzi byle."
sentence_text = Sentence(text)
embeddings.embed(sentence_text)
Hello guys,
I got it working now; the only change I made was to the path where the model is saved.
Let me know if you also get it working and what the issue was.
Hello, what is the status of this issue? I have the exact same error; the following also results in RuntimeError: index out of range at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191 :
from flair.embeddings import BertEmbeddings
from flair.data import Sentence
bert_embedding = BertEmbeddings('bert-base-multilingual-cased')
bert_embedding.embed(Sentence(
'''In de OVER DE FUNCTIE Develop segmentations predictive models and statistical insights using appropriate tools Analyse data deeply to understand patterns and trends Transform these insights into actionable reports targeting algorithms and personalisation filters Understand key drivers of rules/model variation and communicate insights to regional and global executives Provide technical expertise in statistical analysis mathematical modelling data mining/machine learning Partner with business units to challenge their thinking provide direction Work with global teams on ad hoc projects and take a key role in international projects Share best practices with analysts and managers located around the world Have fun while driving innovation at one of the top brands on the Internet with the help of cutting edge technologies OVER DE FUNCTIE Develop segmentations predictive models and statistical insights using appropriate tools Analyse data deeply to understand patterns and trends Transform these insights into actionable reports targeting algorithms and personalisation filters Understand key drivers of rules/model variation and communicate insights to regional and global executives Institutionalise customer analytics in business processes and decision making both strategic and tactical Provide technical expertise in statistical analysis mathematical modelling data mining/machine learning Partner with business units to challenge their thinking provide direction Work with global teams on ad hoc projects and take a key role in international projects Share best practices with analysts and managers located around the world Have fun while driving innovation at one of the top brands on the Internet with the help of cutting edge technologies p><span style="font family arial helvetica sans serif font size small;">We do 't have a long description of requirements for the various functions.</span></p><p><span style="font family arial helvetica sans serif font size 
small;">Project/Program Manager to digital internal processes Strategic process digitalization analyze the existing tooling and advise/consultant span></p><p><span style="font family arial helvetica sans serif font size small;">Project Manager development in domain of Insurance/Finance</span></p><p><span style="font family arial helvetica sans serif font size small;">Service Oriented Architect senior capable of leading a team of 5/10 people from the customer experience on SOA or ideally in Oracle OSB</span></p><p><span style="font family arial helvetica sans serif font size small;">2 application architects in the insurance domain/process</span></p><p><span style="font family arial helvetica sans serif font size small;"></span><span style="font family arial helvetica sans serif font size small;">1 Integration architect SOA </span></p><p><span style="font family arial helvetica sans serif font size small;">1 Oracle OSB Oracle Service Bus Engineer/Developer</span></p Project description Job Mission The Connectivity Service Manager is responsible for the clients Connectivity services DC LAN/LAN/WAN through out the entire lifespan'''
))
Ok, I see this is due to the texts exceeding the maximum sequence length of the BERT model (at most 512 in-vocabulary tokens including separators, hence approximately 256 words). Since pytorch-pretrained-bert>=0.5.0 an understandable ValueError is raised instead.
I ran into this issue, but I managed to fix it on my end by using the BERT tokenizer to check that each of my sentences was under 512 BERT tokens long (really 510, since [CLS] and [SEP] tokens will be added to each sentence downstream), and then trimming any sentence that exceeded this length. A transformation is needed to convert a list of trimmed BERT tokens back into the original text before passing the text to BertEmbeddings.embed. (Yes, BERT tokenization is not completely reversible, as information such as the text's original casing is lost, but I'm using an uncased BERT model, so I can live with that.)
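A minimal sketch of that trim-and-detokenize step. The helper names are my own, and the detokenizer below is just the simple whitespace/`##` join, which only makes sense for an uncased model; the tokenizer call itself is shown commented out because it requires downloading a vocabulary:

```python
def truncate_bert_tokens(tokens, max_len=510):
    # Leave room for the [CLS] and [SEP] markers added downstream (512 - 2 = 510).
    return tokens[:max_len]

def detokenize(tokens):
    # Re-join WordPiece tokens: a leading "##" marks a continuation
    # of the previous token, everything else starts a new word.
    text = ""
    for tok in tokens:
        if tok.startswith("##"):
            text += tok[2:]
        else:
            text += (" " if text else "") + tok
    return text

# Usage with the real tokenizer (requires a model/vocab download):
# from pytorch_pretrained_bert import BertTokenizer
# tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# tokens = tokenizer.tokenize(long_text)
# safe_text = detokenize(truncate_bert_tokens(tokens))
# bert_embedding.embed(Sentence(safe_text))
```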
flair.embeddings.py lines 1426-1434 compute the longest sentence length in a batch of BERT-tokenized sentences and seem to use this value for the max sequence length downstream, but there's no guarantee that this value doesn't exceed 512 BERT tokens, which might explain the index errors.
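One possible guard (a sketch of the idea, not the actual flair patch; the function name, the `tokenized_sentences` argument, and the cap of 512 are my assumptions based on the standard BERT position limit) would be to clamp that per-batch maximum:

```python
def capped_longest_length(tokenized_sentences, model_max_len=512):
    # Longest BERT-token sequence in the batch, plus 2 for [CLS]/[SEP],
    # but never above the model's position-embedding limit.
    longest = max(len(tokens) for tokens in tokenized_sentences)
    return min(longest + 2, model_max_len)
```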