Hi,
I am following this tutorial to train Flair embeddings on my corpus. My folder structure looks exactly the same as in the tutorial, but when loading the corpus I still get an error saying that the directory does not exist (but it does):
loading dictionary ...
loading corpus ...
Traceback (most recent call last):
File "train_flair_emb.py", line 29, in <module>
character_level=True)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/flair/trainers/language_model_trainer.py", line 173, in __init__
shuffle_lines=False)[0]
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/flair/trainers/language_model_trainer.py", line 30, in __init__
assert path.exists()
AssertionError
Line 29 corresponds to this code:
corpus = TextCorpus(Path(corpus_path),
                    dictionary,
                    is_forward_lm,
                    character_level=True)
Here is the complete snippet:
from pathlib import Path
from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus
import timeit
start_time = timeit.default_timer()
corpus_path = '/home/ubuntu/ws/resources/Spanish-Corporas/raw-corpus/corpus/'
char_mappings = '/home/ubuntu/ws/resources/Spanish-Corporas/raw-corpus/flair/char-mappings.pkl'
emb_w_path = '/home/ubuntu/ws/resources/Spanish-Corporas/raw-corpus/flair/spanish_med-forward'
# are you training a forward or backward LM?
is_forward_lm = True
print('loading dictionary ...')
# load the default character dictionary
# dictionary: Dictionary = Dictionary.load('chars')
dictionary: Dictionary = Dictionary.load_from_file(char_mappings)
print('loading corpus ...')
# get your corpus, process forward and at the character level
corpus = TextCorpus(Path(corpus_path),
                    dictionary,
                    is_forward_lm,
                    character_level=True)
# instantiate your language model, set hidden size and number of layers
language_model = LanguageModel(dictionary,
                               is_forward_lm,
                               hidden_size=128,
                               nlayers=1)
# train your language model
trainer = LanguageModelTrainer(language_model, corpus)
print('training ...')
trainer.train(emb_w_path,
              sequence_length=10,
              mini_batch_size=10,
              max_epochs=10)
end_time = timeit.default_timer()
print('done ...')
# parenthesize the division so the minutes are computed before string formatting
print('The code ran for %.2fm' % ((end_time - start_time) / 60.))
And the directory structure looks like this:

And the train directory:

Maybe I am doing something really stupid, but I can't seem to figure it out.
Any help would be appreciated,
Many Thanks!
Hello @unknown1990, that is odd. It looks like it cannot find the path you specify, since it fails on assert path.exists().
Could you try calling just assert Path(corpus_path).exists() to see if this works?
@alanakbik Thank you for your reply.
So interestingly, output of print('Path(corpus_path).exists(): {}'.format(Path(corpus_path).exists())) is:

This looks like a bug in flair?
@unknown1990
Did you solve it?
I had the same problem. As you can see, @unknown1990 should rename the "dev.txt" file to "valid.txt".
I changed the name and it worked.
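For anyone hitting the same AssertionError, here is a rough sketch of a check you could run before building the TextCorpus. It assumes the layout from the Flair tutorial (a train directory plus valid.txt and test.txt inside the corpus folder); the helper name missing_corpus_entries is made up for illustration and is not part of Flair.

```python
from pathlib import Path

# Assumption: these are the entries the corpus folder needs, following the
# layout described in the Flair language-model tutorial.
EXPECTED_ENTRIES = ("train", "valid.txt", "test.txt")


def missing_corpus_entries(corpus_dir):
    """Hypothetical helper: list expected entries that are absent from corpus_dir."""
    root = Path(corpus_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

Calling print(missing_corpus_entries(corpus_path)) with a folder that contains dev.txt instead of valid.txt should list valid.txt as missing, even though Path(corpus_path).exists() itself is True.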