Flair: AssertionError While trying to train Flair embeddings

Created on 18 May 2019 · 4 Comments · Source: flairNLP/flair

Hi,
I am following this tutorial to train Flair embeddings on my corpus. My folder structure looks exactly like the one in the tutorial, but when loading the corpus I still get an error suggesting that the directory does not exist (although it does):

loading dictionary ...
loading corpus ...
Traceback (most recent call last):
  File "train_flair_emb.py", line 29, in <module>
    character_level=True)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/flair/trainers/language_model_trainer.py", line 173, in __init__
    shuffle_lines=False)[0]
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/flair/trainers/language_model_trainer.py", line 30, in __init__
    assert path.exists()
AssertionError

Line 29 corresponds to this code:

corpus = TextCorpus(Path(corpus_path),
                    dictionary,
                    is_forward_lm,
                    character_level=True)

Here is the complete snippet:

from pathlib import Path

from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

import timeit


start_time = timeit.default_timer()

corpus_path = '/home/ubuntu/ws/resources/Spanish-Corporas/raw-corpus/corpus/'
char_mappings = '/home/ubuntu/ws/resources/Spanish-Corporas/raw-corpus/flair/char-mappings.pkl'
emb_w_path = '/home/ubuntu/ws/resources/Spanish-Corporas/raw-corpus/flair/spanish_med-forward'

# are you training a forward or backward LM?
is_forward_lm = True
print('loading dictionary ...')
# load the default character dictionary
# dictionary: Dictionary = Dictionary.load('chars')
dictionary: Dictionary = Dictionary.load_from_file(char_mappings)


print('loading corpus ...')
# get your corpus, process forward and at the character level
corpus = TextCorpus(Path(corpus_path),
                    dictionary,
                    is_forward_lm,
                    character_level=True)

# instantiate your language model, set hidden size and number of layers
language_model = LanguageModel(dictionary,
                               is_forward_lm,
                               hidden_size=128,
                               nlayers=1)

# train your language model
trainer = LanguageModelTrainer(language_model, corpus)
print('training ...')
trainer.train(emb_w_path,
              sequence_length=10,
              mini_batch_size=10,
              max_epochs=10)
end_time = timeit.default_timer()
print('done ...')
# parenthesize the division so the elapsed minutes are computed before formatting
print('The code ran for %.2fm' % ((end_time - start_time) / 60.))

And the directory structure looks like this:
image

And the train directory:
image

Maybe I am doing something really stupid, but I can't seem to figure it out.
Any help would be appreciated,
Many Thanks!


All 4 comments

Hello @unknown1990, that is odd. It looks like it cannot find the path you specify, since it fails on assert path.exists().

Could you try just calling assert Path(corpus_path).exists() to see if this works?

@alanakbik Thank you for your reply.
So interestingly, output of print('Path(corpus_path).exists(): {}'.format(Path(corpus_path).exists())) is:

image

This looks like a bug in flair?

@unknown1990
Did you solve it?

@unknown1990
Did you solve it?

I had the same problem. As you can see, @unknown1990 should rename the "dev.txt" file to "valid.txt".
I changed the name and it worked.
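
A small check along these lines can show which entry is missing before TextCorpus is constructed. This is only a sketch: it assumes the layout described in the Flair tutorial (a "train" folder plus "valid.txt" and "test.txt" directly under the corpus folder), and the helper name missing_corpus_entries is made up for illustration; it is not part of Flair.

```python
from pathlib import Path
from typing import List

# Entries assumed from the Flair language-model tutorial layout;
# adjust this tuple if your Flair version expects something else.
EXPECTED_ENTRIES = ("train", "valid.txt", "test.txt")


def missing_corpus_entries(corpus_dir: Path) -> List[str]:
    """Return the expected entries that do not exist under corpus_dir."""
    return [name for name in EXPECTED_ENTRIES
            if not (corpus_dir / name).exists()]
```

If the corpus folder contains "dev.txt" instead of "valid.txt", as in the question, calling missing_corpus_entries(Path(corpus_path)) should list "valid.txt" as missing, which is more informative than the bare AssertionError.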
