Good afternoon,
I have been using Flair for a while now and am very happy with its ease of use and performance.
However, I was wondering: is it possible to create a ColumnCorpus without the need for a train.txt, dev.txt, and test.txt? I load my data from a Pandas DataFrame, and it feels quite awkward to write the data to files first only to load it back right after.
I have searched around, but haven't been able to find a way to do this. If this is just not possible, could anyone explain why this choice was made?
Thanks in advance, and have a nice day!
Hi @fabero,
I have used something like this in the past.
from flair.datasets import SentenceDataset
from flair.data import Corpus, Sentence

def get_flair_dataset_from_dataframe(data, text_col, label_col):
    # Build one labelled Sentence per DataFrame row.
    sentences = []
    for text, label in zip(data[text_col], data[label_col]):
        sentence = Sentence(text)
        sentence.add_label('class', label)
        sentences.append(sentence)
    return SentenceDataset(sentences)

# train_df, val_df and test_df are the pandas DataFrames holding each split.
train_dataset = get_flair_dataset_from_dataframe(train_df, "text_column", "label_column")
dev_dataset = get_flair_dataset_from_dataframe(val_df, "text_column", "label_column")
test_dataset = get_flair_dataset_from_dataframe(test_df, "text_column", "label_column")
corpus = Corpus(train=train_dataset, dev=dev_dataset, test=test_dataset, name="my_corpus", sample_missing_splits=False)
Hope this helps!
Hi @kishaloyhalder,
That's great! Thanks a lot!