Fairseq: Camembert - Experiments in French

Created on 3 Dec 2019 · 11 Comments · Source: pytorch/fairseq

Hello,
Thanks a lot for your nice work on CamemBERT.
Do you plan to release code to help us reproduce the results of the experiments described in the paper (CamemBERT: a Tasty French Language Model)?
Thanks,

question

PierreColombo

👍1

Most helpful comment

Hi,

It wouldn’t be trivial to modify run_ner.py for parsing as it requires a
graph prediction layer. We plan to release the code soon.

Benjamin

benjamin-mlr on 28 Dec 2019

❤2 👍1

All 11 comments

@louismartin

huihuifan on 3 Dec 2019

Hi @PierreColombo
The experiments were run using fairseq for pretraining and XNLI and HuggingFace's Transformers for the rest.
Most of the code is already present in the two libraries.

louismartin on 6 Dec 2019

Hi @louismartin, do you plan to add examples for fine-tuning a model for sequence tagging tasks (NER and PoS tagging)? 🤔

I wasn't able to reproduce the results for PoS tagging with my extension of the fine-tuning NER code in 🤗/Transformers, so I'm just wondering if I missed something in the implementation.

(It works perfectly with the feature-based approach in Flair, btw 😅)

stefan-it on 11 Dec 2019

Hi @stefan-it, we do plan to release the full code that will allow you to train and use our NER, POS, parsing and NLI models. It will come in the next few weeks.
Meanwhile, can I ask how you did it?
My first guess is that it comes from the optimization: which optimizer, learning rate, batch size and number of epochs did you use?
Thanks,
Benjamin

benjamin-mlr on 18 Dec 2019

👀2

Hi Benjamin,

I re-ran the PoS tagging experiment with the latest version of 🤗/Transformers. I used the default parameters of the run_ner script and trained both camemBERT and multilingual BERT models on the ParTUT dataset for 30 epochs (only one run).
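For anyone repeating this, the data preparation step can be sketched as follows. This is a minimal sketch, assuming the tagging script reads one "token label" pair per line with a blank line between sentences, and that the treebank is in CoNLL-U format with the universal PoS tag in the fourth column. The function name `conllu_to_token_tag` is hypothetical and is not part of either library.

```python
def conllu_to_token_tag(conllu_text):
    """Convert CoNLL-U text into 'token TAG' lines, one sentence per block.

    Assumption: the consumer expects space-separated 'token label' lines
    with an empty line between sentences.
    """
    out = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # Sentence boundary: emit one blank separator line.
            if out and out[-1] != "":
                out.append("")
            continue
        if line.startswith("#"):
            # Comment lines carry sentence metadata, not tokens.
            continue
        cols = line.split("\t")
        token_id, form, upos = cols[0], cols[1], cols[3]
        # Skip multiword-token ranges ("1-2") and empty nodes ("3.1");
        # the syntactic words they cover appear on their own lines.
        if "-" in token_id or "." in token_id:
            continue
        # Assumption: forms containing spaces would break a
        # space-separated format, so they are joined with underscores.
        out.append(f"{form.replace(' ', '_')} {upos}")
    # Remove a trailing separator, if any.
    while out and out[-1] == "":
        out.pop()
    return "\n".join(out)
```

French treebanks contain contractions such as "du" (written as a range line followed by "de" and "le"), which is why range lines are skipped here rather than tagged.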

The results are consistent with the experiment on ParTUT done in the paper: camemBERT is ~0.24% better 🎉

Btw: do you plan to add support for camemBERT into the fairseq library or into pytext (like it is done for XLM-R) 🤔

Thanks for your help!

Stefan

stefan-it on 24 Dec 2019

Hi Stefan,

Thanks for reproducing the experiments!

Yes, we do plan to release it in fairseq at some point. It should come at the
beginning of January.

Benjamin

benjamin-mlr on 24 Dec 2019

Dear all,
I am a newbie who just came to CamemBERT and tried to use it following the NLI example from RoBERTa.

When I type these lines, it appears that "mnli" is not recognized (KeyError):
tokens = camembert.encode('Salut.', 'Bonjour.')
camembert.predict('mnli', tokens).argmax() # 0: contradiction

What would be the correct "head" instead of mnli?
All the best, and thanks a lot for this great piece of work,
Eric Brunet Gouet
CH Versailles

ericbrunetgouet on 26 Dec 2019

Hi @ericbrunetgouet
We have not released the NLI model yet, so you cannot use the model like this unless you retrain it for NLI yourself.
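To make the KeyError concrete: as far as I can tell, fairseq's RoBERTa-style hub interface looks classification heads up by name, so `predict('mnli', ...)` fails when no head called 'mnli' was loaded with the checkpoint. Below is a minimal sketch, assuming the hub object exposes `register_classification_head(name, num_classes=...)` and `model.classification_heads` as the RoBERTa interface does. The helper `ensure_classification_head` and the head name 'nli' are hypothetical.

```python
def ensure_classification_head(hub_model, name, num_classes):
    """Register a classification head under `name` if it is missing.

    Returns True when a new head was registered, False when one with
    that name already existed. A newly registered head is randomly
    initialised, so its predictions are meaningless until the model
    has been fine-tuned on the task.
    """
    heads = hub_model.model.classification_heads
    if name in heads:
        return False
    hub_model.register_classification_head(name, num_classes=num_classes)
    return True


# Usage with the `camembert` object from the question above (not run here):
#   ensure_classification_head(camembert, "nli", num_classes=3)
#   tokens = camembert.encode("Salut.", "Bonjour.")
#   camembert.predict("nli", tokens).argmax()  # untrained head: arbitrary output
```

This only removes the KeyError. Getting useful NLI predictions still requires fine-tuning, as said above.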
Thanks,
Louis

louismartin on 26 Dec 2019

Dear Louis,
Thanks a lot. Hope you will do that. I will try my hypotheses with the English version.
Best regards
Eric

ericbrunetgouet on 26 Dec 2019

Hi @stefan-it and @louismartin,
I successfully used the run_ner.py script to evaluate the PoS tagging task for CamemBERT.

Currently I'm working on evaluating the dependency parsing task (UAS and LAS on the same dataset I used for PoS, e.g. ParTUT). The dataset contains all the information I need, since the head and the dependency relation are labelled for each word.
However, I'm wondering how the dependency parsing task can be implemented. Is there a script, or a change to run_ner.py, that allows this type of task?
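For reference, the two metrics themselves are simple to compute once predictions exist. UAS is the fraction of words whose predicted head is correct, and LAS additionally requires the dependency relation to match. This is a minimal sketch, assuming gold and predicted annotations are aligned lists of (head, relation) pairs. The function name `attachment_scores` is hypothetical, and it counts every token, whereas evaluation scripts differ on how they treat punctuation and relation subtypes.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for aligned lists of (head_index, relation) pairs."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")
    # UAS: the head index alone must match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head index and the relation label must match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return uas_hits / n, las_hits / n


# "Le chat dort": head 0 marks the root.
gold = [(2, "det"), (3, "nsubj"), (0, "root")]
pred = [(2, "det"), (3, "obj"), (2, "root")]
uas, las = attachment_scores(gold, pred)  # 2 of 3 heads, 1 of 3 labelled arcs
```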

Thanks for everything as always. Cheers!

LeoDeep on 28 Dec 2019

👀1

Hi,

It wouldn’t be trivial to modify run_ner.py for parsing as it requires a
graph prediction layer. We plan to release the code soon.
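To illustrate why a tagging script does not transfer directly: a tagger picks one label per token from a fixed set, whereas a parser has to pick, for each token, a head among the other tokens of the same sentence, so it scores token pairs. This is a minimal sketch of the decoding side only, assuming a matrix of arc scores is already available. How those scores are produced is not specified in this thread, and the function name `greedy_heads` and the score layout are hypothetical.

```python
def greedy_heads(arc_scores):
    """Pick the highest-scoring head for each token.

    arc_scores[i][h] is the score for token i+1 taking head h, where
    h = 0 stands for the artificial ROOT and h >= 1 for token h.
    Greedy selection does not guarantee a well-formed tree; parsers
    usually add a tree-constrained decoding step on top.
    """
    heads = []
    for i, row in enumerate(arc_scores):
        dependent = i + 1
        best_head, best_score = None, float("-inf")
        for head, score in enumerate(row):
            if head == dependent:
                # A token cannot be its own head.
                continue
            if score > best_score:
                best_head, best_score = head, score
        heads.append(best_head)
    return heads
```

The output has the same shape as the head column of a treebank, which is what UAS is computed against. Relation labels would need a second classifier over the chosen arcs.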

Benjamin
