Transformers: consider adding ALBERT?

Created on 29 Sep 2019 · 13 comments · Source: huggingface/transformers

🚀 Feature

Motivation

Additional context

wontfix

Most helpful comment

The official code and models got released 🙂
https://github.com/google-research/google-research/tree/master/albert

All 13 comments

Would definitely love to see an implementation of ALBERT added to this repository. Just for completeness, the paper is here: https://arxiv.org/abs/1909.11942

That said, it could be even more interesting to implement the core improvements (factorized embedding parameterization, cross-layer parameter sharing) from ALBERT in (some?/all?) other transformers as optional features?
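
For anyone curious what those two tricks would look like, here is a minimal PyTorch sketch (not ALBERT's actual implementation; the class names are made up for illustration, and the dimensions are just the base-config values from the paper):

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Factorized embedding parameterization: V x E plus E x H parameters
    instead of V x H. With V=30000, E=128, H=768 that is roughly 3.9M
    parameters versus roughly 23M for a full V x H embedding table."""
    def __init__(self, vocab_size=30000, embedding_size=128, hidden_size=768):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_size)
        self.projection = nn.Linear(embedding_size, hidden_size)

    def forward(self, input_ids):
        return self.projection(self.word_embeddings(input_ids))

class CrossLayerSharedEncoder(nn.Module):
    """Cross-layer parameter sharing: a single Transformer layer whose
    weights are reused on every one of the num_layers passes."""
    def __init__(self, hidden_size=768, num_heads=12, num_layers=12):
        super().__init__()
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads)
        self.num_layers = num_layers

    def forward(self, hidden_states):
        for _ in range(self.num_layers):
            # Same module object each iteration, so parameters are shared.
            hidden_states = self.shared_layer(hidden_states)
        return hidden_states

# Quick smoke test; nn.TransformerEncoderLayer expects (seq_len, batch, dim).
embeddings = FactorizedEmbedding()
encoder = CrossLayerSharedEncoder()
input_ids = torch.randint(0, 30000, (16, 2))
hidden = encoder(embeddings(input_ids))  # -> (16, 2, 768)
```

Retrofitting other models would then mostly come down to swapping in the factorized embedding module and tying the encoder layers' weights.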

Knowing how fast the team works, I would expect ALBERT to be implemented quite soon. That being said, I haven't had time to read the ALBERT paper yet, so it might be more difficult than previous BERT iterations such as DistilBERT and RoBERTa.

I think ALBERT is very cool! Expect...

And in PyTorch (using code from this repo and weights from brightmart): https://github.com/lonePatient/albert_pytorch

Any update on the progress?

The ALBERT paper will be presented at ICLR in April 2020. From what I last heard, the Hugging Face team has been talking with the people over at Google AI to share the details of the model, but I can imagine that the researchers would rather wait until the paper has been presented. One reason is that they want citations to point at their ICLR publication rather than an arXiv preprint, which, in the field, is "worth less" than a big conference proceeding.

For now, just be patient. I am sure that the Hugging Face team will make a big announcement (follow their Twitter/LinkedIn channels) along with a new version bump. No need to keep bumping this topic.

The official code and models got released 🙂
https://github.com/google-research/google-research/tree/master/albert

[WIP]
ALBERT in TensorFlow 2.0
https://github.com/kamalkraj/ALBERT-TF2.0

https://github.com/lonePatient/albert_pytorch

Dataset: MNLI
Model: ALBERT_BASE_V2
Dev accuracy: 0.8418

Dataset: SST-2
Model: ALBERT_BASE_V2
Dev accuracy: 0.926

[WIP]
ALBERT in TensorFlow 2.0
https://github.com/kamalkraj/ALBERT-TF2.0

Version 2 weights added.
Support for SQuAD 1.1 and 2.0 added.
Reproduces the same results as the paper. From my experiments, the ALBERT model is very sensitive to hyperparameters like batch size. Fine-tuning uses AdamW as the default, as in the original repo. AdamW performs better than LAMB for model fine-tuning.
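
For reference, a minimal sketch of such a fine-tuning setup, assuming the Albert* classes and `albert-base-v2` checkpoint that later shipped in huggingface/transformers (recent versions) plus stock PyTorch; the learning rate, weight decay, and label mapping below are illustrative guesses, not the hyperparameters behind the numbers above:

```python
import torch
from transformers import AlbertTokenizer, AlbertForSequenceClassification

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertForSequenceClassification.from_pretrained(
    "albert-base-v2", num_labels=3)  # 3 classes for MNLI

# AdamW for fine-tuning, as discussed above; batch size and learning rate
# are exactly the knobs the model is sensitive to, so expect to sweep them.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

# One illustrative MNLI-style premise/hypothesis pair.
inputs = tokenizer("The cat sat on the mat.", "A cat is on a mat.",
                   return_tensors="pt")
labels = torch.tensor([0])  # hypothetical label id for "entailment"

model.train()
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

LAMB, by contrast, was designed for very large-batch pretraining, which may be why AdamW comes out ahead at the much smaller batch sizes typical of fine-tuning.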

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
