Flair: Add support for biomedical NER datasets

Created on 16 Feb 2019 · 7 Comments · Source: flairNLP/flair

A great enhancement would be to have built-in dataset loaders for biomedical NER datasets.

A good starting point is the set of datasets mentioned in the BioBERT paper. They can be downloaded from the BioBERT repository.

Thus, the following datasets can be supported:

  • NCBI disease
  • BC5CDR (Disease and Drug/Chemical)
  • BC4CHEMD
  • BC2GM
  • JNLPBA
  • LINNAEUS
  • Species-800
enhancement

All 7 comments

Yes that's a great idea! Together with the ELMo pubmed model (#502) and the Flair pubmed embeddings (#518) this could enable more research into biomedical data.

For now I'm going to write importers for the following datasets:

  • JNLPBA
  • NCBI-disease
  • bc5cdr

Thanks to the SciBERT authors, these datasets are already preprocessed and can simply be fetched from their GitHub repository :)
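As a rough illustration of what such an importer has to do, here is a minimal sketch in plain Python. It assumes the preprocessed files are column-formatted (one token per line, the token in the first column and the NER tag in the last, blank lines between sentences); that layout is an assumption and should be checked against the actual files. The function name `read_column_corpus` and the sample text are hypothetical and are not part of Flair's API.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_column_corpus(lines: Iterable[str]) -> List[Sentence]:
    """Group (token, tag) pairs into sentences, splitting on blank lines."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and -DOCSTART- markers close the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # Assumption: token is the first column, NER tag is the last column.
        current.append((columns[0], columns[-1]))
    # Flush the final sentence if the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample, made up for illustration only.
sample = """\
-DOCSTART- O

Mutations O
cause O
cystic B-Disease
fibrosis I-Disease
. O

BRCA1 O
"""

corpus = read_column_corpus(sample.splitlines())
for sentence in corpus:
    print(sentence)
```

A built-in loader would additionally handle downloading, caching, and the train/dev/test split, which this sketch leaves out.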

@stefan-it Were you able to write importers for some of the datasets? If so, could you reference the PR here?
If not, I could help with that, as well as with the other medical NER datasets that were not in SciBERT.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

There is currently a highly active branch in which my colleagues are adding a large number of different biomedical datasets. See #1513. I think it will be merged soon!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Support for biomedical datasets was added in Flair 0.6. If any datasets are missing, let us know!
