In the last few years, a number of deep architectures have been proposed for ad-hoc retrieval, most with limited success (if any). However, BERT-based models are finally pushing the state of the art for ad-hoc retrieval. In fact, the last TREC had a Deep Learning track where "NNLM" (neural network language model) runs dominated both the traditional approaches (mostly BM25 and variations) and the other deep models.
So the current trend is that BERT should be the new baseline for any proposed IR model.
There should be a pipeline-like feature that is able to score pairs of documents and user queries, probably pre-trained on a dataset like MS MARCO from the TREC 2019 Deep Learning track. Ideally, this would also accept a list of documents to rank and return their scores.
In real-life applications, one would probably want to combine BERT scores with traditional baseline scores (like QL or BM25), so the raw score is needed (or, even better, integrate something like pyserini in the backend?). A rough sketch of both ideas follows the example below.
from transformers import pipeline

# Allocate a pipeline for document-relevancy scoring
nlp = pipeline('document-relevancy')
nlp({
    'query': 'can hives be a sign of pregnancy',
    'context': '<document content>'
})
>>> {'score': 0.28756016668193496}
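To make the list-ranking and score-combination ideas above concrete, here is a minimal sketch. Everything pipeline-related is hypothetical (the 'document-relevancy' task and its batch input format do not exist yet), and the BM25 scores are placeholders for what a backend like pyserini would return:

from transformers import pipeline

# Hypothetical: 'document-relevancy' is the proposed task name, not an existing one.
nlp = pipeline('document-relevancy')

query = 'can hives be a sign of pregnancy'
docs = ['<document 1 content>', '<document 2 content>', '<document 3 content>']

# Hypothetical batch interface: one relevance score per (query, document) pair.
bert_scores = [r['score'] for r in nlp([{'query': query, 'context': d} for d in docs])]

# BM25 scores for the same documents, e.g. obtained from a pyserini index in the backend.
bm25_scores = [12.4, 9.7, 8.1]  # placeholder values

# Min-max normalize BM25 so both score ranges are comparable before interpolation.
lo, hi = min(bm25_scores), max(bm25_scores)
bm25_norm = [(s - lo) / (hi - lo) for s in bm25_scores]

# Linear interpolation of BERT and BM25 scores.
alpha = 0.7
combined = [alpha * b + (1 - alpha) * t for b, t in zip(bert_scores, bm25_norm)]

# Final ranking: highest combined score first.
ranking = sorted(zip(docs, combined), key=lambda pair: pair[1], reverse=True)

Min-max normalization is only the simplest way to put BM25 and BERT scores on a comparable scale, and the interpolation weight alpha would normally be tuned on held-out queries.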
I have already used DistilBERT in a paper to appear at ECIR 2020 (Diagnosing BERT with Retrieval Heuristics), and would be able to contribute the model for this (even for bert-base).
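For reference, such a model is essentially a BERT sentence-pair classifier over (query, document) inputs. Here is a minimal sketch of the scoring step using the existing transformers API; the checkpoint name is a hypothetical placeholder for a model fine-tuned on MS MARCO relevance labels:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint name: assumes a DistilBERT model fine-tuned on MS MARCO
# for binary relevance classification (label 1 = relevant).
model_name = 'distilbert-msmarco-relevance'  # hypothetical
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

query = 'can hives be a sign of pregnancy'
document = '<document content>'

# Encode the (query, document) pair exactly like a BERT sentence pair.
inputs = tokenizer(query, document, truncation=True, return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits

# Use the probability of the 'relevant' class as the relevance score.
score = torch.softmax(logits, dim=-1)[0, 1].item()
print({'score': score})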
I would also love to contribute to this, but will probably need some guidance, if anyone is willing to help.
Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
unstale because this is very interesting