Onnxruntime: Optimizing BART: encoder/decoder attention

Created on 8 Oct 2020 · 2 Comments · Source: microsoft/onnxruntime

Is your feature request related to a problem? Please describe.
I'd like to optimize BART with ONNX Runtime, but it looks like the only Attention operator currently supported is self-attention, and BART requires encoder/decoder cross-attention.

System information

  • ONNX Runtime version (you are using): 1.4.0

Describe the solution you'd like
A fused operator implementing encoder/decoder cross-attention
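For readers unfamiliar with the distinction, the following is a minimal, unfused sketch of the computation such an operator would cover. It is plain Python for illustration only and is not ONNX Runtime code: the function names (`matmul`, `softmax`, `attention`) and the weight arguments are hypothetical, and multi-head splitting, masking, and biases are left out. The only difference between the two cases is where the keys and values come from: the same sequence as the queries (self-attention) or the encoder output (cross-attention).

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query_states, key_value_states, w_q, w_k, w_v):
    """Single-head scaled dot-product attention (illustrative sketch).

    Self-attention:  key_value_states is the same sequence as query_states.
    Cross-attention: query_states are decoder hidden states and
                     key_value_states are encoder outputs.
    """
    q = matmul(query_states, w_q)
    k = matmul(key_value_states, w_k)
    v = matmul(key_value_states, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)


# Hypothetical toy inputs: identity projections, hidden size 2.
identity = [[1.0, 0.0], [0.0, 1.0]]
decoder_states = [[1.0, 0.0]]                           # 1 target position
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 source positions

self_out = attention(decoder_states, decoder_states, identity, identity, identity)
cross_out = attention(decoder_states, encoder_states, identity, identity, identity)
```

In both calls the output has one row per query position; in the cross-attention call, each row is a weighted average over the three encoder positions rather than over the decoder sequence itself.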

Describe alternatives you've considered

enhancement

All 2 comments

It is in the backlog; there is no ETA currently. @wangyems.

We could target this in the ORT 1.6 release. @tianleiwu
