Is your feature request related to a problem? Please describe.
I'd like to optimize BART with ONNX Runtime, but it looks like the only Attention operator currently supported is self-attention, and BART requires encoder/decoder cross-attention.
System information
Describe the solution you'd like
A fused operator implementing encoder/decoder cross-attention
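For reference, here is a minimal, unfused sketch of the computation such an operator would cover. It is plain Python for illustration only, not ONNX Runtime's API or kernel. The function name `cross_attention` and the weight names `w_q`, `w_k`, `w_v` are hypothetical, and it assumes a single head with no masking or bias. The only difference from self-attention is that keys and values come from the encoder output instead of the decoder states.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(decoder_states, encoder_states, w_q, w_k, w_v):
    """Single-head, unfused cross-attention reference (hypothetical helper).

    Queries are projected from the decoder states; keys and values are
    projected from the encoder output. In self-attention all three would
    be projected from the same sequence.
    """
    q = matmul(decoder_states, w_q)  # (tgt_len, d)
    k = matmul(encoder_states, w_k)  # (src_len, d)
    v = matmul(encoder_states, w_v)  # (src_len, d)

    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k: (d, src_len)
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]  # (tgt_len, src_len)

    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


# Two decoder positions attending over three encoder positions, hidden size 2.
identity = [[1.0, 0.0], [0.0, 1.0]]
decoder_states = [[1.0, 0.0], [0.0, 1.0]]
encoder_states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
output, weights = cross_attention(
    decoder_states, encoder_states, identity, identity, identity
)
```

Note that the attention weights have shape (tgt_len, src_len), so the two sequence lengths can differ. A fused operator would therefore need separate inputs for the decoder states and the encoder output.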
Describe alternatives you've considered
It is in the backlog; there is no ETA currently. @wangyems
We could target this in the ORT 1.6 release. @tianleiwu