The barriers we currently issue have very generic stage coverage (i.e. "all_images() -> all_images()"). Instead, we could track the stages used in each synchronization unit and OR those stages together when issuing a barrier, so the barrier covers only the stages actually in use.
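To make the idea concrete, here is a minimal sketch of per-unit stage tracking. Everything in it is hypothetical and not taken from the codebase: the `StageMask` type, its stage constants, `SyncUnit`, and `barrier_scopes` are illustrative names only. It assumes stages can be represented as a bitmask, and that a unit which recorded nothing should fall back to the current conservative "all stages" coverage.

```rust
use std::ops::BitOr;

/// Hypothetical pipeline-stage bitmask (illustrative, not the real API).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StageMask(u32);

impl StageMask {
    pub const NONE: StageMask = StageMask(0);
    pub const VERTEX_SHADER: StageMask = StageMask(1 << 0);
    pub const FRAGMENT_SHADER: StageMask = StageMask(1 << 1);
    pub const COMPUTE_SHADER: StageMask = StageMask(1 << 2);
    pub const TRANSFER: StageMask = StageMask(1 << 3);
    /// Conservative fallback, equivalent to today's generic coverage.
    pub const ALL: StageMask = StageMask(0b1111);

    pub fn is_empty(self) -> bool {
        self.0 == 0
    }
}

impl BitOr for StageMask {
    type Output = StageMask;

    fn bitor(self, rhs: StageMask) -> StageMask {
        StageMask(self.0 | rhs.0)
    }
}

/// Hypothetical synchronization unit that accumulates the stages it touches.
#[derive(Debug)]
pub struct SyncUnit {
    used: StageMask,
}

impl SyncUnit {
    pub fn new() -> Self {
        SyncUnit { used: StageMask::NONE }
    }

    /// OR the stages of one more resource use into the running mask.
    pub fn record_use(&mut self, stages: StageMask) {
        self.used = self.used | stages;
    }

    /// Stages to put in a barrier; falls back to ALL if nothing was recorded.
    pub fn stages(&self) -> StageMask {
        if self.used.is_empty() {
            StageMask::ALL
        } else {
            self.used
        }
    }
}

/// Source and destination stage scopes for a barrier between two units.
pub fn barrier_scopes(src: &SyncUnit, dst: &SyncUnit) -> (StageMask, StageMask) {
    (src.stages(), dst.stages())
}
```

With this shape, a unit that only did transfer and compute work followed by a unit that only samples in the fragment shader would yield a "transfer | compute -> fragment" barrier rather than "all -> all". Whether that narrower barrier is actually faster is exactly what would need measuring.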
Is this still an issue?
Yes, we don't have that implemented yet. It's an optimization. Ideally, we'd experiment with it once we have performance metrics to track.