Because CommandQueue::present_surface blocks on VSync inside swap_chain_present, the function keeps various resources locked for up to one frame, roughly 16.7ms on a 60Hz display.
I've read through the documentation and discussions I could find on this specific issue, and it looks like a proper fix may require multiple queues.
Since the queue is in use for the whole duration of the present, it cannot be used concurrently to present a second window, or even to record another window's rendering commands.
Some improvements to the current system may be possible to reduce the number of resources locked at the same time, but this is still likely to cause a stall.
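To make the stall concrete, here is a minimal sketch using only the Rust standard library. It is not the actual implementation: `FakeQueue`, `VSYNC_WAIT` and `record_wait_during_present` are hypothetical names, a `Mutex` stands in for whatever locking the real queue does, and `thread::sleep` stands in for the driver blocking on VSync. It measures how long "recording" for window B waits while window A is "presenting", with either one shared queue or two separate ones.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a GPU queue; this is not a real wgpu/gfx type.
struct FakeQueue;

/// Assumed blocking time of one present, roughly one frame at 60Hz.
const VSYNC_WAIT: Duration = Duration::from_millis(16);

/// Returns how long window B's "recording" waited for its queue lock
/// while window A's "present" was blocking.
fn record_wait_during_present(share_queue: bool) -> Duration {
    let present_queue = Arc::new(Mutex::new(FakeQueue));
    // With `share_queue`, both windows go through the same queue lock.
    let record_queue = if share_queue {
        Arc::clone(&present_queue)
    } else {
        Arc::new(Mutex::new(FakeQueue))
    };

    let (locked_tx, locked_rx) = mpsc::channel();
    let presenter = thread::spawn(move || {
        // Window A: take the queue and hold it for the whole present.
        let _guard = present_queue.lock().unwrap();
        locked_tx.send(()).unwrap();
        thread::sleep(VSYNC_WAIT); // simulated "driver blocks on VSync"
    });

    // Only start measuring once the presenter really holds its lock.
    locked_rx.recv().unwrap();
    let start = Instant::now();
    let guard = record_queue.lock().unwrap(); // Window B: wants to record
    let waited = start.elapsed();
    drop(guard);

    presenter.join().unwrap();
    waited
}
```

Calling `record_wait_during_present(true)` should report a wait close to the full simulated present, while `record_wait_during_present(false)` should return almost immediately. That is the motivation for multiple queues, under the assumption that the driver really does let separate queues proceed independently.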
The key issue seems to be that the NVIDIA driver specifically blocks on present. This doesn't seem to happen on AMD, which blocks on acquire instead. (This is what I've been told; I haven't been able to verify it myself.)
I've discussed this issue with several people and ended up with the possible solutions below. I won't be working on any of these right now, but I wanted to write them down for reference:
Multi-queue is on the radar here: https://github.com/gfx-rs/wgpu/issues/1066