See https://github.com/cupy/cupy/pull/4996#discussion_r606606797. I don't immediately see a band-aid fix, so keep this documented until I (or someone else!) do.
I would like to work on this, as soon as #4996 is merged!
I think one way to do it is not to call their NumPy counterparts (e.g. cupy.fft._fft.hfft) directly, but instead to call the internal function _fft or _fftn, so that we can pass overwrite_x.
Yes, I think that will fix it. Alternatively, copying the result from the auxiliary array back into the input array might also work, but that would cost additional memory for the auxiliary array.
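To make the copy-back alternative concrete, here is a minimal sketch. It uses NumPy as a stand-in for CuPy so it runs without a GPU, and `fft_overwrite_via_copy` is a hypothetical helper name, not something in either library. It also assumes the result has the same shape and dtype as the input, which holds for a complex-to-complex transform but not for hfft.

```python
import numpy as np


def fft_overwrite_via_copy(x):
    """Hypothetical helper: emulate overwrite_x by copying the result back.

    Assumes x is a complex array and the transform preserves its shape.
    """
    # The transform allocates an auxiliary array for the result;
    # this is the extra memory cost of the approach.
    aux = np.fft.fft(x)
    # Copy the result into the caller's buffer so that x itself is overwritten.
    x[...] = aux
    return x


x = np.arange(4, dtype=np.complex128)
expected = np.fft.fft(x.copy())
out = fft_overwrite_via_copy(x)
```

After the call, `out` is the same object as `x`, and `x` holds the transformed values. Calling the internal function with overwrite_x would avoid the auxiliary allocation, which is why it looks like the better option.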