It is my understanding that the constant-Q transform doesn't have a fixed hop size, because the hop varies with frequency. However, librosa's CQT takes an argument "hop_length", which I'm finding very confusing (the documentation is not helpful). What is this argument, and how is it consistent with the definition of the transform? Also, the Matlab implementation of the transform produces different results than those returned by librosa. Why is that?
I think your understanding may not be correct. The hop length is fixed: it determines the center positions of the frames, which are the same for every frequency bin. What changes is the frame length (as a function of frequency).
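To make the distinction concrete, here is a minimal sketch using the textbook constant-Q definition, where the frame length for bin k is roughly Q * sr / f_k. This is not librosa's actual code: the function name `cqt_frame_layout` is made up for illustration, and librosa's real filter lengths also depend on other settings such as `filter_scale`. The point is only that the frame centers depend on `hop_length` alone, while the frame lengths depend on frequency.

```python
import math


def cqt_frame_layout(sr=22050, hop_length=512, fmin=32.70,
                     n_bins=24, bins_per_octave=12, n_frames=3):
    """Illustrative only: textbook constant-Q layout, not librosa internals."""
    # Constant Q factor: ratio of center frequency to bandwidth.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

    # Frame centers (in samples): identical for every frequency bin.
    centers = [t * hop_length for t in range(n_frames)]

    # Geometrically spaced center frequencies (in Hz).
    freqs = [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

    # Frame (filter) length in samples: shrinks as frequency grows.
    lengths = [math.ceil(q * sr / f) for f in freqs]

    return centers, freqs, lengths


centers, freqs, lengths = cqt_frame_layout()
print("frame centers (samples):", centers)
for f, n in zip(freqs[::12], lengths[::12]):
    print(f"f = {f:8.2f} Hz -> frame length = {n} samples")
```

Every bin is evaluated at the same centers (multiples of `hop_length`); a bin one octave higher just uses a window about half as long around each center.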
And yes, the Matlab version is different, for many reasons. The main one is the implementation of resampling, though I'm sure there are others. (IIRC, they also use some non-uniform frequency spacing tricks to cover all the way up to Nyquist, which we don't do.)
Thank you. That was helpful.