libsamplerate claims to be fast and high quality, but I would like an alternative that
@Flowstoner I forgot about that! I even have a benchmark that I wrote between that and libsamplerate last year. I'll have to dig that back out and reconsider.
Also https://github.com/xiph/speexdsp/blob/master/libspeexdsp/resample.c should be compared. Definitely much more active than r8 and used more widely.
just simple upsample and downsample:
https://github.com/olilarkin/wdl-ol/blob/master/WDL/besselfilter.h
https://github.com/olilarkin/wdl-ol/blob/master/WDL/besselfilter.cpp
@disabled or anyone else willing to take this issue:
Take this file https://github.com/VCVRack/Rack/blob/v0.5/include/dsp/samplerate.hpp and modify it so that it uses a library other than libsamplerate. libspeex looked nice (above), but I didn't look too hard, so you can choose another similarly licensed library if you like (non GPL).
It should handle "don't resample if it's not needed" logic, and it doesn't need to smoothly interpolate sample rate ratios (like libsamplerate does). A setSampleRate(float) which throws away the internal buffer is fine.
You can change the API if you like, if you think of something better. It would be fantastic to be able to query the number of input samples required to guarantee n outputs, but I realize that this is somewhat difficult with polyphase resamplers.
I don't have a contribution guideline, but just try to be as C-like as possible.
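For anyone picking this up, the "don't resample if it's not needed" logic could look roughly like the sketch below. Everything in it is hypothetical: BypassingConverter, setRates and the resample callback are illustrative names, not Rack's API, and the callback merely stands in for whatever library call ends up doing the real conversion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical wrapper; the names are illustrative, not Rack's actual API.
struct BypassingConverter {
    // Stand-in for the real resampler call. Arguments: input, frames
    // available (in) / consumed (out), output, capacity (in) / produced (out).
    using ProcessFn =
        std::function<void(const float*, std::size_t*, float*, std::size_t*)>;

    ProcessFn resample;
    float inRate = 44100.f;
    float outRate = 44100.f;

    // Changing the rates is allowed to discard internal state; a real
    // implementation would reset or re-create its resampler here.
    void setRates(float in, float out) {
        assert(in > 0.f && out > 0.f);
        inRate = in;
        outRate = out;
    }

    // On return, *inFrames holds the frames consumed and *outFrames the
    // frames produced.
    void process(const float* in, std::size_t* inFrames,
                 float* out, std::size_t* outFrames) {
        if (inRate == outRate) {
            // Same rate: plain copy, no filtering and no added latency.
            std::size_t n = std::min(*inFrames, *outFrames);
            std::copy(in, in + n, out);
            *inFrames = n;
            *outFrames = n;
            return;
        }
        assert(resample);
        resample(in, inFrames, out, outFrames);
    }
};
```

Keeping the bypass inside the converter means callers do not need their own rate comparison, at the cost of the latency changing when the rates start or stop matching.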
Another Polyphase resampler...
HIIR, which is nowadays licensed under the DWTFYWPL
http://ldesoras.free.fr/prod.html
Here is a branch that replaces libsamplerate with speexdsp. For discussion only at this point. If anyone wants to build it, be sure to run make dep first. You may have to install automake, libtool and pkg-config.
I chose speex to try as an alternative since it looked like they'd built it with a preference for speed over maximum quality.
But of course, it's complicated. Here is some benchmarking of it vs libsamplerate, processing 8 interleaved audio channels, the way AudioInterface does:
Benchmark                                        Time         CPU  Iterations
BM_libsamplerate_44100_48000_mean           347391 ns   346849 ns        2017
BM_libspeexdsp_quality4_44100_48000_mean    391421 ns   390179 ns        1853
BM_libspeexdsp_quality5_44100_48000_mean    505979 ns   504954 ns        1371
BM_libsamplerate_44100_192000_mean         1391146 ns  1388415 ns         505
BM_libspeexdsp_quality4_44100_192000_mean  1538041 ns  1536170 ns         436
BM_libspeexdsp_quality5_44100_192000_mean  2025020 ns  2022389 ns         347
BM_libsamplerate_48000_44100_mean           315493 ns   315023 ns        2100
BM_libspeexdsp_quality4_48000_44100_mean    376000 ns   375325 ns        1884
BM_libspeexdsp_quality5_48000_44100_mean    469861 ns   468926 ns        1516
BM_libsamplerate_192000_44100_mean          287424 ns   287064 ns        2343
BM_libspeexdsp_quality4_192000_44100_mean   363647 ns   363078 ns        1965
BM_libspeexdsp_quality5_192000_44100_mean   438833 ns   438437 ns        1575
(Every test line is suffixed _mean because these are the averages of multiple runs; the second numeric column is CPU time. Speex has a quality setting of 0 to 10; quality 4 is the default.)
So with 8 interleaved channels, speex is slower than libsamplerate.
However, when processing a single channel of audio, speex is faster:
Benchmark                                        Time         CPU  Iterations
BM_libsamplerate_44100_48000_mean           103122 ns   102927 ns        6062
BM_libspeexdsp_quality4_44100_48000_mean     49483 ns    49318 ns       13428
BM_libspeexdsp_quality5_44100_48000_mean     62551 ns    62448 ns       11338
BM_libsamplerate_44100_192000_mean          414149 ns   413431 ns        1598
BM_libspeexdsp_quality4_44100_192000_mean   196400 ns   195964 ns        3612
BM_libspeexdsp_quality5_44100_192000_mean   255179 ns   254668 ns        2838
BM_libsamplerate_48000_44100_mean            90966 ns    90892 ns        6870
BM_libspeexdsp_quality4_48000_44100_mean     47318 ns    47259 ns       14077
BM_libspeexdsp_quality5_48000_44100_mean     59084 ns    58775 ns       11893
BM_libsamplerate_192000_44100_mean           66299 ns    66200 ns        9370
BM_libspeexdsp_quality4_192000_44100_mean    45247 ns    45179 ns       15698
BM_libspeexdsp_quality5_192000_44100_mean    56691 ns    56603 ns       12875
(Internally, the speex interleaved-audio processing routine just calls the one-channel version in a loop over the channels.)
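To illustrate what "the one-channel version in a loop over the channels" means for interleaved data, here is a minimal sketch. It is not speex's code: processChannel is a hypothetical stand-in that only copies, but it walks the buffer with the same kind of stride a per-channel routine would use.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a one-channel resampler: it only copies, but it honours the
// strided layout a per-channel routine would see on interleaved data.
void processChannel(const float* in, std::size_t frames, std::size_t inStride,
                    float* out, std::size_t outStride) {
    for (std::size_t i = 0; i < frames; ++i)
        out[i * outStride] = in[i * inStride];
}

// Interleaved processing as a loop over channels: each call starts at the
// channel's offset and steps through the buffer by the channel count.
void processInterleaved(const float* in, float* out, std::size_t frames,
                        std::size_t channels) {
    assert(channels > 0);
    for (std::size_t c = 0; c < channels; ++c)
        processChannel(in + c, frames, channels, out + c, channels);
}
```

With this structure the cost grows roughly linearly with the channel count, which matches the numbers above: 8 x 49483 ns is about 396 µs, close to the 391421 ns measured for 8 channels at quality 4.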
Profiling in Rack bears out the relative speeds.
Using Rack built with speex, it sounds the same to me, but maybe someone with a better ear or better hardware will notice a difference.
That said, looking around in AudioInterface, and if the immediate goal is to reduce the CPU hit from resampling, here's a range of options:
1. Adopt the same-samplerate-do-nothing change from the performance discussion. That just removes the resampling overhead for many users.
2. Change AudioInterface to reset the various buffers and resampler to have the same number of channels as the device when the device changes. Currently, regardless of the device's actual channels, it processes 8 channels in or out -- a good bit of needless processing if the device has only, say, 2 channels. Changing this would also presumably make it easier in future to support devices with more than 8 channels.
3. Maybe only resample channels that are being used (that are patched in). Unused input channels (input from the device) could just be ignored, while unused output channels can just get zeroes. Channels in use would each get their own 1-channel resampler. It doesn't seem like this would cause cases where channels might get out of sync (if independent resamplers had different lags) -- but tell me if that's wrong.
If one-channel resamplers are the order of the day, switching to speex could make sense. Otherwise, I think it doesn't.
If all those things worked, looks like it'd shake most of the fruit from the resampling-optimization tree.
I'm willing to tackle it, but @AndrewBelt, I'll await your guidance.
Fantastic research! This is an example of a rare PR/post that helps Rack.
It's weird that speex's sample rate converter requires integer (or rational) ratios, but it makes up for that by offering 11 levels of quality (FIR lengths I assume), which can be set after construction! That alone is enough to make the switch IMO, since I consider Rack's current sample rate converter to be a bit higher quality than is worth it.
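On the rational-ratio point: two integer sample rates always reduce to a small fraction, which is what a polyphase design works from. The sketch below only shows that reduction; reduceRatio and greatestCommonDivisor are hypothetical helper names, and nothing here is taken from speex's source.

```cpp
#include <cassert>
#include <utility>

// Euclid's algorithm, written out so the sketch does not need C++17's std::gcd.
unsigned greatestCommonDivisor(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Reduce inRate:outRate to lowest terms, returned as {numerator, denominator}.
std::pair<unsigned, unsigned> reduceRatio(unsigned inRate, unsigned outRate) {
    assert(inRate > 0 && outRate > 0);
    unsigned g = greatestCommonDivisor(inRate, outRate);
    return {inRate / g, outRate / g};
}
```

For example, 44100:48000 reduces to 147:160 and 44100:88200 to 1:2, so the common audio rates all map to modest integer ratios.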
Send a PR on the master branch, and I'll work on (1) and (3). I'll build (1) into the SampleRateConverter class and (3) into the Audio Interface. I'll play with some ideas, but I don't mind making a "pop" on all active channels when something is patched, in order to resynchronize the streams.
Glad it helps. PR created.
After some more testing, looks like no synchronization will be required. Consider:
#include <stdio.h>
#include <assert.h>
#include "speex/speex_resampler.h"

int main() {
    const int channels = 1;
    const int inRate = 44100;
    const int outRate = 88200;
    const int n = 256;
    float samples[n] {};

    // existing resampler.
    SpeexResamplerState *src1State = NULL;
    {
        int error;
        src1State = speex_resampler_init(channels, inRate, outRate, SPEEX_RESAMPLER_QUALITY_DEFAULT, &error);
        assert(error == RESAMPLER_ERR_SUCCESS);
    }

    // run the existing resampler for a while.
    float src1Out[2 * n];
    unsigned int src1InN, src1OutN;
    for (int i = 0; i < 10; ++i) {
        src1InN = n;
        src1OutN = 2 * n;
        speex_resampler_process_float(src1State, 0, samples, &src1InN, src1Out, &src1OutN);
        assert(src1InN == n);
        assert(src1OutN == 2 * n);
    }

    // new resampler.
    SpeexResamplerState *src2State = NULL;
    {
        int error;
        src2State = speex_resampler_init(channels, inRate, outRate, SPEEX_RESAMPLER_QUALITY_DEFAULT, &error);
        assert(error == RESAMPLER_ERR_SUCCESS);
    }

    // set an impulse.
    samples[0] = 1.0;

    // process samples through each resampler.
    src1InN = n;
    src1OutN = 2 * n;
    speex_resampler_process_float(src1State, 0, samples, &src1InN, src1Out, &src1OutN);
    assert(src1InN == n);
    assert(src1OutN == 2 * n);

    float src2Out[2 * n];
    unsigned int src2InN = n;
    unsigned int src2OutN = 2 * n;
    speex_resampler_process_float(src2State, 0, samples, &src2InN, src2Out, &src2OutN);
    assert(src2InN == n);
    assert(src2OutN == 2 * n);

    // what's the output latency?
    int lag = speex_resampler_get_output_latency(src1State);
    assert(lag == speex_resampler_get_output_latency(src2State));
    printf("Output latency in samples: %d (output at %dhz)\n\n", lag, outRate);

    // expect to find the impulse "lag"-many samples into the output.
    for (int i = 0; i < 2*n; ++i) {
        if (src1Out[i] > 0.1 || src1Out[i] < -0.1) {
            printf("Old SRC out %d: %f\n", i, src1Out[i]);
        }
    }
    printf("\n");
    for (int i = 0; i < 2*n; ++i) {
        if (src2Out[i] > 0.1 || src2Out[i] < -0.1) {
            printf("New SRC out %d: %f\n", i, src2Out[i]);
        }
    }

    speex_resampler_destroy(src1State);
    speex_resampler_destroy(src2State);
    return 0;
}
Which outputs:
Output latency in samples: 64 (output at 88200hz)
Old SRC out 59: 0.110882
Old SRC out 61: -0.202111
Old SRC out 63: 0.633214
Old SRC out 64: 0.940000
Old SRC out 65: 0.633214
Old SRC out 67: -0.202111
Old SRC out 69: 0.110882
New SRC out 59: 0.110882
New SRC out 61: -0.202111
New SRC out 63: 0.633214
New SRC out 64: 0.940000
New SRC out 65: 0.633214
New SRC out 67: -0.202111
New SRC out 69: 0.110882
So parallel inputs to the running and newly-initialized resamplers show up in the output at the same time.
Yes, but what if out rate / in rate is not an integer?
Same thing. The lag will be a different value, but it's a constant function of the ratio and quality, independent of the samples that have been fed through.
This is likely a dumb question, but why does Rack have to do any sample rate conversion? It looks like you can choose your desired sample rate when you open a stream with rtaudio.
@briansorahan Correct. The audio device sample rate might be different than the internal sample rate, you might have multiple audio interfaces with different sample rates, and modules like Braids or Clouds might operate at a fixed sample rate and need conversion to the engine sample rate.
Hmmm the multiple audio interfaces situation makes sense. And I actually just noticed today the conversion happening in the AudibleInstruments modules. I'm guessing that's necessary because the pichenettes code is meant to run on a device where the sample rate is known ahead of time.
Exactly.
Thx @AndrewBelt
Probably not of much interest, but from time to time I have used very "terrible" SR conversion, and unless you hit it with super high frequencies it can be difficult to hear the difference. Some that I have used in various places: cubic polynomial interpolation (which I guess is a bad low order FIR filter), and 4 pole IIR lowpass filter. I know, I know, these are terrible, but perhaps as a configurable "low quality" option?
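For reference, the cubic-interpolation approach can be sketched in a few lines. This is an assumption about what such a converter might look like, using the Catmull-Rom form of the cubic; resampleCubic is a hypothetical name, it works on a whole buffer rather than a stream, and it has no anti-aliasing filter at all.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// 4-point cubic (Catmull-Rom) interpolation between y1 and y2, t in [0, 1].
float cubic(float y0, float y1, float y2, float y3, float t) {
    float a = -0.5f * y0 + 1.5f * y1 - 1.5f * y2 + 0.5f * y3;
    float b = y0 - 2.5f * y1 + 2.0f * y2 - 0.5f * y3;
    float c = -0.5f * y0 + 0.5f * y2;
    return ((a * t + b) * t + c) * t + y1;
}

// Hypothetical low-quality converter. There is no anti-aliasing filter, so
// downsampling material with content near Nyquist will alias.
std::vector<float> resampleCubic(const std::vector<float>& in,
                                 double inRate, double outRate) {
    assert(inRate > 0.0 && outRate > 0.0);
    std::vector<float> out;
    if (in.empty())
        return out;
    const double step = inRate / outRate;  // input samples per output sample
    const std::ptrdiff_t last = static_cast<std::ptrdiff_t>(in.size()) - 1;
    // Clamp indices so the edges reuse the first and last samples.
    auto at = [&](std::ptrdiff_t i) {
        if (i < 0) i = 0;
        if (i > last) i = last;
        return in[static_cast<std::size_t>(i)];
    };
    for (std::size_t k = 0;; ++k) {
        const double pos = static_cast<double>(k) * step;
        if (pos > static_cast<double>(last))
            break;
        const std::ptrdiff_t i = static_cast<std::ptrdiff_t>(std::floor(pos));
        const float t = static_cast<float>(pos - static_cast<double>(i));
        out.push_back(cubic(at(i - 1), at(i), at(i + 1), at(i + 2), t));
    }
    return out;
}
```

Each output sample costs a handful of multiplies and needs only four input samples of history, which is why this kind of converter is cheap; the price is aliasing and high-frequency roll-off.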
@squinkylabs Use https://github.com/VCVRack/Rack/blob/master/include/dsp/samplerate.hpp#L25 I think the scale is 0 to 10.
Another option here, not sure if it's been looked at yet: https://www.kfrlib.com/
Example: https://github.com/kfrlib/kfr/blob/master/examples/sample_rate_conversion.cpp
If you comply with the GNU Public License v3, you can get KFR for free
Closing because this has been solved with libspeexdsp