Rack: Write a sample rate conversion library to replace libsamplerate

Created on 7 Oct 2017 · 20 Comments · Source: VCVRack/Rack

libsamplerate claims to be fast and high quality, but I would like an alternative that:

- is more configurable than just SRC_SINC_BEST_QUALITY, SRC_SINC_MEDIUM_QUALITY, SRC_SINC_FASTEST. I'd like to set the size of the windows, SNR, bandwidth, etc. I don't need variable ratios like libsamplerate does, so I suppose precomputed tables can speed up some of the work.
- has a more usable API than libsamplerate's full API. Since the ratio is assumed to be constant, I'd like to be able to query how many input samples I'd need in order to produce N output samples. If this is done, there is no need to interact with an internal buffer.
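
A minimal sketch of the "how many inputs for N outputs" query, assuming a fixed ratio and a filter that needs a fixed number of input samples of lookahead. The name FixedRatioCounter and its members are hypothetical; neither libsamplerate nor speexdsp is known to expose this exact call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping helper for a fixed-ratio resampler.
// Output sample k is assumed to sit at input position k * inRate / outRate,
// and the filter is assumed to read `lookahead` input samples past that position.
struct FixedRatioCounter {
    std::int64_t inRate;     // input sample rate in Hz
    std::int64_t outRate;    // output sample rate in Hz
    std::int64_t lookahead;  // extra input samples the filter needs (assumption)
    std::int64_t inPos;      // input samples supplied so far
    std::int64_t outPos;     // output samples produced so far

    FixedRatioCounter(std::int64_t in, std::int64_t out, std::int64_t la)
        : inRate(in), outRate(out), lookahead(la), inPos(0), outPos(0) {
        assert(in > 0 && out > 0 && la >= 0);
    }

    // Additional input samples required before n more outputs can be produced.
    std::int64_t inputsNeeded(std::int64_t n) const {
        assert(n >= 0);
        if (n == 0) return 0;
        // Index of the last input sample touched by output number outPos + n - 1.
        const std::int64_t lastIn = ((outPos + n - 1) * inRate) / outRate + lookahead;
        const std::int64_t need = lastIn + 1 - inPos;
        return need > 0 ? need : 0;
    }

    // Record that `in` inputs were supplied and `out` outputs were produced.
    void advance(std::int64_t in, std::int64_t out) {
        inPos += in;
        outPos += out;
    }
};
```

With 44100 Hz in and 48000 Hz out and no lookahead, asking for 480 outputs (10 ms) reports 441 inputs, which matches the duration of the block at both rates. Integer arithmetic keeps the count exact over long runs.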

AndrewBelt

Most helpful comment

Here is a branch that replaces libsamplerate with speexdsp. It is for discussion only at this point. If anyone wants to build it, be sure to run make dep first. You may have to install automake, libtool and pkg-config.

I chose speex to try as an alternative since it looked like they'd built it with a preference for speed over maximum quality.

But of course, it's complicated. Here is some benchmarking of it vs libsamplerate, processing 8 interleaved audio channels, the way AudioInterface does:

BM_libsamplerate_44100_48000_mean               347391 ns     346849 ns       2017
BM_libspeexdsp_quality4_44100_48000_mean        391421 ns     390179 ns       1853
BM_libspeexdsp_quality5_44100_48000_mean        505979 ns     504954 ns       1371
BM_libsamplerate_44100_192000_mean             1391146 ns    1388415 ns        505
BM_libspeexdsp_quality4_44100_192000_mean      1538041 ns    1536170 ns        436
BM_libspeexdsp_quality5_44100_192000_mean      2025020 ns    2022389 ns        347
BM_libsamplerate_48000_44100_mean               315493 ns     315023 ns       2100
BM_libspeexdsp_quality4_48000_44100_mean        376000 ns     375325 ns       1884
BM_libspeexdsp_quality5_48000_44100_mean        469861 ns     468926 ns       1516
BM_libsamplerate_192000_44100_mean              287424 ns     287064 ns       2343
BM_libspeexdsp_quality4_192000_44100_mean       363647 ns     363078 ns       1965
BM_libspeexdsp_quality5_192000_44100_mean       438833 ns     438437 ns       1575

(Every test line is suffixed _mean because these are the averages of multiple runs; the first numeric column is wall-clock time, the second is CPU time, and the third is the iteration count. Speex has a quality setting of 0 to 10; quality 4 is the default.)

So with 8 interleaved channels, speex is slower than libsamplerate.

However, processing a single channel of audio, it's faster:

BM_libsamplerate_44100_48000_mean               103122 ns     102927 ns       6062
BM_libspeexdsp_quality4_44100_48000_mean         49483 ns      49318 ns      13428
BM_libspeexdsp_quality5_44100_48000_mean         62551 ns      62448 ns      11338
BM_libsamplerate_44100_192000_mean              414149 ns     413431 ns       1598
BM_libspeexdsp_quality4_44100_192000_mean       196400 ns     195964 ns       3612
BM_libspeexdsp_quality5_44100_192000_mean       255179 ns     254668 ns       2838
BM_libsamplerate_48000_44100_mean                90966 ns      90892 ns       6870
BM_libspeexdsp_quality4_48000_44100_mean         47318 ns      47259 ns      14077
BM_libspeexdsp_quality5_48000_44100_mean         59084 ns      58775 ns      11893
BM_libsamplerate_192000_44100_mean               66299 ns      66200 ns       9370
BM_libspeexdsp_quality4_192000_44100_mean        45247 ns      45179 ns      15698
BM_libspeexdsp_quality5_192000_44100_mean        56691 ns      56603 ns      12875

(Internally, the speex interleaved-audio processing routine just calls the one-channel version in a loop over the channels.)
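
As a rough check of that remark, the two tables can be compared row by row. The helper below is only illustrative arithmetic on the numbers already quoted; the function name is made up.

```cpp
#include <cassert>

// Ratio of the 8-channel time to the 1-channel time for one benchmark row.
// A value near 8 means the cost grows linearly with the channel count.
double channelScaling(double eightChannelNs, double oneChannelNs) {
    assert(oneChannelNs > 0.0);
    return eightChannelNs / oneChannelNs;
}

// 44100 -> 48000 rows, wall-clock column, copied from the tables above.
const double kSpeexQ4Scaling = channelScaling(391421.0, 49483.0);    // about 7.9
const double kLibsamplerateScaling = channelScaling(347391.0, 103122.0);  // about 3.4
```

For this row, speex at quality 4 costs about 7.9 times as much for 8 channels as for 1, which is what a per-channel loop would predict, while libsamplerate costs only about 3.4 times as much. That difference would explain why speex wins on one channel and loses on eight.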

Profiling in Rack bears out the relative speeds.

Using Rack built with speex, it sounds the same to me, but maybe someone with a better ear or better hardware will notice a difference.

That said, after looking around in AudioInterface, if the immediate goal is to reduce the CPU hit from resampling, here's a range of options:

1. Adopt the same-samplerate-do-nothing change from the performance discussion. That just removes the resampling overhead for many users.
2. Change AudioInterface to reset the various buffers and resampler to have the same number of channels as the device when the device changes. Currently, regardless of the device's actual channels, it processes 8 channels in or out -- a good bit of needless processing if the device has only, say, 2 channels. Changing this would also presumably make it easier in future to support devices with more than 8 channels.
3. Maybe only resample channels that are being used (that are patched in). Unused input channels (input from the device) could just be ignored, while unused output channels can just get zeroes. Channels in use would each get their own 1-channel resampler. It doesn't seem like this would cause cases where channels might get out of sync (if independent resamplers had different lags) -- but tell me if that's wrong.
4. If one-channel resamplers are the order of the day, switching to speex could make sense. Otherwise, I think it doesn't.
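
The "only resample channels that are in use" option could look roughly like the sketch below. It assumes a hypothetical MonoProcessor callable standing in for a one-channel resampler; it is not Rack's AudioInterface code and does not call speexdsp.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for a one-channel resampler: takes one channel's samples and
// returns that channel's converted samples. Hypothetical type.
using MonoProcessor = std::function<std::vector<float>(const std::vector<float>&)>;

// Process interleaved frames one channel at a time. Channels flagged
// inactive are skipped entirely and left as zeroes in the output.
std::vector<float> processActiveChannels(const std::vector<float>& interleavedIn,
                                         std::size_t channels,
                                         const std::vector<bool>& active,
                                         std::vector<MonoProcessor>& processors,
                                         std::size_t outFrames) {
    assert(channels > 0 && interleavedIn.size() % channels == 0);
    assert(active.size() == channels && processors.size() == channels);
    const std::size_t inFrames = interleavedIn.size() / channels;
    std::vector<float> out(outFrames * channels, 0.0f);  // unused channels stay zero
    std::vector<float> mono(inFrames);
    for (std::size_t c = 0; c < channels; ++c) {
        if (!active[c]) continue;  // unpatched channel: no resampling work
        for (std::size_t f = 0; f < inFrames; ++f)
            mono[f] = interleavedIn[f * channels + c];  // de-interleave
        const std::vector<float> res = processors[c](mono);
        for (std::size_t f = 0; f < outFrames && f < res.size(); ++f)
            out[f * channels + c] = res[f];  // re-interleave
    }
    return out;
}
```

Each active channel keeps its own processor, so the cost scales with the number of patched channels rather than with a fixed 8.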

If all those things worked, looks like it'd shake most of the fruit from the resampling-optimization tree.

I'm willing to tackle it, but @AndrewBelt, I'll await your guidance.

mdemanett on 22 Dec 2017

👍2

All 20 comments

Take a look at this: https://github.com/avaneev/r8brain-free-src

Flowstoner on 8 Oct 2017

👍1

@Flowstoner I forgot about that! I even have a benchmark that I wrote between that and libsamplerate last year. I'll have to dig that back out and reconsider.

AndrewBelt on 8 Oct 2017

Also https://github.com/xiph/speexdsp/blob/master/libspeexdsp/resample.c should be compared. Definitely much more active than r8 and used more widely.

AndrewBelt on 8 Oct 2017

Just a simple upsampler and downsampler:
https://github.com/olilarkin/wdl-ol/blob/master/WDL/besselfilter.h
https://github.com/olilarkin/wdl-ol/blob/master/WDL/besselfilter.cpp

TronicLabs on 9 Oct 2017

@disabled or anyone else willing to take this issue:

Take this file https://github.com/VCVRack/Rack/blob/v0.5/include/dsp/samplerate.hpp and modify it so that it uses a library other than libsamplerate. libspeex looked nice (above), but I didn't look too hard, so you can choose another similarly licensed library if you like (non GPL).
It should handle "don't resample if it's not needed" logic, and it doesn't need to smoothly interpolate sample rate ratios (like libsamplerate does). A setSampleRate(float) which throws away the internal buffer is fine.
You can change the API if you like, if you think of something better. It would be fantastic to be able to query the number of input samples required to guarantee n outputs, but I realize that this is somewhat difficult with polyphase resamplers.
I don't have a contribution guideline, but just try to be as C-like as possible.
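
A minimal sketch of the two behaviours requested above: bypass when the rates match, and discard state when the rates change. The class name FixedRateConverter is hypothetical and is not the API in samplerate.hpp. Linear interpolation is used only as a placeholder for a real resampling library, and it adds one input sample of delay.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-ratio converter. The interpolation here is a placeholder;
// a real implementation would delegate to a proper resampling filter.
class FixedRateConverter {
public:
    // Changing the rates throws away all internal state rather than
    // smoothly interpolating between ratios.
    void setRates(int inRate, int outRate) {
        assert(inRate > 0 && outRate > 0);
        step_ = static_cast<double>(inRate) / outRate;
        bypass_ = (inRate == outRate);
        pos_ = 0.0;
        last_ = 0.0f;
    }

    std::vector<float> process(const std::vector<float>& in) {
        if (bypass_) return in;  // same rate: nothing to resample
        std::vector<float> out;
        const double n = static_cast<double>(in.size());
        // pos_ is the read position in this block; the sample before in[0]
        // is last_, the final sample of the previous block.
        while (pos_ < n) {
            const std::size_t i = static_cast<std::size_t>(pos_);
            const float frac = static_cast<float>(pos_ - static_cast<double>(i));
            const float a = (i == 0) ? last_ : in[i - 1];
            const float b = in[i];
            out.push_back(a + (b - a) * frac);
            pos_ += step_;  // floating-point accumulation; fine for a sketch
        }
        pos_ -= n;
        if (!in.empty()) last_ = in.back();
        return out;
    }

private:
    double step_ = 1.0;   // input samples advanced per output sample
    double pos_ = 0.0;    // fractional read position carried across blocks
    float last_ = 0.0f;   // last input sample of the previous block
    bool bypass_ = true;  // true when input and output rates are equal
};
```

Calling setRates a second time resets the read position and history, so the next block starts from silence.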

AndrewBelt on 25 Nov 2017

Another Polyphase resampler...
HIIR, which is nowadays licensed under the DWTFYWPL
http://ldesoras.free.fr/prod.html

Flowstoner on 3 Dec 2017

Fantastic research! This is an example of a rare PR/post that helps Rack.

It's weird that speex's sample rate converter requires integer (or rational) ratios, but it makes up for that by offering 11 levels of quality (FIR lengths I assume), which can be set after construction! That alone is enough to make the switch IMO, since I consider the quality of Rack's current sample rate converter to be a bit higher than is worth it.
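
To illustrate the rational-ratio point: a pair of integer sample rates reduces to a small fraction, which is what a polyphase design works from. The helper below is a hypothetical sketch using Euclid's algorithm and is not taken from speexdsp.

```cpp
#include <cassert>
#include <utility>

// Greatest common divisor by Euclid's algorithm.
unsigned gcdU(unsigned a, unsigned b) {
    while (b != 0) {
        const unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Reduce inRate:outRate to lowest terms, returned as {numerator, denominator}.
std::pair<unsigned, unsigned> reduceRatio(unsigned inRate, unsigned outRate) {
    assert(inRate > 0 && outRate > 0);
    const unsigned g = gcdU(inRate, outRate);
    return std::make_pair(inRate / g, outRate / g);
}
```

For example, 44100:48000 reduces to 147:160 and 44100:192000 to 147:640.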

Send a PR on the master branch, and I'll work on (1) and (3). I'll build (1) into the SampleRateConverter class and (3) into the Audio Interface. I'll play with some ideas, but I don't mind making a "pop" on all active channels when something is patched, in order to resynchronize the streams.

AndrewBelt on 22 Dec 2017

👍1

Glad it helps. PR created.

After some more testing, looks like no synchronization will be required. Consider:

#include <stdio.h>
#include <assert.h>
#include "speex/speex_resampler.h"

int main() {
  const int channels = 1;
  const int inRate = 44100;
  const int outRate = 88200;
  const int n = 256;
  float samples[n] {};

  // existing resampler.
  SpeexResamplerState *src1State = NULL;
  {
    int error;
    src1State = speex_resampler_init(channels, inRate, outRate, SPEEX_RESAMPLER_QUALITY_DEFAULT, &error);
    assert(error == RESAMPLER_ERR_SUCCESS);
  }

  // run the existing resampler for a while.
  float src1Out[2 * n];
  unsigned int src1InN, src1OutN;
  for (int i = 0; i < 10; ++i) {
    src1InN = n;
    src1OutN = 2 * n;
    speex_resampler_process_float(src1State, 0, samples, &src1InN, src1Out, &src1OutN);
    assert(src1InN == n);
    assert(src1OutN == 2 * n);
  }

  // new resampler.
  SpeexResamplerState *src2State = NULL;
  {
    int error;
    src2State = speex_resampler_init(channels, inRate, outRate, SPEEX_RESAMPLER_QUALITY_DEFAULT, &error);
    assert(error == RESAMPLER_ERR_SUCCESS);
  }

  // set an impulse.
  samples[0] = 1.0;

  // process samples through each resampler.
  src1InN = n;
  src1OutN = 2 * n;
  speex_resampler_process_float(src1State, 0, samples, &src1InN, src1Out, &src1OutN);
  assert(src1InN == n);
  assert(src1OutN == 2 * n);

  float src2Out[2 * n];
  unsigned int src2InN = n;
  unsigned int src2OutN = 2 * n;
  speex_resampler_process_float(src2State, 0, samples, &src2InN, src2Out, &src2OutN);
  assert(src2InN == n);
  assert(src2OutN == 2 * n);

  // what's the output latency?
  int lag = speex_resampler_get_output_latency(src1State);
  assert(lag == speex_resampler_get_output_latency(src2State));
  printf("Output latency in samples: %d (output at %dhz)\n\n", lag, outRate);

  // expect to find the impulse "lag"-many samples into the output.
  for (int i = 0; i < 2*n; ++i) {
    if (src1Out[i] > 0.1 || src1Out[i] < -0.1) {
      printf("Old SRC out %d: %f\n", i, src1Out[i]);
    }
  }
  printf("\n");
  for (int i = 0; i < 2*n; ++i) {
    if (src2Out[i] > 0.1 || src2Out[i] < -0.1) {
      printf("New SRC out %d: %f\n", i, src2Out[i]);
    }
  }

  speex_resampler_destroy(src1State);
  speex_resampler_destroy(src2State);
  return 0;
}

Which outputs:

Output latency in samples: 64 (output at 88200hz)

Old SRC out 59: 0.110882
Old SRC out 61: -0.202111
Old SRC out 63: 0.633214
Old SRC out 64: 0.940000
Old SRC out 65: 0.633214
Old SRC out 67: -0.202111
Old SRC out 69: 0.110882

New SRC out 59: 0.110882
New SRC out 61: -0.202111
New SRC out 63: 0.633214
New SRC out 64: 0.940000
New SRC out 65: 0.633214
New SRC out 67: -0.202111
New SRC out 69: 0.110882

So parallel inputs to the running and newly-initialized resamplers show up in the output at the same time.

mdemanett on 23 Dec 2017

👍1

Yes, but what if out rate / in rate is not an integer?

AndrewBelt on 23 Dec 2017

Same thing. The lag will be a different value, but it's a constant function of the ratio and quality, independent of the samples that have been fed through.

mdemanett on 23 Dec 2017

This is likely a dumb question, but why does Rack have to do any sample rate conversion? It looks like you can choose your desired sample rate when you open a stream with rtaudio.

briansorahan on 30 Dec 2017

@briansorahan Correct. The audio device sample rate might be different than the internal sample rate, you might have multiple audio interfaces with different sample rates, and modules like Braids or Clouds might operate at a fixed sample rate and need conversion to the engine sample rate.

AndrewBelt on 30 Dec 2017

Hmmm, the multiple audio interfaces situation makes sense. And I actually just noticed today the conversion happening in the AudibleInstruments modules. I'm guessing that's necessary because the pichenettes code is meant to run on a device where the sample rate is known ahead of time.

briansorahan on 30 Dec 2017

Exactly.

AndrewBelt on 30 Dec 2017

Thx @AndrewBelt

briansorahan on 30 Dec 2017

Probably not of much interest, but from time to time I have used very "terrible" SR conversion, and unless you hit it with super high frequencies it can be difficult to hear the difference. Some that I have used in various places: cubic polynomial interpolation (which I guess is a bad low-order FIR filter), and a 4-pole IIR lowpass filter. I know, I know, these are terrible, but perhaps as a configurable "low quality" option?
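
For reference, a minimal sketch of the cubic-interpolation approach mentioned above, using the 4-point Catmull-Rom form. The function names are hypothetical, the buffer is processed in one shot with edge samples clamped, and there is no anti-aliasing filter, so downsampling content near the new Nyquist frequency will alias.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// 4-point Catmull-Rom interpolation between y1 (t = 0) and y2 (t = 1).
float cubicInterp(float y0, float y1, float y2, float y3, float t) {
    const float a = -0.5f * y0 + 1.5f * y1 - 1.5f * y2 + 0.5f * y3;
    const float b = y0 - 2.5f * y1 + 2.0f * y2 - 0.5f * y3;
    const float c = -0.5f * y0 + 0.5f * y2;
    return ((a * t + b) * t + c) * t + y1;
}

// Resample a whole buffer at a fixed ratio. Low quality by design.
std::vector<float> resampleCubic(const std::vector<float>& in, double inRate, double outRate) {
    assert(inRate > 0.0 && outRate > 0.0);
    std::vector<float> out;
    if (in.empty()) return out;
    const double step = inRate / outRate;  // input samples per output sample
    const long n = static_cast<long>(in.size());
    // Clamp indices so the first and last samples are repeated at the edges.
    auto at = [&](long i) { return in[static_cast<std::size_t>(std::min(std::max(i, 0L), n - 1))]; };
    for (long k = 0;; ++k) {
        const double pos = static_cast<double>(k) * step;
        const long i = static_cast<long>(pos);  // pos >= 0, so this is floor
        if (i >= n) break;
        const float t = static_cast<float>(pos - static_cast<double>(i));
        out.push_back(cubicInterp(at(i - 1), at(i), at(i + 1), at(i + 2), t));
    }
    return out;
}
```

On a straight ramp the interior interpolated points land exactly on the line; only the clamped edges deviate.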