Librosa: hz_to_midi returns negatives

Created on 6 Sep 2018 · 3 Comments · Source: librosa/librosa

Description

I am not very familiar with the mathematics behind signal processing, but I would like to convert a given audio time series or audio buffer into MIDI.

The problem I am running into is that the result contains negative values and large numbers outside the 0-127 range, so I cannot create an actual MIDI file to listen to and test. Or is my procedure simply wrong in the first place?

It would be great if anybody could explain these as well:

  1. Handling of overlapping frames for MIDI: could setting hop_length equal to the FFT length (n_fft) be a solution?
  2. Suppose I manage to get my array of MIDI notes: how do I obtain the duration for which each note should be played? How much time elapses between consecutive indices of the returned MIDI note array?

Steps/Code to Reproduce

import librosa
import numpy as np

signal, sampleRate = librosa.load(path="sound.wav", sr=None, mono=True, dtype=np.float32)
melspec = librosa.feature.melspectrogram(y=signal, sr=sampleRate, n_fft=1024, hop_length=1024, n_mels=128)
freq = librosa.mel_to_hz(melspec)
midi = librosa.hz_to_midi(freq)
print(midi)

Expected Results

[[int]] with element values between 0 and 127

Actual Results

[[-287 -130 -105 ... -118 -134 -117]
 [-292 -144 -110 ... -117 -131 -110]
 [-301 -144 -127 ... -111 -122  -98]
 ...
 [-329 -327 -332 ... -334 -322 -272]
 [-328 -330 -333 ... -325 -348 -271]
 [-332 -340 -323 ... -325 -332 -273]]

Versions

question

All 3 comments

I think this stems from a misunderstanding of what these functions do.

mel_to_hz converts positions on the mel frequency scale (for example, the centers of the mel bins) to their corresponding frequencies in Hz. It does not operate on the energy values contained in a mel spectrogram array.
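
To make the failure mode concrete, here is a minimal sketch in plain Python. It assumes the standard MIDI formula, 12 * log2(f / 440) + 69, and a Slaney-style mel scale (linear below 1000 Hz), which as far as I know is what hz_to_midi and mel_to_hz compute by default. The functions below are re-implementations for illustration, not librosa's code, and the energy values are made up. A mel spectrogram holds energies, most of them far below 1. Treated as mel values, they map to frequencies near 0 Hz, and any frequency below about 8.18 Hz (MIDI note 0) gives a negative note number.

```python
import math

def hz_to_midi(freq_hz):
    """MIDI note number for a frequency in Hz (A4 = 440 Hz = note 69)."""
    return 12.0 * math.log2(freq_hz / 440.0) + 69.0

def mel_to_hz_slaney(mel):
    """Slaney-style mel -> Hz: linear below 1000 Hz, logarithmic above.
    Illustrative re-implementation, assumed to match librosa's default."""
    f_sp = 200.0 / 3.0               # Hz per mel in the linear region
    min_log_hz = 1000.0              # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp  # 15 mels
    logstep = math.log(6.4) / 27.0
    if mel < min_log_mel:
        return f_sp * mel
    return min_log_hz * math.exp(logstep * (mel - min_log_mel))

# Hypothetical spectrogram energies, wrongly treated as mel values.
energies = [1e-6, 1e-4, 1e-2]
midi_from_energies = [hz_to_midi(mel_to_hz_slaney(e)) for e in energies]

# Actual pitches in Hz (roughly C4, A4, A5) give sensible note numbers.
midi_from_pitches = [hz_to_midi(f) for f in (261.63, 440.0, 880.0)]

print(midi_from_energies)  # all far below zero
print(midi_from_pitches)   # close to 60, 69 and 81
```

The input to hz_to_midi has to be a frequency estimate in Hz, such as a fundamental frequency per frame, not a spectrogram magnitude.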

@bmcfee Do you have any suggestions on how I should approach this problem?
Essentially, I am trying to assess the accuracy of a vocal recording against a musical score (MIDI) in terms of pitch, tone, etc.

So the general problem that you're describing (audio -> midi or symbolic score) is incredibly difficult and an active research area.

The more specific problem you describe later (vocal recording to pitch) is easier, but still takes considerable modeling and parameter tuning to achieve accurate results. This kind of thing isn't currently implemented in librosa, though we provide the building blocks to do it. There's an open issue #527 to implement a simple pitch tracking algorithm, which would provide fundamental frequency estimates over time for a given recording. From that, you could convert pitch (Hz) to MIDI, and then round to integer values to get quantized notes, if that's what you're ultimately after.

If you just need something that works out of the box already, you might look into melodia or deep salience (with the singlef0 option).
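
As a rough sketch of that last step, and of the earlier question about durations: assuming you already have one fundamental frequency estimate per frame from some pitch tracker, frame i starts at i * hop_length / sr seconds. That holds whether or not the analysis windows overlap, so each frame contributes hop_length / sr seconds. The helper below is hypothetical (f0_to_notes is not a librosa function) and the input values are made up. It treats zero or NaN as "unvoiced", rounds to the nearest MIDI note, clips to 0-127, and merges runs of identical notes into events.

```python
import math

def f0_to_notes(f0_hz, sr, hop_length):
    """Group per-frame f0 estimates (Hz) into
    (midi_note, start_seconds, duration_seconds) events."""
    frame_s = hop_length / sr  # time covered by one frame step
    notes = []
    current, start = None, 0
    # The trailing None is a sentinel that flushes the last note.
    for i, f in enumerate(list(f0_hz) + [None]):
        if f is None or f <= 0 or math.isnan(f):
            note = None  # unvoiced frame
        else:
            note = round(12.0 * math.log2(f / 440.0) + 69.0)
            note = min(127, max(0, note))  # keep within the MIDI range
        if note != current:
            if current is not None:
                notes.append((current, start * frame_s, (i - start) * frame_s))
            current, start = note, i
    return notes

# Made-up f0 track: two frames near A4, one unvoiced frame, three near C4.
f0 = [440.0, 441.0, 0.0, 261.63, 262.0, 261.0]
events = f0_to_notes(f0, sr=22050, hop_length=512)
for note, start_s, dur_s in events:
    print(note, round(start_s, 4), round(dur_s, 4))
```

Real pitch tracks are noisier than this, so some smoothing or a minimum note length would probably be needed before writing the events to a MIDI file.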
