transforms module

class transforms.AddWhiteNoise[source]

Bases: Module

Transformation that adds white noise to the audio signal

Example:

>>> import torch
>>> from transforms import AddWhiteNoise
>>> x = torch.zeros(16000)
>>> transform = AddWhiteNoise()
>>> x_with_noise = transform(x)

add_white_noise(audio_tensor, min_snr_db=20, max_snr_db=90, STD_n=0.5)[source]

Adds random Gaussian white noise to the input audio_tensor.

Parameters:

audio_tensor (torch.Tensor) – 1-dimensional PyTorch tensor
min_snr_db (int, optional) – Minimum signal-to-noise ratio in dB. Defaults to 20.
max_snr_db (int, optional) – Maximum signal-to-noise ratio in dB. Defaults to 90.
STD_n (float, optional) – Standard deviation of the Gaussian distribution used to generate the noise. Defaults to 0.5.

Returns:

Audio tensor with the noise added

Return type:

torch.Tensor
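The role of the SNR bounds can be illustrated with a minimal sketch. It assumes the target SNR is drawn uniformly (in dB) between the two bounds and that the noise is rescaled to reach it; the library's actual implementation may differ. The name add_noise_at_random_snr and its noise_std parameter are hypothetical and not part of this module.

```python
import math

import torch


def add_noise_at_random_snr(audio, min_snr_db=20.0, max_snr_db=90.0, noise_std=0.5):
    """Hypothetical sketch: add Gaussian white noise at a randomly drawn SNR."""
    # Assumption: the target SNR (in dB) is uniform between the two bounds.
    snr_db = float(torch.empty(1).uniform_(min_snr_db, max_snr_db))
    # Raw Gaussian noise with the requested standard deviation.
    noise = noise_std * torch.randn_like(audio)
    signal_power = audio.pow(2).mean()
    noise_power = noise.pow(2).mean()
    # Gain g chosen so that signal_power / (g**2 * noise_power) == 10 ** (snr_db / 10).
    gain = torch.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return audio + gain * noise, snr_db


# One second of a 440 Hz tone sampled at 16 kHz.
t = torch.arange(16000) / 16000
tone = torch.sin(2 * math.pi * 440 * t)
noisy_tone, drawn_snr_db = add_noise_at_random_snr(tone)
```

In this sketch an all-zero input has zero signal power, so the computed gain is zero and no noise is added.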

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
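The difference between calling the instance and calling forward() directly can be seen with a forward hook. The Doubler class below is a hypothetical toy module used only for illustration; register_forward_hook is the standard torch.nn.Module method.

```python
import torch
from torch import nn


class Doubler(nn.Module):
    """Hypothetical toy module used only to illustrate hook behaviour."""

    def forward(self, x):
        return 2 * x


calls = []
module = Doubler()
# The hook records every output produced by a regular module call.
module.register_forward_hook(lambda mod, inputs, output: calls.append(output))

x = torch.ones(3)
y_call = module(x)             # runs forward() and the registered hook
y_forward = module.forward(x)  # runs forward() only; the hook is skipped
```

Both calls return the same values, but only the first one is recorded in calls.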

class transforms.MfccTransform(sample_rate)[source]

Bases: Module

Transformation that returns the Mel-frequency cepstral coefficients of an audio tensor

Example:

>>> import torch
>>> from transforms import MfccTransform
>>> x = torch.zeros(16000)
>>> transform = MfccTransform(sample_rate=16000)
>>> specgram = transform(x)

The generated cepstrum can be visualized with matplotlib as follows:

>>> import librosa
>>> import matplotlib.pyplot as plt
>>> fig, axs = plt.subplots(1, 1)
>>> axs.set_title("Mel-frequency cepstrum")
>>> axs.set_ylabel("MFCC coefficient")
>>> axs.set_xlabel("frame")
>>> im = axs.imshow(librosa.power_to_db(specgram), origin="lower", aspect="auto")
>>> fig.colorbar(im, ax=axs)
>>> plt.show(block=False)

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mfcc_transform(audio_tensor, sample_rate, n_fft=512, n_mfcc=64, hop_length=10, mel_scale='htk')[source]

class transforms.Scattering[source]

Bases: Module

Wrapper for kymatio’s scattering transform. Returns the scattering coefficients of the input.

For more information about the transform, see https://www.kymat.io/

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class transforms.SpecAugment[source]

Bases: Module

Transformation that returns double time-masked and frequency-masked Mel-frequency cepstral coefficients of an audio tensor

Example:

>>> import torch
>>> from transforms import SpecAugment
>>> x = torch.zeros(16000)
>>> transform = SpecAugment()
>>> specgram = transform(x)

The modified cepstrum can be visualized with matplotlib as follows:

>>> import librosa
>>> import matplotlib.pyplot as plt
>>> fig, axs = plt.subplots(1, 1)
>>> axs.set_title("Mel-frequency cepstrum")
>>> axs.set_ylabel("MFCC coefficient")
>>> axs.set_xlabel("frame")
>>> im = axs.imshow(librosa.power_to_db(specgram), origin="lower", aspect="auto")
>>> fig.colorbar(im, ax=axs)
>>> plt.show(block=False)

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

spec_aug(tensor, time_mask=50, freq_mask=5, prob=0.8)[source]
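The masking controlled by these parameters can be sketched in plain PyTorch. The sketch assumes a 2-dimensional (frequency, time) input, two time masks and one frequency mask filled with zeros, and that prob is the probability of applying any masking; none of this is confirmed by the signature above, and spec_aug_sketch is a hypothetical name.

```python
import torch


def spec_aug_sketch(spec, time_mask=50, freq_mask=5, prob=0.8):
    """Hypothetical sketch: two time masks and one frequency mask on a (freq, time) tensor."""
    # With probability 1 - prob, return the input untouched.
    if torch.rand(1).item() >= prob:
        return spec
    out = spec.clone()
    n_freq, n_time = out.shape
    # Assumption: "double" time masking means two independent time masks.
    for _ in range(2):
        width = int(torch.randint(0, min(time_mask, n_time) + 1, (1,)))
        start = int(torch.randint(0, n_time - width + 1, (1,)))
        out[:, start:start + width] = 0.0
    # A single frequency mask of at most freq_mask rows.
    width = int(torch.randint(0, min(freq_mask, n_freq) + 1, (1,)))
    start = int(torch.randint(0, n_freq - width + 1, (1,)))
    out[start:start + width, :] = 0.0
    return out


masked = spec_aug_sketch(torch.ones(64, 200))
```

The mask widths are drawn uniformly up to the given maxima, so a mask may occasionally have zero width and leave the input unchanged.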