matchbox package

Submodules

matchbox.ConvASRDecoder module

class matchbox.ConvASRDecoder.ConvASRDecoderClassification(feat_in: int, num_classes: int, init_mode: str | None = 'xavier_uniform', return_logits: bool = True, pooling_type='avg')

Bases: Module

forward(encoder_output)

Defines the computation performed at every call.

Note

Although the forward pass is defined in this method, call the Module instance itself rather than forward() directly: calling the instance runs any registered hooks, whereas calling forward() silently skips them.
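As a rough illustration of what pooling_type and return_logits might control, the sketch below pools one encoder output of shape [C, T] over the time axis and then applies a dense layer to obtain num_classes scores. It rests on assumptions not stated in this reference: that the decoder pools over time before a linear projection, and that return_logits=False means a softmax is applied. The helper names pool_over_time and classify are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List


def pool_over_time(features: List[List[float]], pooling_type: str = "avg") -> List[float]:
    """Collapse a [C, T] feature map to a length-C vector (hypothetical helper)."""
    if pooling_type == "avg":
        return [sum(row) / len(row) for row in features]
    if pooling_type == "max":
        return [max(row) for row in features]
    raise ValueError(f"unknown pooling_type: {pooling_type!r}")


def classify(features, weights, bias, pooling_type="avg", return_logits=True):
    """Pool over time, apply a dense layer, optionally softmax (assumed behaviour)."""
    pooled = pool_over_time(features, pooling_type)
    # weights has shape [num_classes, C]: one dot product per class
    logits = [sum(w * p for w, p in zip(row, pooled)) + b for row, b in zip(weights, bias)]
    if return_logits:
        return logits
    # assumption: return_logits=False yields probabilities via a stable softmax
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# encoder output for one utterance: C=2 channels, T=3 frames
enc = [[1.0, 2.0, 3.0], [0.0, 0.0, 6.0]]
print(pool_over_time(enc, "avg"))  # [2.0, 2.0]
print(pool_over_time(enc, "max"))  # [3.0, 6.0]
```

Average pooling summarizes the whole utterance, while max pooling keeps only the strongest activation per channel; which one suits a task depends on whether the evidence is spread across frames or concentrated in a few.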

matchbox.ConvASREncoder module

class matchbox.ConvASREncoder.ConvASREncoder(activation: str = 'relu', feat_in: int = 64, normalization_mode: str = 'batch', residual_mode: str = 'add', norm_groups: int = -1, conv_mask: bool = True, frame_splicing: int = 1, init_mode: str | None = 'xavier_uniform')

Bases: Module

forward(audio_signal, length=None)

Defines the computation performed at every call.

Note

Although the forward pass is defined in this method, call the Module instance itself rather than forward() directly: calling the instance runs any registered hooks, whereas calling forward() silently skips them.

matchbox.ConvBlock module

class matchbox.ConvBlock.ConvBlock(inplanes, planes, repeat=3, kernel_size=11, kernel_size_factor=1, stride=1, dilation=1, padding='same', dropout=0.2, activation=None, residual=True, groups=1, separable=False, heads=-1, normalization='batch', norm_groups=1, residual_mode='add', residual_panes=[], conv_mask=False, stride_last=False)

Bases: Module

Convolution Block torch Module. This is the main building block of the network.

Parameters:

inplanes – Number of input channels.
planes – Number of output channels.
repeat – Number of repeated sub-blocks (R) for this block.
kernel_size – Convolution kernel size across all repeated sub-blocks.
kernel_size_factor – Floating point scale value that is multiplied with kernel size, then rounded down to nearest odd integer to compose the kernel size. Defaults to 1.0.
stride – Stride of the convolutional layers.
dilation – Integer that defines the dilation factor of the kernel. Note that when dilation > 1, stride must be equal to 1.
padding – String representing type of padding. Currently only supports “same” padding, which symmetrically pads the input tensor with zeros.
dropout – Floating point value that determines the fraction of outputs that is zeroed out.
activation – String representing the activation function. Valid activation functions are: {“hardtanh”: nn.Hardtanh, “relu”: nn.ReLU, “selu”: nn.SELU, “swish”: Swish}. Defaults to “relu”.
residual – Bool that determines whether a residual branch should be added. All residual branches are constructed using a pointwise convolution kernel, which may or may not perform strided convolution depending on the parameter residual_mode.
groups – Number of groups for Grouped Convolutions. Defaults to 1.
separable – Bool flag that describes whether Time-Channel depthwise separable convolution should be constructed, or ordinary convolution should be constructed.
heads – Number of “heads” for the masked convolution. Defaults to -1, which disables it.
normalization – String that represents the type of normalization performed. Can be one of “batch”, “group”, “instance” or “layer”, which compute BatchNorm1d, GroupNorm, instance normalization or layer normalization respectively (the last two are special cases of GroupNorm).
norm_groups – Number of groups used for GroupNorm (if normalization == “group”).
residual_mode – String argument that describes whether the residual branch should simply be added (“add”) or should first stride, then add (“stride_add”). “stride_add” is required when the parallel (residual) branch must stride as well as be added to the output.
residual_panes – Residual panes used for Jasper-DR (dense residual) models; defaults to an empty list. Please refer to the Jasper paper.
conv_mask – Bool flag which determines whether to utilize masked convolutions or not. In general, it should be set to True.
stride_last – Bool flag that determines whether every repeated sub-block strides (total stride of S^R when this flag is False) or only the last repeated sub-block strides (total stride of S when this flag is True).
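To make the normalization options concrete, the pure-Python sketch below implements group normalization for a single [C, T] sample (without the learnable scale and shift) and checks the two special cases listed for the normalization parameter: one group per channel behaves like instance normalization, and a single group behaves like layer normalization over all channels and time steps. The helper names are hypothetical; in PyTorch the corresponding layers would be nn.GroupNorm(num_groups=C, num_channels=C) and nn.GroupNorm(num_groups=1, num_channels=C).

```python
import math
from typing import List

Matrix = List[List[float]]  # one sample, shape [C, T]


def standardize(values: List[float], eps: float = 1e-5) -> List[float]:
    """Subtract the mean and divide by sqrt(biased variance + eps)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def group_norm(x: Matrix, num_groups: int, eps: float = 1e-5) -> Matrix:
    """GroupNorm without affine parameters: statistics are shared within each channel group."""
    channels, steps = len(x), len(x[0])
    if channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = channels // num_groups
    out: Matrix = []
    for g in range(num_groups):
        rows = x[g * per_group:(g + 1) * per_group]
        flat = standardize([v for row in rows for v in row], eps)
        # split the flattened group back into its channels
        out.extend(flat[i * steps:(i + 1) * steps] for i in range(per_group))
    return out


def instance_norm(x: Matrix, eps: float = 1e-5) -> Matrix:
    """Each channel is normalized over time on its own."""
    return [standardize(row, eps) for row in x]


def layer_norm(x: Matrix, eps: float = 1e-5) -> Matrix:
    """All channels and time steps of the sample share one mean and variance."""
    steps = len(x[0])
    flat = standardize([v for row in x for v in row], eps)
    return [flat[i * steps:(i + 1) * steps] for i in range(len(x))]


def close(a: Matrix, b: Matrix, tol: float = 1e-9) -> bool:
    return all(abs(p - q) <= tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))


x = [[1.0, 2.0, 3.0, 4.0], [10.0, 0.0, -10.0, 5.0]]  # C=2, T=4
print(close(group_norm(x, num_groups=len(x)), instance_norm(x)))  # True
print(close(group_norm(x, num_groups=1), layer_norm(x)))          # True
```

Unlike BatchNorm1d, none of these variants depends on batch statistics, which is why group, instance or layer normalization can be preferable for small batches.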

forward(input_: Tuple[List[Tensor], Tensor | None])

Forward pass of the module.

Parameters: input_ – A tuple of two values: the preprocessed audio signal and the lengths of the audio signal. The audio signal is padded to the shape [B, D, T] and the lengths are a 1-D tensor of length B.
Returns: The output of the block after processing the input through repeat sub-blocks, together with the lengths of the encoded audio after padding/striding.
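The stride_last flag determines how much a block shrinks the time axis of its output. The sketch below lists the per-sub-block strides implied by the stride_last description for a block with R sub-blocks and stride S, and estimates the number of output frames. It assumes odd kernels with “same” padding, so that each strided layer keeps ceil(T / s) frames; the helper names are hypothetical.

```python
import math
from typing import List


def sub_block_strides(stride: int, repeat: int, stride_last: bool) -> List[int]:
    """Per-sub-block strides, following the stride_last description."""
    if stride_last:
        # only the final sub-block strides
        return [1] * (repeat - 1) + [stride]
    # every sub-block strides
    return [stride] * repeat


def total_stride(stride: int, repeat: int, stride_last: bool) -> int:
    """Overall time-reduction factor of the block."""
    return math.prod(sub_block_strides(stride, repeat, stride_last))


def frames_after_block(num_frames: int, stride: int, repeat: int, stride_last: bool) -> int:
    """Output length assuming odd kernels with 'same' padding: each layer keeps ceil(T / s) frames."""
    for s in sub_block_strides(stride, repeat, stride_last):
        num_frames = math.ceil(num_frames / s)
    return num_frames


print(total_stride(2, 3, stride_last=False))             # 8  (S^R)
print(total_stride(2, 3, stride_last=True))              # 2  (S)
print(frames_after_block(100, 2, 3, stride_last=False))  # 13
```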

class matchbox.ConvBlock.GroupShuffle(groups, channels)

Bases: Module

forward(x)

Defines the computation performed at every call.

Note

Although the forward pass is defined in this method, call the Module instance itself rather than forward() directly: calling the instance runs any registered hooks, whereas calling forward() silently skips them.
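GroupShuffle(groups, channels) is documented here only by its signature. Judging by its name, it most likely performs a ShuffleNet-style channel shuffle, which interleaves channels from different groups so that a following grouped convolution receives inputs from every group. The pure-Python sketch below rests on that assumption rather than on confirmed behaviour: it computes the channel permutation produced by viewing the channels as a [groups, channels // groups] grid, transposing it and flattening. The helper names are hypothetical.

```python
from typing import List


def channel_shuffle_order(groups: int, channels: int) -> List[int]:
    """Source channel index for each output channel after a ShuffleNet-style shuffle (assumed)."""
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    per_group = channels // groups
    # view channels as a [groups, per_group] grid, transpose, then read row by row
    return [g * per_group + i for i in range(per_group) for g in range(groups)]


def shuffle_channels(x: List[List[float]], groups: int) -> List[List[float]]:
    """Apply the shuffle to a [C, T] feature map."""
    return [x[c] for c in channel_shuffle_order(groups, len(x))]


print(channel_shuffle_order(groups=2, channels=6))  # [0, 3, 1, 4, 2, 5]
```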

matchbox.ConvBlock.compute_new_kernel_size(kernel_size, kernel_width)
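compute_new_kernel_size(kernel_size, kernel_width) has no description here; presumably it applies the kernel_size_factor rule from the ConvBlock parameters. The sketch below follows that parameter description literally: scale the kernel size, round down, step down to an odd value, and never go below 1. The name scaled_kernel_size is hypothetical, and the real helper's rounding details may differ.

```python
import math


def scaled_kernel_size(kernel_size: int, factor: float) -> int:
    """Scale a kernel size and round down to the nearest odd integer (minimum 1)."""
    size = math.floor(kernel_size * factor)
    if size % 2 == 0:
        size -= 1  # step down to the odd integer below
    return max(size, 1)


for factor in (1.0, 0.6, 0.5, 0.1):
    print(factor, scaled_kernel_size(11, factor))
```

Keeping the result odd means “same” padding can be split evenly between both ends of the sequence.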

matchbox.ConvBlock.get_same_padding(kernel_size, stride, dilation) → int
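For an odd kernel at stride 1, symmetric zero padding of dilation * (kernel_size - 1) // 2 on each side keeps the time length unchanged. The sketch below assumes that this is what get_same_padding returns. The name same_padding is hypothetical, and its rejection of stride > 1 combined with dilation > 1 mirrors the note on the dilation parameter rather than confirmed code. The result is checked against the standard torch.nn.Conv1d output-length formula L_out = floor((L_in + 2p - d(k - 1) - 1) / s) + 1.

```python
def same_padding(kernel_size: int, stride: int, dilation: int) -> int:
    """Symmetric zero padding that preserves length for odd kernels at stride 1."""
    if stride > 1 and dilation > 1:
        raise ValueError("only one of stride and dilation may exceed 1")
    return dilation * (kernel_size - 1) // 2


def conv1d_out_len(length: int, kernel_size: int, stride: int = 1,
                   padding: int = 0, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (same formula as torch.nn.Conv1d)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


p = same_padding(kernel_size=11, stride=1, dilation=1)
print(p, conv1d_out_len(100, 11, padding=p))  # 5 100
```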

matchbox.ConvBlock.init_weights(m, mode: str | None = 'xavier_uniform')

matchbox.ConvBlock.tds_normal_(tensor, mode='fan_in')

Normal Initialization from the paper Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions

Parameters:

tensor – an n-dimensional torch.Tensor
mode – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.
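Both TDS initializers take a mode argument that selects which fan the weight scale is derived from. For a 1-D convolution weight of shape [out_channels, in_channels // groups, kernel_size], PyTorch's convention is fan_in = (in_channels // groups) * kernel_size and fan_out = out_channels * kernel_size. Scaling the weight variance by 1/fan_in keeps each output's variance independent of how many inputs it sums; scaling by 1/fan_out does the same for gradients. The sketch below computes both fans and a standard deviation of the form gain / sqrt(fan); the default gain of 2.0 and the function names are illustrative assumptions, not the constants used by tds_normal_.

```python
import math
from typing import Sequence, Tuple


def conv_fans(weight_shape: Sequence[int]) -> Tuple[int, int]:
    """(fan_in, fan_out) for a weight shaped [out, in // groups, *kernel]."""
    out_ch, in_ch = weight_shape[0], weight_shape[1]
    receptive_field = math.prod(weight_shape[2:])  # 1 for a plain linear weight
    return in_ch * receptive_field, out_ch * receptive_field


def fan_scaled_std(weight_shape: Sequence[int], mode: str = "fan_in", gain: float = 2.0) -> float:
    """Std of the form gain / sqrt(fan); the default gain is an illustrative assumption."""
    if mode not in ("fan_in", "fan_out"):
        raise ValueError("mode must be 'fan_in' or 'fan_out'")
    fan_in, fan_out = conv_fans(weight_shape)
    fan = fan_in if mode == "fan_in" else fan_out
    return gain / math.sqrt(fan)


shape = (256, 128, 11)  # weight of a Conv1d(128, 256, kernel_size=11)
print(conv_fans(shape))  # (1408, 2816)
```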

matchbox.ConvBlock.tds_uniform_(tensor, mode='fan_in')

Uniform Initialization from the paper Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions

Parameters:

tensor – an n-dimensional torch.Tensor
mode – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.

matchbox.MaskedConv1d module

class matchbox.MaskedConv1d.MaskedConv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, heads=-1, bias=False, use_mask=True)

Bases: Module

forward(x, lens)

Defines the computation performed at every call.

Note

Although the forward pass is defined in this method, call the Module instance itself rather than forward() directly: calling the instance runs any registered hooks, whereas calling forward() silently skips them.

get_seq_len(lens)

update_masked_length(x, lens)
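MaskedConv1d tracks per-utterance lengths alongside the padded batch. The sketch below shows the two pieces of bookkeeping its method names suggest. It assumes that get_seq_len applies the standard convolution output-length formula to each valid length, and that masking zeroes every time step at or beyond an utterance's length so padded frames cannot leak into the convolution. The helper names and list-based tensors are hypothetical.

```python
from typing import List


def seq_len_after_conv(lens: List[int], kernel_size: int, stride: int = 1,
                       padding: int = 0, dilation: int = 1) -> List[int]:
    """Apply the Conv1d output-length formula to each valid (unpadded) length."""
    return [(n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1 for n in lens]


def mask_padding(x: List[List[List[float]]], lens: List[int]) -> List[List[List[float]]]:
    """Zero every time step t >= lens[b] in a [B, C, T] batch (assumed masking rule)."""
    return [[[v if t < n else 0.0 for t, v in enumerate(row)] for row in sample]
            for sample, n in zip(x, lens)]


batch = [[[1.0, 2.0, 3.0, 4.0]], [[5.0, 6.0, 7.0, 8.0]]]  # B=2, C=1, T=4
print(mask_padding(batch, [4, 2]))  # [[[1.0, 2.0, 3.0, 4.0]], [[5.0, 6.0, 0.0, 0.0]]]
print(seq_len_after_conv([100, 37], kernel_size=11, stride=2, padding=5))  # [50, 19]
```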