One perspective on the time-domain processing of the previous section is that the hidden layers of a multilayer network provide a transformed representation of the signal and noise that facilitates their separation. In transform-domain approaches, this new representation is instead supplied explicitly by a preprocessing step, in an attempt to reduce the complexity of the neural network's task. In addition, perceptual or ASR constraints are more easily incorporated in the transform domain.
The traditional speech enhancement literature has recently been dominated largely by transform-based methods, and in particular by the DFT-based spectral subtraction method [19]. Because of the importance of spectral subtraction, we start with a brief review of the approach. This will help to motivate the use of neural networks with transforms that are themselves nonlinear.