Spectral Subtraction for Speech Enhancement

Resource Overview

Step 1: Load the speech file, generate random noise, synthesize the noisy speech signal, and define the parameter settings. Step 2: Segment the signal into frames with 50% overlap. Step 3: Apply a Hamming window and perform the Fourier transform to obtain the power spectrum and phase spectrum. Step 4: Execute magnitude spectral subtraction and resynthesize the signal via the inverse Fourier transform, reusing the noisy-speech phase. Step 5: Remove the Hamming window effect to obtain the enhanced speech signal. Step 6: Calculate the SNR before and after enhancement.

Detailed Documentation

In this implementation, we execute the following systematic procedure for speech signal processing and quality enhancement:

Step 1: Speech file loading. We first read the input speech file using an audio I/O function (e.g., audioread() in MATLAB) to prepare for the subsequent processing stages.

Step 2: Random noise generation. We generate Gaussian white noise with a random number generator; this noise will be mixed with the clean speech to simulate a real-world acoustic environment.

Step 3: Noisy speech synthesis. The generated noise is added to the original speech signal using vector addition operations, creating the contaminated speech signal for enhancement processing.

Step 4: Parameter definition. Critical parameters including frame length, overlap ratio, and spectral subtraction factors are defined to optimize the enhancement algorithm's performance.
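A minimal MATLAB sketch of Steps 1-4, assuming a hypothetical file name speech.wav, an illustrative noise level, roughly 20 ms frames, and typical over-subtraction and flooring factors (none of these values are prescribed by the resource):

    % Illustrative settings; the file name, noise level, and factors are assumptions.
    [clean, fs] = audioread('speech.wav');     % Step 1: load the speech file
    clean = clean(:, 1);                       % keep a single channel
    noise = 0.05 * randn(size(clean));         % Step 2: Gaussian white noise
    noisy = clean + noise;                     % Step 3: noisy speech synthesis

    frameLen = 2 * round(0.010 * fs);          % Step 4: ~20 ms frames (even length)
    hopLen   = frameLen / 2;                   % 50% overlap
    alpha    = 2;                              % over-subtraction factor
    beta     = 0.002;                          % spectral flooring factor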

Step 5: Frame segmentation. The noisy speech signal is divided into overlapping frames (typically 20-40 ms long) using a 50% overlap strategy to maintain temporal continuity while minimizing spectral leakage.
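Continuing the sketch above, one straightforward way to slice the noisy signal into 50%-overlapping columns of a frame matrix:

    numFrames = floor((length(noisy) - frameLen) / hopLen) + 1;
    frames = zeros(frameLen, numFrames);       % one frame per column
    for k = 1:numFrames
        idx = (k - 1) * hopLen + (1:frameLen); % each frame starts half a frame later
        frames(:, k) = noisy(idx);
    end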

Step 6: Hamming window application and Fourier transform. Each frame undergoes Hamming windowing (using the hamming() function) to reduce edge effects, followed by an FFT to obtain the complex spectrum, from which the power spectrum and phase spectrum are extracted.
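Continuing the sketch, every column is windowed and transformed at once; hamming() assumes the Signal Processing Toolbox, and the element-wise product relies on implicit expansion (MATLAB R2016b or later):

    win       = hamming(frameLen);             % Hamming analysis window
    winFrames = frames .* win;                 % window all frames (implicit expansion)
    spec      = fft(winFrames, frameLen, 1);   % complex spectrum, one column per frame
    phase     = angle(spec);                   % phase spectrum, kept for resynthesis
    powSpec   = abs(spec) .^ 2;                % power spectrum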

Step 7: Magnitude spectral subtraction. An estimate of the noise power spectrum, obtained via minimum statistics or by averaging the power of noise-only frames, is subtracted from the noisy speech power spectrum, and the result is floored to prevent negative values.
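A sketch of the subtraction itself, assuming the first few frames are noise-only so their average can serve as the noise power estimate (a minimum-statistics tracker could be substituted):

    noisePow = mean(powSpec(:, 1:5), 2);                         % assumed noise-only frames
    subPow   = max(powSpec - alpha * noisePow, beta * noisePow); % subtract, then floor
    enhMag   = sqrt(subPow);                                     % enhanced magnitude spectrum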

Step 8: Signal resynthesis via inverse Fourier transform. The enhanced magnitude spectrum, combined with the original noisy-speech phase spectrum, is passed through the IFFT to reconstruct the time-domain signal for each frame.
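Continuing the sketch, the enhanced magnitude is recombined with the noisy-speech phase and inverted frame by frame:

    enhSpec   = enhMag .* exp(1i * phase);           % enhanced magnitude with noisy phase
    enhFrames = real(ifft(enhSpec, frameLen, 1));    % time-domain frames (rounding residue dropped)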

Step 9: Hamming window removal. The windowing effect is compensated during the overlap-add reconstruction, and the frames are recombined to yield the final enhanced speech signal.
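One way to perform the compensation is to overlap-add the window alongside the enhanced frames and divide by the accumulated window gain (a common, though not the only, normalization choice):

    enhanced = zeros((numFrames - 1) * hopLen + frameLen, 1);
    winSum   = zeros(size(enhanced));
    for k = 1:numFrames
        idx = (k - 1) * hopLen + (1:frameLen);
        enhanced(idx) = enhanced(idx) + enhFrames(:, k);
        winSum(idx)   = winSum(idx) + win;           % accumulate the window gain
    end
    enhanced = enhanced ./ max(winSum, eps);         % undo the Hamming window effect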

Step 10: SNR calculation. The Signal-to-Noise Ratio is computed before and after enhancement by comparing the clean speech power with the residual-error power of the noisy and enhanced signals, quantitatively evaluating the improvement in speech quality.
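A sketch of the SNR computation, using the clean signal (available here because the noise was added synthetically) as the reference:

    n = min(length(clean), length(enhanced));        % align lengths before comparing
    snrBefore = 10 * log10(sum(clean(1:n).^2) / sum((noisy(1:n)    - clean(1:n)).^2));
    snrAfter  = 10 * log10(sum(clean(1:n).^2) / sum((enhanced(1:n) - clean(1:n)).^2));
    fprintf('SNR before: %.2f dB, after: %.2f dB\n', snrBefore, snrAfter);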

This procedure enhances noisy speech through spectral-domain processing, attenuating the additive noise while preserving intelligibility, and the SNR figures provide a quantitative measure of the improvement.