Speech Signal Processing and Voice Conversion

Resource Overview

This workflow demonstrates a complete speech signal processing pipeline for voice conversion: reading audio files with MATLAB's wavread function, frame segmentation and windowing, voiced/unvoiced detection, fundamental frequency extraction, LPC parameter analysis with the provided lpcfit.m, DTW alignment of voiced segments with pathita2.m, LPC-to-LSP conversion with lpcar2ls.m, and a final transformation mapping with MATLAB's newrbe function.

Detailed Documentation

The implementation proceeds through the following steps:

1. Read the speech signal files with MATLAB's wavread function, which returns the audio sample values and the sampling frequency; both are needed by every later step. (Recent MATLAB releases replace wavread with audioread.)

2. Segment the signal into frames and apply a window (typically Hamming or Hanning), then perform voiced/unvoiced detection and fundamental frequency (pitch) extraction. These are basic digital signal processing operations, and you can implement your own algorithms for frame blocking (usually 20-30 ms frames with 50% overlap), window application, and pitch detection using methods such as autocorrelation or cepstrum analysis.

3. For reference materials, search the relevant databases and journal websites available through library resources to find supporting documentation and research papers.

4. Extract Linear Predictive Coding (LPC) parameters with the provided lpcfit.m program, which typically implements the autocorrelation method to compute LPC coefficients. Then use the provided pathita2.m program to perform Dynamic Time Warping (DTW) alignment on the LPC coefficients of the voiced segments of the source and target speech, which compensates for timing differences between the two utterances before conversion.

5. Convert the time-aligned LPC coefficients to Line Spectral Pair (LSP) parameters with the provided lpcar2ls.m program, which performs the LPC-to-LSP transformation using polynomial root finding. Finally, perform the transformation mapping with MATLAB's newrbe function, which designs an exact radial basis function network that maps source speech parameters to target speech parameters.
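The frame blocking, windowing, and autocorrelation pitch detection mentioned in step 2 can be sketched as follows. This is a minimal illustration and not the code shipped with this resource. The synthetic 150 Hz tone, the 25 ms frame length, the 60-400 Hz search range, and the 0.3 voicing threshold are all assumed values chosen for the example. Only base MATLAB functions are used, so the Hamming window is written out explicitly.

```matlab
% Synthetic test signal: 150 Hz tone, 0.5 s at 8 kHz (assumed values).
fs = 8000;
t  = (0:fs/2-1)' / fs;
x  = sin(2*pi*150*t);

frameLen  = round(0.025 * fs);      % 25 ms frame
hop       = round(frameLen / 2);    % 50% overlap
numFrames = floor((length(x) - frameLen) / hop) + 1;

% Hamming window written out so that no toolbox is required.
n = (0:frameLen-1)';
w = 0.54 - 0.46 * cos(2*pi*n / (frameLen - 1));

minLag = floor(fs / 400);           % shortest period searched (400 Hz)
maxLag = ceil(fs / 60);             % longest period searched (60 Hz)
f0 = zeros(numFrames, 1);           % 0 marks a frame judged unvoiced

for k = 1:numFrames
    idx   = (k-1)*hop + (1:frameLen)';
    frame = x(idx) .* w;

    % Autocorrelation of the windowed frame for lags 0..maxLag.
    r = zeros(maxLag + 1, 1);
    for lag = 0:maxLag
        r(lag+1) = sum(frame(1:end-lag) .* frame(1+lag:end));
    end

    % Strongest peak inside the allowed pitch range.
    [peak, pos] = max(r(minLag+1:maxLag+1));
    bestLag = pos + minLag - 1;

    % Crude voicing decision: the peak must be a sizeable fraction
    % of the frame energy r(1) (threshold 0.3 is an assumption).
    if r(1) > 0 && peak / r(1) > 0.3
        f0(k) = fs / bestLag;
    end
end
```

On real speech, x and fs would come from the file reader in step 1, and the threshold and search range would need tuning for the speaker.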
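To make the autocorrelation method for LPC in step 4 concrete, here is a small sketch of the Levinson-Durbin recursion in base MATLAB. It does not reproduce lpcfit.m, whose interface is not shown here. The test signal is the impulse response of an assumed all-pole filter with coefficients [1 -1.3 0.6], chosen so the recovered coefficients can be checked against known values.

```matlab
% Impulse response of a known all-pole filter (assumed test values).
p     = 2;                            % LPC order
N     = 200;                          % number of samples
aTrue = [1 -1.3 0.6];
x     = filter(1, aTrue, [1; zeros(N-1, 1)]);

% Autocorrelation for lags 0..p.
r = zeros(p+1, 1);
for k = 0:p
    r(k+1) = sum(x(1:N-k) .* x(1+k:N));
end

% Levinson-Durbin recursion; a holds [1 a1 ... ap] as a column.
a = [1; zeros(p, 1)];
E = r(1);                             % prediction error energy
for i = 1:p
    acc = r(i+1);
    for j = 1:i-1
        acc = acc + a(j+1) * r(i-j+1);
    end
    kRef = -acc / E;                  % reflection coefficient

    aPrev = a;
    for j = 1:i-1
        a(j+1) = aPrev(j+1) + kRef * aPrev(i-j+1);
    end
    a(i+1) = kRef;
    E = E * (1 - kRef^2);
end
```

In the voice conversion pipeline this recursion would run once per windowed voiced frame, with a higher order than the p = 2 used here.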
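The DTW alignment in step 4 can be illustrated with a basic dynamic programming sketch. This is not pathita2.m, whose local path constraints may differ. The example assumes a Euclidean frame distance, the three standard steps (diagonal, vertical, horizontal), and toy one-dimensional sequences A and B standing in for per-frame LPC vectors (one row per frame).

```matlab
% Toy feature sequences, one row per frame (assumed data).
A = [0; 1; 2; 3; 2; 1];
B = [0; 0; 1; 2; 3; 3; 2; 1];
nA = size(A, 1);
nB = size(B, 1);

% Local cost: Euclidean distance between every pair of frames.
C = zeros(nA, nB);
for i = 1:nA
    for j = 1:nB
        C(i, j) = norm(A(i, :) - B(j, :));
    end
end

% Accumulated cost; D(i+1, j+1) belongs to the frame pair (i, j).
D = inf(nA + 1, nB + 1);
D(1, 1) = 0;
for i = 1:nA
    for j = 1:nB
        D(i+1, j+1) = C(i, j) + min([D(i, j), D(i, j+1), D(i+1, j)]);
    end
end

% Backtrack from the last frame pair to the first.
i = nA;
j = nB;
warp = [i, j];
while i > 1 || j > 1
    [~, step] = min([D(i, j), D(i, j+1), D(i+1, j)]);
    if step == 1
        i = i - 1;  j = j - 1;     % diagonal step
    elseif step == 2
        i = i - 1;                 % advance in A only
    else
        j = j - 1;                 % advance in B only
    end
    warp = [i, j; warp];
end
```

Each row of warp pairs a frame index of A with a frame index of B. Those pairs give the aligned source and target frames that are later used to train the mapping.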
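The idea behind an exact radial basis function network in step 5 is that one Gaussian unit is centered on every training input and the output weights are solved so that the training targets are reproduced exactly. The sketch below shows that idea in base MATLAB and is not a reimplementation of newrbe. With the Neural Network (Deep Learning) Toolbox installed, the corresponding call would be net = newrbe(P, T, spread). The toy data, the spread value, the width scaling sqrt(log(2))/spread, and the omission of an output bias are assumptions made for this example.

```matlab
% Toy paired data; columns are frames (assumed values).
P = [0 1 2 3 4];            % source parameters, 1 x Q
T = [0 1 4 9 16];           % target parameters, 1 x Q
spread = 1;                 % assumed spread

Q = size(P, 2);

% Squared distances between all pairs of training inputs.
D2 = zeros(Q, Q);
for i = 1:Q
    for j = 1:Q
        D2(i, j) = sum((P(:, i) - P(:, j)).^2);
    end
end

% Gaussian hidden layer: a unit falls to 0.5 at distance "spread".
b = sqrt(log(2)) / spread;
G = exp(-(b^2) * D2);       % Q x Q hidden-layer outputs

% Output weights that reproduce the targets exactly: W * G = T.
W = T / G;

% Map a new source value through the network.
xNew = 2.5;
g = zeros(Q, 1);
for j = 1:Q
    g(j) = exp(-(b^2) * sum((xNew - P(:, j)).^2));
end
yNew = W * g;
```

In the voice conversion setting, the columns of P and T would hold the time-aligned source and target LSP vectors. Because every training frame becomes a hidden unit, the network grows with the amount of training data.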