Building a speech recognizer AI/ML model in Python (Part 2 of 6 — Transforming audio signals to frequency domain)

Jan 31, 2024

In the last part we visualized audio signals. In this part, let's explore their underlying frequency components.

We need to extract meaningful information from an audio signal. Audio signals are composed of sine waves of varying frequencies, phases, and amplitudes. If we dissect a signal into its frequency components, we can identify many of its characteristics, because a signal is characterized by how its energy is distributed across the frequency spectrum.

To convert a time-domain signal into the frequency domain, we need a mathematical tool called the Fourier transform. The Fourier transform of a signal shows its individual frequency components. If you need a refresher, please see: https://www.thefouriertransform.com
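
To make the idea concrete before touching a real recording, here is a minimal sketch on a synthetic signal. The 8000 Hz sampling rate and the 440 Hz and 1000 Hz tones are assumptions picked for illustration; they are not taken from the audio file used in this series.

```python
import numpy as np

# Hypothetical test signal: two sine waves sampled at 8 kHz for one second
sampling_freq = 8000
t = np.arange(sampling_freq) / sampling_freq
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

# Magnitude spectrum of the real-valued signal and the frequency of each bin
magnitude = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sampling_freq)

# The two largest peaks sit at the frequencies of the two sine waves
top_two = np.sort(freqs[np.argsort(magnitude)[-2:]])
print(top_two)
```

The time-domain samples look like a wobbly curve, but the spectrum reduces them to two sharp peaks, one per sine wave.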

Let’s transform an audio signal to the frequency domain.
Create a new Python file and import the following packages.

# Importing packages
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

Just like in the previous exercise, we need to read the audio signal using the wavfile.read method.

# Read audio file
sampling_freq, signal = wavfile.read('audio.wav')
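
The rest of this part treats the signal as a one-dimensional array, which is what wavfile.read returns for a mono file. For a stereo file it returns a two-dimensional array with one column per channel. The sketch below writes a small stereo file first so it can run on its own; the file name, tone, and sampling rate are made up for the example, and keeping the first channel is just one possible choice.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

# Write a short hypothetical stereo file so the example is self-contained
rate = 8000
t = np.arange(rate) / rate
left = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
stereo = np.column_stack([left, left])
path = os.path.join(tempfile.mkdtemp(), 'stereo_example.wav')
wavfile.write(path, rate, stereo)

# Read it back: the shape is (samples, channels) for multi-channel files
sampling_freq, signal = wavfile.read(path)
print(sampling_freq, signal.dtype, signal.shape)

# Keep only the first channel if the file has more than one
if signal.ndim > 1:
    signal = signal[:, 0]
print(signal.shape)
```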

Normalize the audio signal. Assuming the file stores 16-bit samples, dividing by 2^15 scales the values into the range -1 to 1, so later steps do not depend on the integer sample format.

# Normalize the audio signal value
signal = signal / np.power(2, 15)
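
As a quick check of what this division does, here is a small sketch with a few hand-picked 16-bit values (hypothetical samples, not read from a file). Files with another sample format, for example 32-bit integers or floats, would need a different divisor or none at all.

```python
import numpy as np

# Hypothetical 16-bit PCM samples covering the full int16 range
raw = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)

# Dividing by 2**15 maps int16 values into the interval [-1.0, 1.0)
normalized = raw / np.power(2, 15)
print(normalized.dtype, normalized.min(), normalized.max())
```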

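Extract the length and half-length of the audio signal and apply the Fourier transform to the signal. We need the half-length because the spectrum of a real-valued signal is symmetric, so the second half of the transform carries no new information.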

# Extract the length
len_signal = len(signal)

# Extract half-length
len_half = np.ceil((len_signal + 1) / 2.0).astype(int)

# Apply Fourier transform to the signal
freq_signal = np.fft.fft(signal)
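
If you want to convince yourself that half of the spectrum is enough, the following sketch checks it on random data (a stand-in for a real recording). It also compares the result with np.fft.rfft, NumPy's transform for real input, which returns only the non-redundant half directly.

```python
import numpy as np

rng = np.random.default_rng(0)
lengths_match = []
for len_signal in (7, 8):  # one odd and one even length
    signal = rng.standard_normal(len_signal)
    len_half = np.ceil((len_signal + 1) / 2.0).astype(int)
    full = np.fft.fft(signal)

    # For real input, bin k is the complex conjugate of bin N - k
    assert np.allclose(full[1:], np.conj(full[1:][::-1]))

    # The first len_half bins are exactly what np.fft.rfft returns
    one_sided = np.fft.rfft(signal)
    lengths_match.append(bool(len(one_sided) == len_half
                              and np.allclose(full[:len_half], one_sided)))

print(lengths_match)
```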

Keep the first half of the spectrum, normalize its magnitude by the signal length, and square it to get the power.

# Normalization
freq_signal = abs(freq_signal[0:len_half]) / len_signal

# Take the square
freq_signal **= 2

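Adjust the Fourier-transformed signal for even and odd signal lengths. Since we dropped the mirrored half of the spectrum, we double the remaining bins to account for its power. The first bin (0 Hz) has no mirror image and is never doubled; when the length is even, the last bin has none either.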

# Extract the length of the frequency transformed signal
len_fts = len(freq_signal)

# Adjust the signal for even and odd cases
if len_signal % 2:
    freq_signal[1:len_fts] *= 2
else:
    freq_signal[1:len_fts-1] *= 2
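
One way to sanity-check the doubling is Parseval's theorem: after the adjustment, the one-sided power values should add up to the mean square of the time-domain signal. The sketch below wraps the steps so far in a helper named one_sided_power (a name invented for this example) and runs the check on random data for an odd and an even length.

```python
import numpy as np

def one_sided_power(signal):
    """Hypothetical helper that repeats the steps above for a 1-D real signal."""
    len_signal = len(signal)
    len_half = np.ceil((len_signal + 1) / 2.0).astype(int)
    power = (np.abs(np.fft.fft(signal)[0:len_half]) / len_signal) ** 2
    if len_signal % 2:
        power[1:] *= 2      # odd length: every bin except 0 Hz has a mirror
    else:
        power[1:-1] *= 2    # even length: the last (Nyquist) bin has none
    return power

rng = np.random.default_rng(0)
for n in (1001, 1000):
    x = rng.standard_normal(n)
    # Total one-sided power should equal the mean square of the samples
    print(n, np.isclose(one_sided_power(x).sum(), np.mean(x ** 2)))
```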

Extract the signal power in decibels (dB).

# Extract power value in dB
signal_power = 10 * np.log10(freq_signal)
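
One caveat: if any frequency bin has exactly zero power (for example in a stretch of digital silence), np.log10 returns -inf and emits a warning. A common workaround, shown here as a sketch and not as part of the original recipe, is to clamp the power to a small floor before taking the logarithm. The floor of 1e-12 is an arbitrary assumption, and the three power values are made up.

```python
import numpy as np

# Hypothetical power values; the last bin is completely silent
freq_signal = np.array([1.0, 0.01, 0.0])

eps = 1e-12  # assumed floor, corresponds to -120 dB
signal_power = 10 * np.log10(np.maximum(freq_signal, eps))
print(signal_power)
```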

Build the X axis, which is the frequency measured in kHz.

# Build X axis
x_axis = np.arange(0, len_half, 1) * (sampling_freq / len_signal) / 1000.0
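
Each bin k corresponds to the frequency k * sampling_freq / len_signal, which is what the line above computes. As a cross-check, NumPy's np.fft.rfftfreq builds the same axis; the sketch below compares the two for an odd and an even length, with 44100 Hz as an assumed sampling rate.

```python
import numpy as np

sampling_freq = 44100  # assumed sampling rate in Hz
axes_match = []
for len_signal in (1000, 1001):
    len_half = np.ceil((len_signal + 1) / 2.0).astype(int)

    # Frequency axis built by hand, in kHz
    x_axis = np.arange(0, len_half, 1) * (sampling_freq / len_signal) / 1000.0

    # Same axis from NumPy's helper for real-input transforms, in kHz
    reference = np.fft.rfftfreq(len_signal, d=1.0 / sampling_freq) / 1000.0
    axes_match.append(bool(len(x_axis) == len(reference)
                           and np.allclose(x_axis, reference)))

print(axes_match)
```

For an even length the axis ends exactly at half the sampling rate (the Nyquist frequency); for an odd length it stops just below it.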

Plot the figure.

# Plot
plt.figure()
plt.plot(x_axis, signal_power, color='black')
plt.xlabel('Freq (kHz)')
plt.ylabel('Signal power (dB)')
plt.show()

The output shows how powerful the signal is across the frequency spectrum. As expected, the power of the signal goes down at higher frequencies.

In the next part (3) we will explore how to generate audio signals.

Written by Burhan Amjad
