Building a speech recognizer AI/ML model in Python (Part 4 of 6 — Synthesizing tones to generate music)

2 min read · Feb 5, 2024

Previously we explored how to generate a monotone, but a monotone is not very interesting on its own because it contains just a single frequency.

Let’s use the same principle to synthesize music by stitching different tones together. As in music theory, we will use tones such as A, C, G, and F to create music.

Let’s synthesize tones to generate music.

Create a new Python file and import the packages.

# Import packages
import numpy as np
import matplotlib.pyplot as plt
from scipy.io.wavfile import write
import json

Let’s define a function that generates a tone from the given parameters.

# Synthesize the tone
def tone_synthesizer(freq, duration, amplitude=1.0, sampling_freq=44100):
  # Construct the time axis (keep duration as a float so fractional
  # durations such as 0.4 seconds are not truncated to 0)
  time_axis = np.linspace(0, duration, int(duration * sampling_freq))

Construct the audio signal using the parameters specified and return it.

  # Construct audio signal
  signal = amplitude * np.sin(2 * np.pi * freq * time_axis)

  # Cast to 16-bit integers, the sample format used for the WAV output
  return signal.astype(np.int16)
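Since the function is shown in two pieces above, here is a minimal sketch of how they might fit together, followed by a quick check of the result. It assumes the duration is passed to np.linspace as a float so that fractional durations work; the 440 Hz test frequency and the 0.5 second duration are arbitrary values chosen for illustration.

```python
import numpy as np

def tone_synthesizer(freq, duration, amplitude=1.0, sampling_freq=44100):
    # Number of samples needed to cover the requested duration
    num_samples = int(duration * sampling_freq)

    # Time axis in seconds, from 0 to duration
    time_axis = np.linspace(0, duration, num_samples)

    # Sine wave at the requested frequency, scaled by the amplitude
    signal = amplitude * np.sin(2 * np.pi * freq * time_axis)

    # 16-bit integer samples
    return signal.astype(np.int16)

# Illustrative call: half a second of a 440 Hz tone
tone = tone_synthesizer(freq=440, duration=0.5, amplitude=12000)
print(tone.shape, tone.dtype, tone.max())
```

Note that with the default amplitude of 1.0 the cast to int16 truncates almost every sample to 0, so in practice you will want to pass a larger amplitude, as the article does later with 12000.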

Define the main block.

if __name__ == '__main__':
  file_tone_single = 'generated_tone_single.wav'
  file_tone_sequence = 'generated_tone_sequence.wav'

We will use a tone mapping file that maps tone names, such as A, C and G, to their corresponding frequencies.

# Source: 
  mapping_file = 'tone_mapping.json'

  # Load the tone to frequency map
  with open(mapping_file, 'r') as f:
    tone_map = json.load(f)
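The contents of tone_mapping.json are not shown here, so if you do not have the file, the sketch below is one way to create a stand-in. It computes frequencies with the standard equal-temperament formula, f = 440 * 2^(n/12), where n is the number of semitones away from A at 440 Hz. The choice of octave, the key names, the rounding to two decimals, and the build_tone_map helper are all assumptions for illustration; the original mapping file may use different values.

```python
import json

# Semitone offsets from A (440 Hz) for the natural notes of one octave.
# Assumption: single-letter keys, matching the tone names used in this article.
SEMITONES_FROM_A = {'C': -9, 'D': -7, 'E': -5, 'F': -4, 'G': -2, 'A': 0, 'B': 2}

def build_tone_map(reference_freq=440.0):
    # Hypothetical helper: equal-temperament frequency for each note name
    return {
        name: round(reference_freq * 2 ** (offset / 12), 2)
        for name, offset in SEMITONES_FROM_A.items()
    }

tone_map = build_tone_map()

# Write the stand-in mapping file
with open('tone_mapping.json', 'w') as f:
    json.dump(tone_map, f, indent=2)

# Load it back the same way the article does
with open('tone_mapping.json', 'r') as f:
    loaded = json.load(f)

print(loaded['F'])
```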

Let’s generate the F tone.

  tone_name = 'F'
  duration = 3
  amplitude = 12000
  sampling_freq = 44100

Extract the corresponding frequency and generate the tone using the synthesizer function.

  # Extract the frequency of the tone
  tone_freq = tone_map[tone_name]

  # Generate tone
  synthesized_tone = tone_synthesizer(tone_freq, duration, amplitude, sampling_freq)

  # Write the generated tone to a WAV file
  write(file_tone_single, sampling_freq, synthesized_tone)

To make it sound like music, we will create a tone sequence. Each entry pairs a tone name with its duration in seconds.

  # Tone sequence
  tone_sequence = [('G', 0.4), ('D', 0.5), ('F', 0.3), ('C', 0.6), ('A', 0.4)]

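Construct the audio signal based on the tone sequence.

  # Construct audio signal
  # Start from an empty int16 array so the appended tones keep their 16-bit format
  signal = np.array([], dtype=np.int16)
  for item in tone_sequence:
    tone_name = item[0]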

For each tone, extract the corresponding frequency.

    # Extract the corresponding frequency of the tone
    freq = tone_map[tone_name]
    duration = item[1]

    # Synthesize tone
    synthesized_tone = tone_synthesizer(freq, duration, amplitude, sampling_freq)

Append the synthesized tone to the main output signal and, once the loop has finished, save the result.

    # Append signal
    signal = np.append(signal, synthesized_tone, axis=0)

  # Save audio file
  write(file_tone_sequence, sampling_freq, signal)

Once the final block of code has run, you should be able to play both generated audio files. This is a rudimentary example of the kind of speech and tone synthesis used in advanced audio sampling tools.
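If you want to check the result without opening an audio player, the sketch below rebuilds the tone sequence in one go, writes it, and reads it back with scipy.io.wavfile.read. It is a condensed variant rather than the exact code above: the tones are joined with a single np.concatenate call instead of np.append in a loop, and the inline tone map uses rounded equal-temperament frequencies as an assumption in place of tone_mapping.json.

```python
import numpy as np
from scipy.io.wavfile import read, write

SAMPLING_FREQ = 44100
AMPLITUDE = 12000

# Assumption: stand-in frequencies in Hz; the article loads its own map from JSON
tone_map = {'A': 440.0, 'C': 261.63, 'D': 293.66, 'F': 349.23, 'G': 392.0}
tone_sequence = [('G', 0.4), ('D', 0.5), ('F', 0.3), ('C', 0.6), ('A', 0.4)]

def tone_synthesizer(freq, duration, amplitude=1.0, sampling_freq=44100):
    # Time axis in seconds, then a sine wave cast to 16-bit samples
    time_axis = np.linspace(0, duration, int(duration * sampling_freq))
    return (amplitude * np.sin(2 * np.pi * freq * time_axis)).astype(np.int16)

# Synthesize every tone, then join them into one signal
tones = [
    tone_synthesizer(tone_map[name], duration, AMPLITUDE, SAMPLING_FREQ)
    for name, duration in tone_sequence
]
signal = np.concatenate(tones)

write('generated_tone_sequence.wav', SAMPLING_FREQ, signal)

# Read the file back and report its sampling rate, sample type and length
rate, data = read('generated_tone_sequence.wav')
duration_seconds = len(data) / rate
print(rate, data.dtype, round(duration_seconds, 2))
```

The reported duration should be close to the sum of the durations in the tone sequence, and the sample type should be int16.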

In the next part (5) we will explore methods to extract speech features.


Written by Burhan Amjad
