Text-to-Speech with gTTS
Generate voice prompts programmatically using Google's Text-to-Speech (gTTS) Python library instead of recording them manually. This is useful for dynamic announcements, multi-language prompts, or rapidly prototyping IVR menus.
Requirements
- Python 3 with pip
- gtts package: pip install gtts
- sox for format conversion to Asterisk-native formats
- Internet access (gTTS calls Google's TTS API)
Basic: Generate an MP3 Prompt
from gtts import gTTS
text = "Thank you for calling. Please hold while we connect you."
language = "en"
speech = gTTS(text=text, lang=language, slow=False)
speech.save("thankyou.mp3")
Convert to Asterisk-Compatible Format
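gTTS outputs MP3, but Asterisk works best with 8 kHz mono audio: 16-bit PCM WAV (.wav), raw signed linear (.sln), ulaw/alaw, or GSM. Use sox to convert. sox needs MP3 support to read the input; on Debian-based systems this is typically the libsox-fmt-mp3 package.
# Convert MP3 to 8 kHz mono 16-bit PCM WAV
sox thankyou.mp3 -r 8000 -c 1 /var/lib/asterisk/sounds/custom/thankyou.wav
# Or convert to slin (raw signed linear 16-bit) at 8 kHz
sox thankyou.mp3 -r 8000 -c 1 -t raw -e signed-integer -b 16 /var/lib/asterisk/sounds/custom/thankyou.sln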
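When prompts are generated from Python, the same conversion can be driven from the script. The following is a minimal sketch, assuming sox is on the PATH and can read MP3; the helper names build_sox_command and convert are made up for illustration and are not part of gTTS or Asterisk.

```python
import subprocess
from pathlib import Path


def build_sox_command(src, dest, rate=8000, channels=1):
    """Return the sox argument list that converts src to 8 kHz mono audio."""
    cmd = ["sox", str(src), "-r", str(rate), "-c", str(channels)]
    if Path(dest).suffix == ".sln":
        # .sln is headerless, so state the sample encoding explicitly.
        cmd += ["-t", "raw", "-e", "signed-integer", "-b", "16"]
    cmd.append(str(dest))
    return cmd


def convert(src, dest):
    """Run sox; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(build_sox_command(src, dest), check=True)
```

Calling convert("thankyou.mp3", "/var/lib/asterisk/sounds/custom/thankyou.wav") would run the first sox command above. Keeping the command builder separate from the subprocess call makes the arguments easy to inspect without running sox.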
Batch: Generate from a Text File
from gtts import gTTS
# Read the whole file and synthesize it as a single announcement
with open("prompts.txt", "r", encoding="utf-8") as f:
    text = f.read()
speech = gTTS(text=text, lang="en", slow=False)
speech.save("announcement.mp3")
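The example above turns the whole file into one recording. To produce one prompt per line instead, a sketch along these lines may help. It assumes a hypothetical file format of name|text per line (blank lines and lines starting with # are skipped); the format and the function names parse_prompts and generate_all are illustrative choices, not gTTS conventions.

```python
def parse_prompts(lines):
    """Parse 'name|text' lines into (name, text) pairs (assumed format)."""
    prompts = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, text = line.partition("|")
        if not sep or not name.strip() or not text.strip():
            raise ValueError(f"expected 'name|text', got: {line!r}")
        prompts.append((name.strip(), text.strip()))
    return prompts


def generate_all(path, lang="en"):
    """Write one MP3 per prompt, named after the prompt (needs network access)."""
    # Imported here so parse_prompts can be used without gTTS installed.
    from gtts import gTTS

    with open(path, "r", encoding="utf-8") as f:
        prompts = parse_prompts(f)
    for name, text in prompts:
        gTTS(text=text, lang=lang, slow=False).save(f"{name}.mp3")
```

With a prompts.txt containing thankyou|Thank you for calling. and menu|Press 1 for sales., generate_all("prompts.txt") would write thankyou.mp3 and menu.mp3 to the current directory.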
Advanced: Resample with librosa
For finer control over audio quality and sample rate, use librosa and soundfile:
from gtts import gTTS
import librosa
import soundfile as sf
text = "Press 1 for sales. Press 2 for support."
speech = gTTS(text=text, lang="en", slow=False)
speech.save("menu.mp3")
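# Load at the file's native rate (librosa resamples to 22050 Hz unless sr=None), then resample to 8 kHz for Asterisk
y, sr = librosa.load("menu.mp3", sr=None)
resampled = librosa.resample(y, orig_sr=sr, target_sr=8000)
# Write 16-bit PCM samples
sf.write("menu.wav", resampled, 8000, subtype="PCM_16")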
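Before copying a converted file into the sounds directory, it can be worth checking that it really is 8 kHz mono 16-bit. A small sketch using Python's standard wave module follows; the function name is_asterisk_wav is made up for this example.

```python
import wave


def is_asterisk_wav(path):
    """Return True if the WAV file is 8000 Hz, mono, 16-bit PCM."""
    with wave.open(path, "rb") as w:
        return (
            w.getframerate() == 8000
            and w.getnchannels() == 1
            and w.getsampwidth() == 2  # sample width in bytes
        )
```

For instance, is_asterisk_wav("menu.wav") should return True for the file written by the librosa example.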
Using the Prompts in Dialplan
[ivr-main]
exten => s,1,Answer()
same => n,Playback(custom/thankyou)
same => n,Background(custom/menu)
same => n,WaitExten(5)
How it works
- gTTS: Sends text to Google's Translate TTS API and returns an MP3 audio stream. The slow=False parameter uses natural speech speed. The lang parameter accepts BCP-47 language codes (en, es, fr, de, etc.).
- Format conversion: Asterisk's Playback() and Background() look for files in /var/lib/asterisk/sounds/ with the codec extension stripped. A file at custom/thankyou.wav is referenced as custom/thankyou in dialplan.
- Sample rate: Asterisk's native telephony sample rate is 8000 Hz, and its .wav and .sln file formats expect audio at that rate. Downsampling with sox or librosa before installing a prompt is recommended.
- Sound search path: Asterisk searches /var/lib/asterisk/sounds/ by default. Place custom files in a custom/ subdirectory to keep them separate from built-in sounds and avoid overwriting them during upgrades.
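The naming rule (path relative to the sounds directory, extension stripped) can be sketched as a small helper. This assumes the default /var/lib/asterisk/sounds location described above; dialplan_name is a hypothetical name, not an Asterisk API.

```python
from pathlib import PurePosixPath

# Default sounds directory; adjust if your installation differs.
SOUNDS_DIR = PurePosixPath("/var/lib/asterisk/sounds")


def dialplan_name(sound_file, sounds_dir=SOUNDS_DIR):
    """Map a sound file path to the name used in Playback()/Background()."""
    # relative_to raises ValueError if the file is outside sounds_dir.
    rel = PurePosixPath(sound_file).relative_to(sounds_dir)
    return str(rel.with_suffix(""))
```

For example, dialplan_name("/var/lib/asterisk/sounds/custom/thankyou.wav") gives custom/thankyou, which is the argument passed to Playback() in the dialplan.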
Tips
- For offline TTS (no internet required), consider pico2wave (SVOX) or espeak-ng, which run locally.
- Create a shell script that takes text as an argument and outputs an Asterisk-ready WAV: gtts-cli "Your text" | sox -t mp3 - -r 8000 -c 1 output.wav. The -t mp3 option tells sox the format of the piped input, which it cannot infer from a file extension.
- gtts-cli is a command-line tool installed with gTTS: gtts-cli "Hello world" --output hello.mp3.
- For real-time TTS during a call, use AGI or ARI to generate audio on the fly, though latency from the Google API may be noticeable.
- gTTS is rate-limited by Google. For high-volume prompt generation, use Google Cloud Text-to-Speech (paid) or a local engine.
- Install additional dependencies for librosa: pip install librosa soundfile.
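Because requests can fail when Google rate-limits them, a script that generates several prompts may benefit from retrying with a growing delay. The sketch below is one generic way to do that; with_retries and its parameters are hypothetical, and the delay values are arbitrary starting points rather than limits documented by Google.

```python
import time


def with_retries(action, attempts=3, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call action(), retrying on failure with exponential backoff."""
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

A call such as with_retries(lambda: gTTS(text="Hello", lang="en").save("hello.mp3")) would retry the request up to three times. Passing retry_on=(gTTSError,), with gTTSError imported from gtts, should narrow the retries to gTTS request failures.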