Audio Module (stx.audio)
Text-to-speech synthesis and audio file playback. Supports multiple TTS backends with automatic fallback.
Quick Start
import scitex as stx
# Text-to-speech
stx.audio.speak("Analysis complete. 42 significant results found.")
# Play an audio file
stx.audio.play("notification.wav")
Key Functions
speak(text, backend=None, **kwargs)
Convert text to speech and play it through the default audio output. If no backend is specified, the first available backend is used.
# Use default backend
stx.audio.speak("Hello from SciTeX")
# Specify a backend
stx.audio.speak("Hello", backend="pyttsx3")
play(path)
Play an audio file (WAV, MP3, etc.).
stx.audio.play("alert.wav")
stx.audio.play("recording.mp3")
list_backends()
List available TTS backends on the current system.
backends = stx.audio.list_backends()
# e.g., ['gtts', 'pyttsx3']
Use Cases
Audible notifications are useful for long-running experiments:
import scitex as stx
@stx.session
def main(CONFIG=stx.INJECTED):
    result = train_model()  # Takes hours
    stx.audio.speak(f"Training done. Accuracy: {result.accuracy:.1%}")
    return 0
API Reference
SciTeX Audio - Text-to-Speech with Multiple Backends
- Backends (fallback order):
elevenlabs: ElevenLabs (paid, high quality, speed=1.2)
luxtts: LuxTTS (open-source, offline, voice-cloning, speed=2.0)
gtts: Google TTS (free, requires internet, speed=1.5)
pyttsx3: System TTS (offline, free, uses espeak/SAPI5)
- Usage:
from scitex_audio import speak
speak("Hello, world!")
from scitex_audio import get_tts, LuxTTS
tts = get_tts("luxtts")
tts.speak("Hello!")
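The fallback order listed above can be pictured as a loop that tries each backend in turn and stops at the first one that succeeds. The sketch below is a standalone illustration under stated assumptions, not the library's implementation: `speak_with_fallback`, the error-log format, and the stand-in backend callables are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Documented fallback order; every other name in this sketch is hypothetical.
FALLBACK_ORDER: List[str] = ["elevenlabs", "luxtts", "gtts", "pyttsx3"]


def speak_with_fallback(
    text: str,
    backends: Dict[str, Callable[[str], None]],
    order: Optional[List[str]] = None,
) -> Tuple[Optional[str], List[str]]:
    """Try each backend in order; return (name that worked, error log)."""
    errors: List[str] = []
    for name in order or FALLBACK_ORDER:
        synth = backends.get(name)
        if synth is None:
            errors.append(f"{name}: not installed")
            continue
        try:
            synth(text)
            return name, errors
        except Exception as exc:  # one failing backend must not abort the chain
            errors.append(f"{name}: {exc}")
    return None, errors


def _no_api_key(text: str) -> None:
    # Stand-in for a paid backend that cannot run without credentials.
    raise RuntimeError("missing API key")


spoken: List[str] = []
demo_backends = {"elevenlabs": _no_api_key, "pyttsx3": spoken.append}
used, log = speak_with_fallback("Hello, world!", demo_backends)
print(used)
print(log)
```

Keeping the error log alongside the winning backend name makes it possible to report why earlier backends were skipped.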
- scitex.audio.speak(text, backend=None, voice=None, play=True, output_path=None, fallback=None, rate=None, speed=None, mode=None, **kwargs)[source]
Convert text to speech with smart local/remote switching.
- Modes:
local: Always use local TTS backends (fails if audio unavailable)
remote: Always forward to relay server
auto: Smart routing - prefers relay if local audio unavailable
- Smart Routing (auto mode):
Checks if local audio sink is available (not SUSPENDED)
If local unavailable and relay configured, uses relay
If both unavailable, returns error with clear message
Fallback order (local, only when backend is None): elevenlabs -> luxtts -> gtts -> pyttsx3
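The routing rules above can be summarized as a small decision function. This is a minimal sketch assuming the three documented modes; `choose_route` is a hypothetical name, and raising an exception is a simplification of the documented "returns error with clear message" behavior.

```python
from typing import Optional


def choose_route(mode: str, local_available: bool, relay_url: Optional[str]) -> str:
    """Return 'local' or 'remote' following the documented mode rules."""
    if mode == "local":
        return "local"  # always local, even if playback later fails
    if mode == "remote":
        return "remote"  # always forward to the relay server
    if mode != "auto":
        raise ValueError(f"unknown mode: {mode!r}")
    if local_available:
        return "local"
    if relay_url:
        return "remote"
    # Assumption: the real function reports this case in its result instead.
    raise RuntimeError("No local audio sink available and no relay configured")


# The relay URL below is a placeholder for illustration only.
print(choose_route("auto", local_available=False, relay_url="http://relay.example:8000"))
```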
- Parameters:
text (str) – Text to speak.
backend (Optional[str]) – TTS backend ('elevenlabs', 'luxtts', 'gtts', 'pyttsx3'). Auto-selects with fallback if None.
play (bool) – Whether to play the audio.
fallback (Optional[bool]) – If None (default), True when backend is None and False when backend is explicitly specified; that is, an explicit backend request fails loudly rather than silently falling back. Pass True/False to override.
rate (Optional[int]) – Speech rate in words per minute (pyttsx3 only, default 150).
speed (Optional[float]) – Speed multiplier for gtts (1.0=normal, >1.0=faster, <1.0=slower).
mode (Optional[str]) – Override mode ('local', 'remote', 'auto'). Uses the SCITEX_AUDIO_MODE environment variable if None.
**kwargs – Additional backend options.
- Returns:
Dict with keys: success, played, play_requested, backend, path (if saved), mode.
- Return type:
Dict
- Environment Variables:
SCITEX_AUDIO_MODE: Default mode ('local', 'remote', 'auto')
SCITEX_AUDIO_RELAY_URL: Relay server URL for remote mode
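The defaulting rules for `fallback` and `mode` described in this entry can be written out as two small helpers. This is a sketch of the documented behavior only: `resolve_fallback` and `resolve_mode` are hypothetical names, and falling back to 'auto' when SCITEX_AUDIO_MODE is unset is an assumption the reference does not confirm.

```python
import os
from typing import Mapping, Optional


def resolve_fallback(backend: Optional[str], fallback: Optional[bool]) -> bool:
    """An explicit True/False wins; otherwise fall back only when no backend was named."""
    if fallback is not None:
        return fallback
    return backend is None


def resolve_mode(mode: Optional[str], env: Mapping[str, str] = os.environ) -> str:
    """An explicit mode wins; otherwise consult SCITEX_AUDIO_MODE."""
    if mode is not None:
        return mode
    # Assumption: 'auto' is used here when the variable is unset.
    return env.get("SCITEX_AUDIO_MODE", "auto")


print(resolve_fallback(backend=None, fallback=None))
print(resolve_fallback(backend="gtts", fallback=None))
print(resolve_mode(None, {"SCITEX_AUDIO_MODE": "remote"}))
```

Passing the environment in as a mapping keeps the helper easy to test without modifying `os.environ`.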
- scitex.audio.generate_bytes(text, backend=None, voice=None, **kwargs)[source]
Generate TTS audio as raw bytes without playing.
- Return type:
- scitex.audio.stop_speech()[source]
Stop any currently playing speech by killing espeak processes.
- Return type:
- scitex.audio.get_tts(backend=None, **kwargs)[source]
Get a TTS instance for the specified backend.
- Return type:
BaseTTS
- scitex.audio.announce_context(include_full_path=False, speak_aloud=True, branch_resolver=None, speak_fn=None)[source]
Announce the current working directory and git branch.
Builds an orientation sentence (e.g. "Working in scitex-audio, on branch develop") and, by default, speaks it aloud. Useful when starting work in a new session.
- Parameters:
include_full_path (bool) – Include the absolute path instead of just the directory name.
speak_aloud (bool) – Speak the announcement (default True). When False, only the context dict is returned.
branch_resolver (callable, optional) – Injectable callable (cwd: str) -> str | None returning the git branch name (testing seam). Defaults to a real git rev-parse subprocess.
speak_fn (callable, optional) – Injectable speak function (testing seam). Defaults to speak().
- Returns:
{"directory": str, "directory_name": str, "git_branch": str | None, "announced_text": str, "spoke": bool}.
- Return type:
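The injectable `branch_resolver` makes the announcement logic testable without a git checkout. The sketch below shows how such a seam might be used; `build_announcement` is a hypothetical stand-in rather than the library function, and the wording used when no branch is found is an assumption.

```python
import os
from typing import Callable, Optional


def build_announcement(
    cwd: str,
    branch_resolver: Callable[[str], Optional[str]],
    include_full_path: bool = False,
) -> dict:
    """Build the context dict and sentence without speaking it."""
    directory_name = os.path.basename(os.path.normpath(cwd))
    branch = branch_resolver(cwd)
    where = cwd if include_full_path else directory_name
    text = f"Working in {where}"
    if branch:  # assumption: the branch clause is omitted outside a git repo
        text += f", on branch {branch}"
    return {
        "directory": cwd,
        "directory_name": directory_name,
        "git_branch": branch,
        "announced_text": text,
        "spoke": False,  # this sketch never plays audio
    }


# A fake resolver replaces the real `git rev-parse` subprocess.
ctx = build_announcement("/home/user/proj/scitex-audio", lambda cwd: "develop")
print(ctx["announced_text"])
```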
- class scitex.audio.TTS(api_key=None, voice_name=None, voice_id=None, client=None, client_factory=None, **kwargs)[source]
Bases: object
Text-to-Speech using ElevenLabs API.
Examples
# Basic usage
tts = TTS()
tts.speak("Hello, world!")
# With custom voice
tts = TTS(voice_name="Adam")
tts.speak("Processing complete")
# Save to file without playing
tts.speak("Test", output_path="/tmp/test.mp3", play=False)
- __init__(api_key=None, voice_name=None, voice_id=None, client=None, client_factory=None, **kwargs)[source]
Initialize TTS.
- Parameters:
api_key (Optional[str]) – ElevenLabs API key. Defaults to ELEVENLABS_API_KEY env var.
voice_name (Optional[str]) – Voice name (e.g., "Adam", "Sarah", "George"; free-tier voices).
voice_id (Optional[str]) – Direct voice ID (overrides voice_name).
client – Optional pre-built client (testing). When given, the lazy-load is skipped.
client_factory – Optional callable (api_key) -> client used by the lazy client property instead of the real ElevenLabs SDK (testing). Lets a test exercise the import-error path without uninstalling the dependency.
**kwargs – Additional config options (stability, speed, etc.)
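The `client` and `client_factory` parameters describe a lazy-loading pattern with two test seams. The following sketch illustrates that pattern in isolation; `LazyClientHolder` and the fake factories are hypothetical, and the default factory here only simulates a missing SDK instead of importing the real one.

```python
from typing import Any, Callable, List, Optional


def _default_factory(api_key: Optional[str]) -> Any:
    # Stand-in for the real SDK import; simulates the dependency being absent.
    raise ImportError("SDK not installed (simulated)")


class LazyClientHolder:
    """Builds its client on first access, unless one was supplied up front."""

    def __init__(
        self,
        api_key: Optional[str] = None,
        client: Any = None,
        client_factory: Optional[Callable[[Optional[str]], Any]] = None,
    ) -> None:
        self.api_key = api_key
        self._client = client
        self._client_factory = client_factory or _default_factory

    @property
    def client(self) -> Any:
        if self._client is None:  # skipped when a pre-built client was given
            self._client = self._client_factory(self.api_key)
        return self._client


calls: List[Optional[str]] = []


def fake_factory(api_key: Optional[str]) -> dict:
    calls.append(api_key)
    return {"api_key": api_key}


holder = LazyClientHolder(api_key="test-key", client_factory=fake_factory)
first = holder.client   # factory runs here, not in __init__
second = holder.client  # cached; factory is not called again
print(len(calls))
```

A test can pass a factory that raises `ImportError` to exercise the error path while the real dependency stays installed.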
- property client
Lazy-load ElevenLabs client.
- speak(text, output_path=None, play=True, voice_name=None, voice_id=None)[source]
Convert text to speech and optionally play it.
- Parameters:
- Return type:
- Returns:
Path to the generated audio file, or None if only played.
- class scitex.audio.GoogleTTS(lang='en', slow=False, speed=1.5, gtts_factory=None, **kwargs)[source]
Bases: BaseTTS
Google Text-to-Speech backend using gTTS.
Free to use, requires internet connection. Good quality voices with multi-language support. Supports speed control via pydub (requires ffmpeg).
Install: pip install gTTS pydub
- class scitex.audio.ElevenLabsTTS(api_key=None, voice='adam', model_id='eleven_multilingual_v2', stability=0.5, similarity_boost=0.75, speed=1.0, client=None, **kwargs)[source]
Bases: BaseTTS
ElevenLabs TTS backend.
High-quality voices but requires API key and has usage costs.
- Environment:
ELEVENLABS_API_KEY: Your ElevenLabs API key
- property client
Lazy-load ElevenLabs client.
- class scitex.audio.SystemTTS(rate=150, volume=1.0, voice=None, engine=None, **kwargs)[source]
Bases: BaseTTS
System TTS backend using pyttsx3.
Works offline using system’s built-in TTS engine. Quality varies by platform and available voices.
- Platforms:
Linux: espeak/espeak-ng
Windows: SAPI5
macOS: NSSpeechSynthesizer
- property engine
Lazy-load pyttsx3 engine.
- class scitex.audio.LuxTTS(device=None, model_id='YatharthS/LuxTTS', reference_audio=None, num_steps=4, speed=2.0, rms=0.01, t_shift=0.9, return_smooth=False, ref_duration=5.0, trim_start=None, **kwargs)[source]
Bases: BaseTTS
LuxTTS backend - open-source voice-cloning TTS.
High-quality 48kHz output. Near-realtime on CPU, 150x+ on GPU. Requires a reference audio file for voice cloning.
Install: pip install git+https://github.com/ysharma3501/LuxTTS.git
- scitex.audio.check_local_audio_available()[source]
Check if local audio playback is available.
Checks PulseAudio sink state to determine if audio can actually be heard. On NAS or headless servers, the sink is typically SUSPENDED.
In WSL environments, also checks for Windows playback fallback via PowerShell.
- Return type:
- Returns:
dict with keys:
available: bool - True if local audio output is likely to work
state: str - 'RUNNING', 'IDLE', 'SUSPENDED', 'NO_SINK', etc.
reason: str - Human-readable explanation
fallback: str (optional) - Fallback method if primary unavailable
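A caller might use the returned dict to decide how to play audio. The sketch below works on hand-written sample dicts that follow the documented keys; `pick_playback` is a hypothetical helper, and the "powershell" fallback value is an illustrative guess, not a documented constant.

```python
from typing import Optional


def pick_playback(status: dict) -> Optional[str]:
    """Return 'local', the advertised fallback method, or None if nothing works."""
    if status.get("available"):
        return "local"
    # 'fallback' is optional, so .get() yields None when it is absent.
    return status.get("fallback")


# Sample dicts shaped like the documented return value (values are made up).
headless = {"available": False, "state": "SUSPENDED", "reason": "Sink is suspended"}
wsl = {
    "available": False,
    "state": "NO_SINK",
    "reason": "No PulseAudio sink",
    "fallback": "powershell",  # hypothetical value
}
desktop = {"available": True, "state": "RUNNING", "reason": "Sink is running"}

for status in (headless, wsl, desktop):
    print(status["state"], "->", pick_playback(status))
```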
- scitex.audio.generate_env_template(include_sensitive=True, include_defaults=True)
Generate a template .src file with all environment variables.
- scitex.audio.transcribe(audio_path, language='ja', model='tiny', whisper_cli=None, model_path=None)[source]
Transcribe audio file to text using whisper.cpp.
- Parameters:
- Return type:
- Returns:
Dict with keys: success, text, segments, language, model, audio_path.
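Because the result is a dict with a `success` flag, callers usually need to check it before reading `text`. The following is a minimal sketch using hand-written sample results; `transcript_text` is a hypothetical helper, and the shape of `segments` is left unspecified because the reference does not describe it.

```python
def transcript_text(result: dict) -> str:
    """Return the transcribed text, or raise if the transcription failed."""
    if not result.get("success"):
        raise RuntimeError(f"Transcription failed for {result.get('audio_path')}")
    return (result.get("text") or "").strip()


# Sample results using the documented keys (values are made up).
ok = {
    "success": True,
    "text": "  Analysis complete.  ",
    "segments": [],
    "language": "en",
    "model": "tiny",
    "audio_path": "memo.wav",
}
failed = {"success": False, "text": "", "segments": [], "audio_path": "missing.wav"}

print(transcript_text(ok))
```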