Official Python SDK for the IndusLabs Voice API, providing seamless Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities with both synchronous and asynchronous support.
Install the SDK using pip. Requires Python 3.7 or higher.
pip install induslabs
Initialize the client with your API key and start making requests immediately.
from induslabs import Client
# Initialize with API key
client = Client(api_key="your_api_key_here")
# Or use environment variable
# export INDUSLABS_API_KEY="your_api_key_here"
client = Client()
# Text-to-Speech
response = client.tts.speak(
text="Hello, this is a test",
voice="Indus-hi-maya"
)
response.save("output.wav")
# Speech-to-Text
result = client.stt.transcribe(
file="audio.wav",
model="indus-stt-v1",
streaming=False,
noise_cancellation=True # Enable noise suppression
)
print(result.text)
if result.metrics:
print(f"RTF: {result.metrics.rtf:.3f}")
Convert text to speech with simple method calls. The SDK handles all API communication and response parsing.
from induslabs import Client
client = Client(api_key="your_api_key")
# Simple synthesis
response = client.tts.speak(
text="Hello, this is a test",
voice="Indus-hi-maya"
)
# Save to file
response.save("output.wav")
# Access metadata
print(f"Sample Rate: {response.sample_rate}Hz")
print(f"Channels: {response.channels}")
print(f"Format: {response.format}")
print(f"Request ID: {response.request_id}")
Enable streaming to receive audio chunks as they're generated, reducing latency for real-time applications.
"""
Streaming Text-to-Speech Example
This example demonstrates how to stream audio from the IndusLabs TTS API
and play it in real-time while simultaneously saving to a file.
Requirements:
pip install induslabs pyaudio
Note: PyAudio may require additional system dependencies:
- Ubuntu/Debian: sudo apt-get install portaudio19-dev python3-pyaudio
- macOS: brew install portaudio
- Windows: PyAudio wheels available on PyPI
"""
import queue
import threading
import time
import pyaudio
from induslabs import Client
class StreamingTTSPlayer:
"""Handles real-time streaming playback of TTS audio with buffering"""
def __init__(self, sample_rate=24000, channels=1, chunk_size=4096):
self.sample_rate = sample_rate
self.channels = channels
self.chunk_size = chunk_size
self.audio_queue = queue.Queue()
self.streaming_complete = False
self.playing = False
self.p = pyaudio.PyAudio()
self.stream = None
def _stream_audio(self, response, save_path=None):
"""Receives audio chunks from API and queues them for playback"""
file_handle = open(save_path, "wb") if save_path else None
try:
for chunk in response.iter_bytes(chunk_size=self.chunk_size):
self.audio_queue.put(chunk)
if file_handle:
file_handle.write(chunk)
finally:
self.streaming_complete = True
if file_handle:
file_handle.close()
def _play_audio(self):
"""Plays audio chunks from the queue"""
while self.playing:
try:
chunk = self.audio_queue.get(timeout=0.05)
if chunk is None:
break
self.stream.write(chunk)
except queue.Empty:
if self.streaming_complete:
break
def play(self, response, save_path=None, prebuffer_seconds=1.0):
"""
Stream and play TTS audio in real-time
Args:
response: Streaming response from client.tts.speak()
save_path: Optional path to save audio file
prebuffer_seconds: Seconds of audio to buffer before playback starts
"""
# Open audio output stream
self.stream = self.p.open(
format=pyaudio.paInt16,
channels=self.channels,
rate=self.sample_rate,
output=True,
frames_per_buffer=self.chunk_size
)
self.playing = True
self.streaming_complete = False
# Start streaming thread
stream_thread = threading.Thread(
target=self._stream_audio,
args=(response, save_path),
daemon=True
)
stream_thread.start()
# Wait for initial buffer
chunks_needed = int((self.sample_rate * self.channels * 2 / self.chunk_size) * prebuffer_seconds)
print(f"Buffering {prebuffer_seconds}s of audio...")
while self.audio_queue.qsize() < chunks_needed:
if self.streaming_complete:
break
time.sleep(0.1)
print("Playing audio...
")
# Start playback thread
play_thread = threading.Thread(target=self._play_audio, daemon=True)
play_thread.start()
# Wait for completion
stream_thread.join()
play_thread.join()
# Cleanup
self.stream.stop_stream()
self.stream.close()
def close(self):
"""Release audio resources"""
self.p.terminate()
def main():
# Initialize the client
client = Client() # Uses INDUSLABS_API_KEY environment variable
# Text to convert to speech
text = """
Artificial intelligence is transforming the way we live and work.
From self-driving cars to personalized healthcare, AI is revolutionizing
various industries. Machine learning algorithms are becoming more advanced,
enabling systems to recognize patterns and make predictions with incredible accuracy.
"""
print("=" * 60)
print("IndusLabs Streaming TTS Example")
print("=" * 60)
# Create streaming response
response = client.tts.speak(
text=text,
voice="Indus-hi-maya",
stream=True # Enable streaming
)
# Create player and play audio
player = StreamingTTSPlayer()
try:
# Play audio in real-time and save to file
player.play(response, save_path="output.wav", prebuffer_seconds=1.0)
print("Playback complete!")
print("Audio saved to: output.wav")
finally:
player.close()
if __name__ == "__main__":
main()
Process audio in memory without saving to disk. Useful for temporary processing or immediate playback.
import wave
from induslabs import Client
# Initialize Client
client = Client(api_key="your_api_key")
response = client.tts.speak(
text="In-memory audio",
voice="Indus-hi-maya"
)
# Get as file-like object (BytesIO)
audio_file = response.to_file_object()
# Get raw bytes
audio_bytes = response.get_audio_data()
# Pass to other libraries (e.g., standard wave module)
with wave.open(audio_file, 'rb') as wf:
frames = wf.readframes(wf.getnframes())
print(f"Read {len(frames)} frames from memory")
Choose WAV, MP3, or PCM output depending on your needs.
from induslabs import Client
client = Client(api_key="your_api_key")
# WAV format (default, best quality)
wav_response = client.tts.speak(
text="WAV format audio",
voice="Indus-hi-maya",
output_format="wav"
)
# MP3 format (smaller file size)
mp3_response = client.tts.speak(
text="MP3 format audio",
voice="Indus-hi-maya",
output_format="mp3"
)
# PCM format (raw audio data)
pcm_response = client.tts.speak(
text="PCM format audio",
voice="Indus-hi-maya",
output_format="pcm"
)
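Each response object can then be saved with save(), using a file extension that matches the requested format (illustrative filenames):
# Save each response with an extension matching its format
wav_response.save("output.wav")
mp3_response.save("output.mp3")
pcm_response.save("output.pcm")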
Transcribe audio files using the unified transcribe method. By default, this uses the indus-stt-v1 model in non-streaming mode.
from induslabs import Client
client = Client(api_key="your_api_key")
# Transcribe audio file (Indus-STT-V1 model, Non-streaming)
result = client.stt.transcribe(
file="audio.wav",
model="indus-stt-v1",
streaming=False,
noise_cancellation=True # Enable noise suppression for better quality
)
# Access transcription
print(f"Transcription: {result.text}")
# Access detailed metrics
print(f"Request ID: {result.request_id}")
if result.metrics:
print(f"Audio Duration: {result.metrics.buffer_duration:.2f}s")
print(f"Processing Time: {result.metrics.transcription_time:.2f}s")
print(f"Real-time Factor (RTF): {result.metrics.rtf:.3f}")
To enable streaming, you must set streaming=True and use the indus-stt-hi-en model. You can then provide an on_segment callback to receive partial results.
from induslabs import Client, STTSegment
client = Client()
# 1. Define a callback to handle segments
def on_segment(segment: STTSegment):
print(f"Segment: '{segment.text}'")
# 2. Transcribe with streaming enabled
print("Transcribing with real-time streaming...")
result = client.stt.transcribe(
file="audio.wav",
model="indus-stt-hi-en", # Required for streaming
streaming=True, # Enable streaming
language="hindi",
on_segment=on_segment # Callback for real-time results
)
# 3. Access final results
print(f"
Complete transcription: {result.text}")
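The indus-stt-v1 model does not support streaming: passing streaming=True together with model="indus-stt-v1" raises a ValueError. A minimal sketch of guarding against that misconfiguration (the try/except wrapper is illustrative):
from induslabs import Client
client = Client()
try:
    # indus-stt-v1 rejects streaming mode
    result = client.stt.transcribe(
        file="audio.wav",
        model="indus-stt-v1",
        streaming=True  # Not supported by this model
    )
except ValueError as e:
    print(f"Configuration error: {e}")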
Note: Setting noise_cancellation=True together with streaming=True will issue a warning, and noise cancellation will not be applied (noise cancellation is covered in more detail below).
Transcribe audio directly from open file handles or in-memory buffers.
from io import BytesIO
from induslabs import Client
client = Client(api_key="your_api_key")
# Example 1: Using an open file handle
with open("audio.wav", "rb") as f:
result = client.stt.transcribe(
file=f,
model="indus-stt-v1",
noise_cancellation=True
)
print(f"File handle transcription: {result.text}")
# Example 2: Using BytesIO (in-memory)
# Simulating loading bytes from somewhere (e.g. database or network)
with open("audio.wav", "rb") as f:
audio_bytes = f.read()
audio_buffer = BytesIO(audio_bytes)
result = client.stt.transcribe(
file=audio_buffer,
model="indus-stt-v1",
noise_cancellation=True
)
print(f"BytesIO transcription: {result.text}")
Enable server-side noise suppression to improve transcription quality for audio with background noise. This feature is available for non-streaming transcriptions with both indus-stt-v1 and indus-stt-hi-en models.
from induslabs import Client
client = Client(api_key="your_api_key")
# Example 1: Noise cancellation with Indus-STT-V1
result = client.stt.transcribe(
file="noisy_audio.wav",
model="indus-stt-v1",
streaming=False,
noise_cancellation=True # Enable noise suppression
)
print(f"Transcription: {result.text}")
if result.metrics:
print(f"RTF: {result.metrics.rtf:.3f}")
# Example 2: Noise cancellation with Indus-STT-Hi-En
result = client.stt.transcribe(
file="noisy_audio.wav",
model="indus-stt-hi-en",
streaming=False,
language="hindi",
noise_cancellation=True
)
print(f"Transcription: {result.text}")
Noise cancellation is only available in non-streaming mode (streaming=False). If you set streaming=True and noise_cancellation=True, the SDK will issue a UserWarning and noise cancellation will not be applied. It is supported by both the indus-stt-v1 and indus-stt-hi-en models.
import warnings
from induslabs import Client
client = Client()
# This will trigger a warning
with warnings.catch_warnings(record=True) as caught:
warnings.simplefilter("always")
result = client.stt.transcribe(
file="audio.wav",
model="indus-stt-hi-en",
streaming=True, # Streaming enabled
noise_cancellation=True, # Noise cancellation requested
language="hindi"
)
if caught:
for warn in caught:
print(f"Warning: {warn.message}")
# Output: "Noise cancellation is only supported in non-streaming mode right now."
print(f"Result: {result.text}")
Use transcribe_async for non-blocking operations. You can run basic transcriptions or streaming sessions asynchronously.
import asyncio
from induslabs import Client, STTSegment
async def main():
async with Client(api_key="your_api_key") as client:
# --- Example 1: Basic Async Transcription ---
result = await client.stt.transcribe_async(
"audio.wav",
model="indus-stt-v1",
streaming=False,
noise_cancellation=True
)
print(f"Result (Indus-STT-V1): {result.text}")
# --- Example 2: Async Streaming with Indus-STT-Hi-En Model ---
segments = []
def on_segment(segment: STTSegment):
segments.append(segment)
print(f"Streamed: {segment.text}")
# Pass the callback and required streaming parameters
result = await client.stt.transcribe_async(
"audio.wav",
model="indus-stt-hi-en", # Required for streaming
streaming=True, # Enable streaming
language="hindi",
on_segment=on_segment
)
print(f"Final Text (Indus-STT-Hi-En): {result.text}")
asyncio.run(main())
Process multiple requests in parallel for better throughput. You can mix different models and modes (streaming/non-streaming) in concurrent tasks.
import asyncio
from induslabs import Client
async def main():
async with Client() as client:
audio_file = "audio.wav"
# Create multiple tasks with different configurations
tasks = []
# Task 1: Indus-STT-V1 model (non-streaming with noise cancellation)
tasks.append(client.stt.transcribe_async(
audio_file, model="indus-stt-v1", streaming=False, noise_cancellation=True
))
# Task 2: Indus-STT-Hi-En model (non-streaming with noise cancellation)
tasks.append(client.stt.transcribe_async(
audio_file, model="indus-stt-hi-en", streaming=False, language="hindi", noise_cancellation=True
))
# Task 3: Indus-STT-Hi-En model (streaming)
tasks.append(client.stt.transcribe_async(
audio_file, model="indus-stt-hi-en", streaming=True, language="hindi"
))
# Run concurrently
results = await asyncio.gather(*tasks)
for i, result in enumerate(results, 1):
print(f"Result {i}: {result.text[:30]}...")
if result.metrics:
print(f" RTF: {result.metrics.rtf:.3f}")
asyncio.run(main())
Understanding the data structures returned by the SDK.
from induslabs import Client
client = Client()
# Get a response object
response = client.tts.speak(text="Hello", voice="Indus-hi-maya")
# Access Properties
print(f"Data Size: {len(response.content)} bytes") # bytes: Raw audio data
print(f"Request ID: {response.request_id}") # str: Unique request identifier
print(f"Sample Rate: {response.sample_rate}") # int: Audio sample rate
print(f"Format: {response.format}") # str: Audio format
# Methods
response.save("output.wav") # Save to file
raw_data = response.get_audio_data() # Get raw bytes
from induslabs import Client
client = Client()
result = client.stt.transcribe("audio.wav")
# Properties
print(f"Text: {result.text}") # str: Final transcribed text
print(f"Segments: {len(result.segments)}") # List[STTSegment]: All segments
print(f"Request ID: {result.request_id}") # str: Unique request identifier
if result.has_error:
print(f"Error: {result.error}") # str: Error message if any
# Metrics (result.metrics)
if result.metrics:
m = result.metrics
print(f"Buffer Duration: {m.buffer_duration}s")
print(f"Process Time: {m.transcription_time}s")
print(f"Total Time: {m.total_time}s")
print(f"RTF: {m.rtf}") # Real-time Factor
# Methods
data_dict = result.to_dict() # Get as dictionary
from induslabs import STTSegment
def on_segment(segment: STTSegment):
# This object is passed to your callback during streaming
print(f"Text: {segment.text}") # str: Text content of segment
print(f"Start: {segment.start}s") # float: Start time in seconds
print(f"End: {segment.end}s") # float: End time in seconds
Configure the SDK using environment variables for better security.
# Set API key
export INDUSLABS_API_KEY="your_api_key_here"
# Now initialize without passing api_key
python -c "from induslabs import Client; client = Client()"
# Or in your .env file
INDUSLABS_API_KEY=your_api_key_here
from induslabs import Client
# Load from environment
client = Client()
# Or load from .env file using python-dotenv
from dotenv import load_dotenv
load_dotenv()
client = Client() # Automatically uses INDUSLABS_API_KEY
# Error: API key must be provided
# Solution: Set API key via parameter or environment variable
from induslabs import Client
# Method 1: Export ENV var
# export INDUSLABS_API_KEY="your_api_key"
# Method 2: Pass directly
client = Client(api_key="your_api_key")
# Error: No module named 'induslabs'
# Solution: Install the package
pip install induslabs
# Or upgrade to latest version
pip install --upgrade induslabs
# Warning: Unclosed client session
import asyncio
from induslabs import Client
async def main():
    # Solution 1: Use context manager (Recommended)
    async with Client(api_key="key") as client:
        # Your code here
        pass  # Session is closed automatically on exit
    # Solution 2: Explicit close
    client = Client(api_key="key")
    try:
        # Your code here
        pass
    finally:
        await client.close()
asyncio.run(main())
# Error: ValueError: output_format must be 'wav', 'mp3', or 'pcm'
from induslabs import Client
client = Client()
# Solution: Use valid format
response = client.tts.speak(
text="Test",
voice="Indus-hi-maya",
output_format="wav" # Must be: wav, mp3, or pcm
)