LiveKit Plugin Guide

This guide walks you through integrating the official LiveKit plugin for IndusLabs into your production voice agents. The plugin lets you plug our Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities directly into your LiveKit agent pipelines.

Installation

Install the plugin from PyPI using pip or uv:

pip install livekit-plugins-induslabs
# Or using uv:
uv add "livekit-agents[induslabs]"
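After installing, you may want to sanity-check that the plugin is importable before wiring it into an agent. A minimal sketch using only the standard library (the module path `livekit.plugins.induslabs` matches the import used later in this guide):

```python
import importlib.util

def plugin_available(module_path: str) -> bool:
    """Return True if the given module can be resolved, without fully importing it."""
    try:
        return importlib.util.find_spec(module_path) is not None
    except ModuleNotFoundError:
        # A parent package (e.g. "livekit") is not installed at all
        return False

if plugin_available("livekit.plugins.induslabs"):
    print("IndusLabs plugin is installed")
else:
    print("Plugin missing - run: pip install livekit-plugins-induslabs")
```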

Creating a LiveKit Voice Agent

Using the AgentSession paradigm, you can plug in IndusLabs STT and TTS directly alongside your chosen LLM and Voice Activity Detection (VAD). Follow these steps to build your custom agent.

Step 1: Setup Environment Variables

Before initializing the agent, supply your LiveKit server credentials and your IndusLabs API Key. This ensures secure authentication to our private model routers.

Need an API Key?

If you don't have an API key yet, you can create one here: https://playground.induslabs.io/register

Environment Initialization (Bash)

export INDUSLABS_API_KEY="<your-induslabs-api-key>"

export LIVEKIT_URL="<your-livekit-url>"
export LIVEKIT_API_KEY="<your-livekit-api-key>"
export LIVEKIT_API_SECRET="<your-livekit-secret>"
Recommendation

Never hardcode API keys directly in your repository. Always rely on a configured `.env` file or structured secret management injection for production scaling.
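A small startup check can catch missing credentials with a clear message instead of a cryptic authentication error later. This sketch uses only the standard library; the variable names match the export commands above:

```python
import os

REQUIRED_VARS = [
    "INDUSLABS_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
]

def missing_env_vars(env=None) -> list:
    """Return the names of any required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Fail fast at startup if credentials are incomplete:
missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```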

Step 2: Initialize the AgentSession

Create a Python script that starts an AgentSession using the STT and TTS classes from livekit.plugins.induslabs.

Agent Implementation (Python)

import os
from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentSession, Agent
from livekit.plugins import openai, silero
from livekit.plugins.induslabs import TTS, STT

load_dotenv()

class VoiceAssistant(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful voice assistant."
        )

async def entrypoint(ctx: agents.JobContext):
    # Initialize the AgentSession with IndusLabs TTS and STT
    session = AgentSession(
        stt=STT(),
        tts=TTS(voice="Indus-hi-Urvashi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        vad=silero.VAD.load(),
    )

    await session.start(room=ctx.room, agent=VoiceAssistant())

    # Optionally trigger a proactive message from the agent
    await session.generate_reply(
        instructions="Greet the user warmly in Hindi."
    )

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

Step 3: Run the Agent

Run your LiveKit agent locally in development mode by passing the dev argument to your Python command.

python agent.py dev
Recommendation

When deploying in production, ensure you host your LiveKit server and the Python application in the same geographic region (or as close as possible) to the api.induslabs.io endpoint. This minimizes audio routing latency and enables faster, more natural dialogue.
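To compare candidate regions, you can measure raw TCP connect time to the API endpoint. A rough sketch (this measures only network round trip, not synthesis time; the example hostname follows the recommendation above):

```python
import socket
import time

def connect_time_ms(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Time a single TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0

# Example (run from each candidate region and compare):
#   samples = [connect_time_ms("api.induslabs.io") for _ in range(3)]
#   print(f"typical connect time: {sorted(samples)[1]:.1f} ms")
```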

Text-to-Speech (TTS) Implementation

The TTS class provided by livekit.plugins.induslabs supports both full audio chunk synthesis and real-time streaming audio generation.

Step 1: Streaming Audio (Recommended)

For the lowest latency, use the stream() method to incrementally push text strings to the synthesizer and receive audio chunks as they are generated.

Streaming TTS Example (Python)

import asyncio
from livekit.plugins.induslabs import TTS

async def play_audio():
    tts = TTS(voice="Indus-hi-Urvashi")

    # Create an audio stream
    stream = tts.stream()

    # Push text sequentially
    stream.push("Namaste, ")
    stream.push("aap kaise hain?")
    stream.flush()

    # Consume the audio PCM frames
    async for event in stream:
        audio_frame = event.frame
        # Pipe this audio_frame into the LiveKit Room audio track
        print(f"Received audio frame: {len(audio_frame.data)} samples")

asyncio.run(play_audio())
Recommendation

For interactive conversational agents, always prefer the stream() method over one-shot synthesis. Streaming reduces the time-to-first-byte (TTFB), which is critical for natural voice replies without awkward pauses.
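TTFB is easy to measure around any async audio stream. This self-contained sketch uses a stand-in async generator in place of the real tts.stream() (which it does not call), so you can see the measurement pattern in isolation:

```python
import asyncio
import time

async def fake_audio_stream(n_frames: int = 5, delay: float = 0.05):
    """Stand-in for a TTS stream: yields dummy PCM frames after a startup delay."""
    await asyncio.sleep(delay)  # models synthesis startup latency
    for _ in range(n_frames):
        yield b"\x00" * 480  # one dummy PCM frame

async def measure_ttfb(stream) -> float:
    """Return seconds until the first frame arrives, then drain the rest."""
    start = time.perf_counter()
    ttfb = None
    async for _frame in stream:
        if ttfb is None:
            ttfb = time.perf_counter() - start
    return ttfb

if __name__ == "__main__":
    ttfb = asyncio.run(measure_ttfb(fake_audio_stream()))
    print(f"TTFB: {ttfb * 1000:.1f} ms")
```

Swapping the stand-in for a real stream lets you track how prompt length, region, and voice choice affect perceived responsiveness.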

Step 2: Standalone or Chunked Audio Synthesis

If generating longer audio segments offline, or if you want to use the TTS plugin completely standalone without a LiveKit AgentSession, you can synthesize the complete buffer at once. You can also generate audio via our REST API.

Chunked TTS Example (Python)

import asyncio
from livekit.plugins.induslabs import TTS

async def synthesize_text():
    # Uses default voice "Indus-hi-Urvashi"
    tts = TTS()

    # Generate audio for the entire sentence
    chunk_stream = tts.synthesize("Namaste, aap kaise hain?")

    async for audio_bytes in chunk_stream:
        # Expected to receive the entire audio buffer here
        print(f"Synthesized audio chunk of size {len(audio_bytes.data)} bytes")

asyncio.run(synthesize_text())

Step 3: Configuring the Voice

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `voice` | `str` | `"Indus-hi-Urvashi"` | (IndusLabs API) The identifier of the voice to synthesize with. |
| `sample_rate` | `int` | `24000` | (IndusLabs API) Sample rate of the output PCM audio in Hz. |
| `speed` | `float` | `1.0` | (IndusLabs API) Playback speed multiplier (e.g. `1.5` for faster). |
| `pitch_shift` | `float` | `0.0` | (IndusLabs API) Pitch shift in semitones. |
| `loudness_db` | `float` | `0.0` | (IndusLabs API) Gain adjustment in decibels. |
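The numeric parameters map onto standard audio math. This illustrative sketch (not part of the plugin) shows what the server-side knobs correspond to in signal terms:

```python
def pitch_ratio(semitones: float) -> float:
    """A pitch_shift of s semitones corresponds to a frequency ratio of 2**(s/12)."""
    return 2.0 ** (semitones / 12.0)

def gain_linear(loudness_db: float) -> float:
    """A loudness_db of g corresponds to a linear amplitude gain of 10**(g/20)."""
    return 10.0 ** (loudness_db / 20.0)

def playback_duration(seconds: float, speed: float) -> float:
    """A speed multiplier shortens playback proportionally: 1.5x speed, 2/3 the time."""
    return seconds / speed

print(pitch_ratio(12))              # one octave up -> 2.0
print(round(gain_linear(6), 2))     # +6 dB -> roughly 2x amplitude
print(playback_duration(3.0, 1.5))  # 3 s of audio at 1.5x -> 2.0 s
```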

Speech-to-Text (STT) Implementation

The STT class captures audio streams natively from the LiveKit room, chunks them using Voice Activity Detection (VAD), and sends them to the IndusLabs endpoints asynchronously.
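The actual VAD in the pipeline is the one you pass into AgentSession (silero.VAD in the agent example above). To illustrate the chunking idea only, here is a toy energy-based segmenter over PCM samples; it is not the plugin's implementation:

```python
def segment_speech(samples, threshold=500, min_silence=3):
    """Split int16 samples into speech chunks separated by runs of low-energy frames.

    A 'frame' here is a single sample for simplicity; real VADs operate on
    10-30 ms windows of audio and use trained models rather than raw energy.
    """
    chunks, current, silence = [], [], 0
    for s in samples:
        if abs(s) >= threshold:
            current.append(s)
            silence = 0
        elif current:
            silence += 1
            if silence >= min_silence:
                chunks.append(current)
                current, silence = [], 0
            else:
                current.append(s)
    if current:
        chunks.append(current)
    return chunks

speech = [600, 700, 650, 0, 0, 0, 800, 900]
print(len(segment_speech(speech)))  # two speech chunks separated by silence
```

Each emitted chunk corresponds to one utterance that would be sent to the transcription endpoint.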

Step 1: Reading Transcripts

As audio frames enter the pipeline from the AgentSession, they are grouped and processed. You can loop over the generated SpeechStream to evaluate interim or final transcript events. This can also be used in standalone applications fetching transcripts natively.

Streaming STT Example (Python)

import asyncio
from livekit.plugins.induslabs import STT
from livekit.agents.stt import SpeechEventType

async def transcribe_audio():
    # Initialize the STT model
    stt = STT(language="en")

    # Create an STT stream
    stream = stt.stream()

    # In a LiveKit agent, audio frames from the user are pushed into the stream automatically.
    # Below shows how the stream is consumed asynchronously:
    async for event in stream:
        if event.type == SpeechEventType.FINAL_TRANSCRIPT:
            transcript = event.alternatives[0].text
            print("Final Transcript:", transcript)
        elif event.type == SpeechEventType.INTERIM_TRANSCRIPT:
            transcript = event.alternatives[0].text
            print("Interim Transcript:", transcript)

asyncio.run(transcribe_audio())

Step 2: Configuring Languages

You can optionally pass an explicit language parameter to bypass auto-detection and constrain the model to a single language.

STT Language Setting (Python)

from livekit.plugins.induslabs import STT

# Employs automatic language detection
stt = STT()

# Explicitly defining the language
stt = STT(language="hi")
Recommendation

Although STT supports auto-detection via `STT()`, explicitly passing a locale like language="hi" or language="en" yields lower latency and consistently higher accuracy for monolingual users.

Step 3: Configuring the STT

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `sample_rate` | `int` | `16000` | (Plugin) Expected sample rate of inbound frames from LiveKit. |
| `language` | `str` | `None` | (IndusLabs API) Language code (e.g., `"hi"`). Auto-detects if omitted. |

After finalizing the SDK integration, explore the SDK Docs for complex audio formats or review the native REST APIs via the TTS APIs and STT APIs.