This guide walks you through integrating the official LiveKit plugin for IndusLabs into your production voice agents. The plugin allows you to seamlessly connect our Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities directly into your LiveKit agent pipelines.
Install the plugin from PyPI using pip or uv:
pip install livekit-plugins-induslabs
# Or using uv:
uv add "livekit-agents[induslabs]"
Using the AgentSession paradigm, you can plug in IndusLabs STT and TTS directly alongside your chosen LLM and Voice Activity Detection (VAD). Follow these steps to build your custom agent.
Before initializing the agent, supply your LiveKit server credentials and your IndusLabs API Key. This ensures secure authentication to our private model routers.
If you don't have an API key yet, you can create one here: https://playground.induslabs.io/register
export INDUSLABS_API_KEY="<your-induslabs-api-key>"
export LIVEKIT_URL="<your-livekit-url>"
export LIVEKIT_API_KEY="<your-livekit-api-key>"
export LIVEKIT_API_SECRET="<your-livekit-secret>"
Never hardcode API keys directly in your repository. Load them from a `.env` file in development, and inject them through your secret-management system in production.
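For local development, the same values can live in a `.env` file that the `load_dotenv()` call in the agent script reads at startup. The variable names below mirror the exports above:

```shell
# .env — keep this file out of version control (add it to .gitignore)
INDUSLABS_API_KEY=<your-induslabs-api-key>
LIVEKIT_URL=<your-livekit-url>
LIVEKIT_API_KEY=<your-livekit-api-key>
LIVEKIT_API_SECRET=<your-livekit-secret>
```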
Create a Python script that starts your AgentSession using the STT and TTS classes from livekit.plugins.induslabs.
from dotenv import load_dotenv
from livekit import agents
from livekit.agents import AgentSession, Agent
from livekit.plugins import openai, silero
from livekit.plugins.induslabs import TTS, STT

load_dotenv()


class VoiceAssistant(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful voice assistant."
        )


async def entrypoint(ctx: agents.JobContext):
    # Initialize the AgentSession with IndusLabs TTS and STT
    session = AgentSession(
        stt=STT(),
        tts=TTS(voice="Indus-hi-Urvashi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        vad=silero.VAD.load(),
    )
    await session.start(room=ctx.room, agent=VoiceAssistant())

    # Connect the worker to the LiveKit room
    await ctx.connect()

    # Optionally trigger a proactive message from the agent
    await session.generate_reply(
        instructions="Greet the user warmly in Hindi."
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
You can run your LiveKit agent locally in development mode by appending the `dev` argument to your Python command.
python agent.py dev
When deploying in production, ensure you host your LiveKit server and the Python application in the same geographic region (or as close as possible) to the api.induslabs.io endpoint. This minimizes audio routing latency and enables faster, more natural dialogue.
The TTS class provided by livekit.plugins.induslabs supports both one-shot synthesis of a complete utterance and real-time streaming generation.
For the lowest latency, use the stream() method to push text incrementally to the synthesizer and receive audio chunks as soon as they are generated.
import asyncio

from livekit.plugins.induslabs import TTS


async def play_audio():
    tts = TTS(voice="Indus-hi-Urvashi")

    # Create an audio stream
    stream = tts.stream()

    # Push text sequentially
    stream.push_text("Namaste, ")
    stream.push_text("aap kaise hain?")
    stream.flush()
    stream.end_input()  # signal that no more text will be pushed

    # Consume the audio PCM frames
    async for event in stream:
        audio_frame = event.frame
        # Pipe this audio_frame into the LiveKit Room audio track
        print(f"Received audio frame: {len(audio_frame.data)} samples")


asyncio.run(play_audio())
For interactive conversational agents, always prefer the stream() method over synthesized chunks. Streaming reduces the time-to-first-byte (TTFB), which is critical for natural voice replies without awkward pauses.
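Time-to-first-byte is easy to measure against any async stream of audio chunks. The sketch below is a generic helper, not part of the IndusLabs plugin; the `fake_stream` generator stands in for `tts.stream()` or `tts.synthesize()` so the example runs without credentials:

```python
import asyncio
import time


async def time_to_first_chunk(stream):
    """Return (seconds until the first chunk arrived, total chunk count)."""
    start = time.monotonic()
    ttfb = None
    count = 0
    async for _chunk in stream:
        if ttfb is None:
            ttfb = time.monotonic() - start
        count += 1
    return ttfb, count


# Simulated audio stream standing in for a real TTS stream:
async def fake_stream(first_delay: float, n_chunks: int):
    await asyncio.sleep(first_delay)  # model "thinking" before the first frame
    for _ in range(n_chunks):
        yield b"\x00" * 480  # dummy PCM payload


async def main():
    ttfb, count = await time_to_first_chunk(fake_stream(0.05, 10))
    print(f"TTFB: {ttfb * 1000:.0f} ms over {count} chunks")


asyncio.run(main())
```

To profile a real agent, pass the stream returned by `tts.stream()` (after pushing text) in place of `fake_stream`.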
If generating longer audio segments offline, or if you want to use the TTS plugin completely standalone without a LiveKit AgentSession, you can synthesize the complete buffer at once. You can also generate audio via our REST API.
import asyncio

from livekit.plugins.induslabs import TTS


async def synthesize_text():
    # Uses the default voice "Indus-hi-Urvashi"
    tts = TTS()

    # Generate audio for the entire sentence
    chunk_stream = tts.synthesize("Namaste, aap kaise hain?")
    async for event in chunk_stream:
        # Each event carries a synthesized audio frame
        print(f"Synthesized audio chunk of size {len(event.frame.data)} bytes")


asyncio.run(synthesize_text())
curl -X POST "https://voice.induslabs.io/v1/audio/speech" \
-H "Content-Type: application/json" \
-d '{
"text": "Namaste, IndusLabs plugin is ready.",
"voice": "Indus-hi-Urvashi",
"model": "indus-tts-v1",
"api_key": "'$INDUSLABS_API_KEY'",
"sample_rate": 24000
}' \
-o test_audio.mp3
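The same REST call can be issued from Python's standard library. The endpoint URL and field names below are copied from the curl example; error handling and retries are omitted for brevity:

```python
import json
import os
import urllib.request

TTS_URL = "https://voice.induslabs.io/v1/audio/speech"


def build_speech_payload(text: str, voice: str = "Indus-hi-Urvashi",
                         model: str = "indus-tts-v1",
                         sample_rate: int = 24000) -> dict:
    # Mirrors the JSON body of the curl example above.
    return {
        "text": text,
        "voice": voice,
        "model": model,
        "api_key": os.environ.get("INDUSLABS_API_KEY", ""),
        "sample_rate": sample_rate,
    }


def synthesize_to_file(text: str, out_path: str = "test_audio.mp3") -> None:
    # POST the payload and write the returned audio bytes to disk.
    req = urllib.request.Request(
        TTS_URL,
        data=json.dumps(build_speech_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```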
| Parameter | Type | Default | Description |
|---|---|---|---|
| voice | str | "Indus-hi-Urvashi" | (IndusLabs API) The identifier of the voice to synthesize with. |
| sample_rate | int | 24000 | (IndusLabs API) Sample rate of the output PCM audio in Hz. |
| speed | float | 1.0 | (IndusLabs API) Playback speed multiplier (e.g. 1.5 for faster). |
| pitch_shift | float | 0.0 | (IndusLabs API) Pitch shift in semitones. |
| loudness_db | float | 0.0 | (IndusLabs API) Gain adjustment in decibels. |
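For intuition, the last three parameters follow standard audio conventions (this is how such units are conventionally defined, not a statement about IndusLabs internals):

```python
def loudness_db_to_gain(db: float) -> float:
    # Decibel gain to linear amplitude multiplier: 0.0 dB -> 1.0, +6 dB -> ~2.0
    return 10 ** (db / 20)


def pitch_shift_ratio(semitones: float) -> float:
    # Semitone shift to frequency ratio: +12 semitones doubles the pitch
    return 2 ** (semitones / 12)


def playback_duration(seconds: float, speed: float = 1.0) -> float:
    # speed=1.5 plays 1.5x faster, so the audio finishes in 2/3 of the time
    return seconds / speed


print(playback_duration(3.0, speed=1.5))  # 2.0
```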
The STT class captures audio streams natively from the LiveKit room, chunks them using Voice Activity Detection (VAD), and sends them to the IndusLabs endpoints asynchronously.
As audio frames enter the pipeline from the AgentSession, they are grouped and processed. You can loop over the resulting SpeechStream to handle interim and final transcript events. The same API works in standalone applications that fetch transcripts directly.
import asyncio

from livekit.agents.stt import SpeechEventType
from livekit.plugins.induslabs import STT


async def transcribe_audio():
    # Initialize the STT model
    stt = STT(language="en")

    # Create an STT stream
    stream = stt.stream()

    # In a LiveKit agent, user audio frames are pushed into the stream
    # automatically; standalone code would call stream.push_frame(frame)
    # for each frame and stream.end_input() when done.
    async for event in stream:
        if event.type == SpeechEventType.FINAL_TRANSCRIPT:
            print("Final Transcript:", event.alternatives[0].text)
        elif event.type == SpeechEventType.INTERIM_TRANSCRIPT:
            print("Interim Transcript:", event.alternatives[0].text)


asyncio.run(transcribe_audio())
curl -N -X POST "https://voice.induslabs.io/v1/audio/transcribe" \
-H "accept: text/event-stream" \
-H "Content-Type: multipart/form-data" \
-F "file=@audio.mp3" \
-F "api_key=$INDUSLABS_API_KEY" \
-F "language=en"
You can optionally pass an explicit language parameter to bypass auto-detection and constrain the model.
from livekit.plugins.induslabs import STT
# Employs automatic language detection
stt = STT()
# Explicitly defining the language
stt = STT(language="hi")
Although `STT()` defaults to automatic language detection, explicitly passing a locale such as language="hi" or language="en" reduces detection latency and yields consistently higher accuracy for monolingual users.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sample_rate | int | 16000 | (Plugin) Expected sample rate of inbound frames from LiveKit. |
| language | str | None | (IndusLabs API) Language code (e.g., "hi"). Auto-detects if omitted. |
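If your audio source runs at a different rate (for example, a 48 kHz microphone capture) than the 16 kHz the plugin expects, it must be resampled before being pushed. The sketch below is a naive linear-interpolation resampler for illustration only; production code should use a proper DSP library:

```python
def resample_linear(samples: list, src_rate: int, dst_rate: int) -> list:
    """Naively resample mono PCM samples via linear interpolation."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    ratio = src_rate / dst_rate
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        # Interpolate between neighboring samples (clamp at the end)
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(int(samples[j] * (1 - frac) + nxt * frac))
    return out


# 480 samples at 48 kHz (10 ms of audio) -> 160 samples at 16 kHz
print(len(resample_linear([0] * 480, 48000, 16000)))  # 160
```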
After finalizing the SDK integration, explore the SDK Docs for complex audio formats or review the native REST APIs via the TTS APIs and STT APIs.