LiveKit Agent Test Bench

Agent provider

Realtime provider

Realtime model

Voice

STT provider

TTS provider

LLM provider

ElevenLabs agent: Select a realtime ElevenLabs agent, or leave blank to use the built-in pipeline.

Display name

Agent name

Metadata payload: You can provide raw text or JSON; it will be forwarded to the token service.

LiveKit server URL: Leave blank to use the URL returned by the token service (when available).

Audio Monitor

Local microphone

Remote agent

Time (seconds) between the end of speech and when the final transcript text is available.

Time (seconds) from the VAD-detected end of speech until the user turn is considered complete. This already includes any transcription_delay.

Time (seconds) the TTS model took to synthesize the entire audio stream.

Time (seconds) for the TTS model to produce the first byte of audio.

Time (seconds) for the LLM to emit the first token of the completion.

Time (seconds) the LLM took to stream the entire completion.

Total latency combines perception, reasoning, and speech: eou.end_of_utterance_delay + llm_metric.ttft + tts_metric.ttfb.
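
As a minimal sketch of the total-latency formula above, the three terms can be summed as follows. The dataclasses here are hypothetical stand-ins that only mirror the field names used in the formula; they are not the LiveKit metric classes, and all values are assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass
class EOUMetrics:
    """Hypothetical stand-in for end-of-utterance metrics (seconds)."""
    end_of_utterance_delay: float  # already includes transcription_delay
    transcription_delay: float     # reported separately; not added again


@dataclass
class LLMMetrics:
    """Hypothetical stand-in for LLM metrics (seconds)."""
    ttft: float  # time to first token


@dataclass
class TTSMetrics:
    """Hypothetical stand-in for TTS metrics (seconds)."""
    ttfb: float  # time to first byte of audio


def total_latency(eou: EOUMetrics, llm_metric: LLMMetrics, tts_metric: TTSMetrics) -> float:
    """Sum perception, reasoning, and speech latency.

    transcription_delay is intentionally left out because
    end_of_utterance_delay already includes it.
    """
    return eou.end_of_utterance_delay + llm_metric.ttft + tts_metric.ttfb


if __name__ == "__main__":
    eou = EOUMetrics(end_of_utterance_delay=0.5, transcription_delay=0.2)
    print(total_latency(eou, LLMMetrics(ttft=0.25), TTSMetrics(ttfb=0.25)))
```

With the sample values shown (0.5 s, 0.25 s, 0.25 s), the total is 1.0 s. Adding transcription_delay as a fourth term would double-count it.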