Transcription Delay
Time (seconds) between the end of speech and when the final transcript text is available.
End of Utterance Delay
Time (seconds) from the VAD-detected end of speech until the user turn is considered complete. This value already includes any transcription_delay.
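To make the relationship between the two delays concrete, here is a minimal sketch that derives both from three timestamps for a single turn. The `TurnTimestamps` class and its field names are hypothetical and are not part of any agent SDK. The sketch assumes all timestamps come from the same monotonic clock, in seconds. Both windows start at the same VAD end-of-speech point, so transcription_delay falls inside end_of_utterance_delay and should not be added on top of it.

```python
from dataclasses import dataclass


@dataclass
class TurnTimestamps:
    """Hypothetical per-turn timestamps (seconds, same monotonic clock)."""
    end_of_speech: float     # VAD detected that the user stopped speaking
    final_transcript: float  # final transcript text became available
    turn_complete: float     # user turn was considered complete

    @property
    def transcription_delay(self) -> float:
        # End of speech -> final transcript available.
        return self.final_transcript - self.end_of_speech

    @property
    def end_of_utterance_delay(self) -> float:
        # End of speech -> turn complete. Starts at the same point as
        # transcription_delay, so that window is already included here.
        return self.turn_complete - self.end_of_speech


turn = TurnTimestamps(end_of_speech=10.00, final_transcript=10.25, turn_complete=10.60)
print(round(turn.transcription_delay, 2))     # 0.25
print(round(turn.end_of_utterance_delay, 2))  # 0.6
```

The example values are made up for illustration, not measurements.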
LLM Time to First Token (TTFT)
Time (seconds) for the LLM to emit the first token of the completion.
LLM Duration
Time (seconds) the LLM took to stream the entire completion.
TTS Time to First Byte (TTFB)
Time (seconds) for the TTS model to produce the first byte of audio.
TTS Duration
Time (seconds) the TTS model took to synthesize the entire audio stream.
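The LLM and the TTS model each report a time-to-first-output value (TTFT or TTFB) and a total duration. Below is a minimal sketch of how both numbers can be measured around any streaming iterator, assuming output arrives one chunk at a time. `timed_stream` and `fake_llm_stream` are hypothetical helpers for illustration, not SDK functions. Timing starts when consumption begins, which in this sketch coincides with the "request" because a generator does no work until it is first iterated.

```python
import time
from typing import Iterable, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def timed_stream(stream: Iterable[T]) -> Tuple[List[T], Optional[float], float]:
    """Consume a stream and return (chunks, time_to_first_chunk, duration).

    Times are in seconds, measured from when consumption starts.
    time_to_first_chunk is None if the stream yields nothing.
    """
    start = time.perf_counter()
    first: Optional[float] = None
    chunks: List[T] = []
    for chunk in stream:
        if first is None:
            # Elapsed time until the first token (LLM) or audio chunk (TTS).
            first = time.perf_counter() - start
        chunks.append(chunk)
    # Elapsed time until the entire completion or audio stream was received.
    duration = time.perf_counter() - start
    return chunks, first, duration


def fake_llm_stream() -> Iterator[str]:
    # Hypothetical stand-in for a streaming LLM: slow first token, faster rest.
    time.sleep(0.05)
    yield "Hello"
    for token in [",", " world", "!"]:
        time.sleep(0.01)
        yield token


tokens, ttft, duration = timed_stream(fake_llm_stream())
print("".join(tokens))  # Hello, world!
print(f"ttft={ttft:.3f}s duration={duration:.3f}s")
```

The same helper applies to a TTS stream by passing audio chunks instead of text tokens.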
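Total Latency
Total latency (seconds) combines perception, reasoning, and speech: eou.end_of_utterance_delay + llm_metric.ttft + tts_metric.ttfb. It roughly corresponds to the time from the end of the user's speech until the first byte of agent audio is available.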
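The sum maps directly to code. This is a minimal sketch in which `total_latency` is a hypothetical helper and the inputs are made-up example values, not measurements. transcription_delay is deliberately not a parameter because end_of_utterance_delay already includes it.

```python
def total_latency(end_of_utterance_delay: float, llm_ttft: float, tts_ttfb: float) -> float:
    """Hypothetical helper: total turn latency in seconds.

    perception (end of utterance) + reasoning (first LLM token)
    + speech (first TTS audio byte). transcription_delay is not added
    separately because end_of_utterance_delay already includes it.
    """
    return end_of_utterance_delay + llm_ttft + tts_ttfb


print(total_latency(0.5, 0.25, 0.125))  # 0.875
```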