When to use it
Pick streaming when you want the user to see the reply appear progressively rather than all at once. Underneath, it is identical to /send: same memory updates, same relationship deltas, same token cost.
Endpoint
```http
POST /v1/chat/{external_id}/{character_id}/send-stream
Content-Type: application/json
X-API-Key: ck_live_...
Accept: text/event-stream

{ "message": "Hi, how was your day?" }
```
Response: text/event-stream body composed of multiple events.
Event types
| Event | data | When |
|---|---|---|
| chunk | {"delta": "next word "} | Per word/chunk of the reply, in order. Concatenate all deltas to reconstruct the full reply. |
| done | full ChatSendOut JSON | Once. Carries reply, conversation_id, trust, friendship, relationship_events, usage, etc.; same fields as /send. |
| error | {"detail": "...", "code": int} | If something went sideways mid-stream. Rare: pre-flight errors come back as HTTP 4xx/5xx before the stream opens. |
Wire format (raw):
```text
event: chunk
data: {"delta": "Hello, "}

event: chunk
data: {"delta": "friend! "}

event: chunk
data: {"delta": "How was your day?"}

event: done
data: {"reply": "Hello, friend! How was your day?", "conversation_id": 1, ...}
```
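The event contract above can be read as a small fold over the event sequence: append every chunk delta, keep the done payload, and stop on error. The sketch below is only an illustration of that reading. `foldStreamEvents` is a hypothetical name, not part of the API, and it assumes the events have already been parsed into `{ event, data }` objects.

```javascript
// Hypothetical helper: fold parsed stream events into the final result.
// Assumes each event is already parsed into { event, data }.
function foldStreamEvents(events) {
  let text = "";    // reply rebuilt from chunk deltas
  let final = null; // payload of the done event, if one arrived
  for (const { event, data } of events) {
    if (event === "chunk") {
      text += data.delta;
    } else if (event === "done") {
      final = data;
    } else if (event === "error") {
      const err = new Error(data.detail);
      err.code = data.code;
      throw err;
    }
    // Other event names are ignored here (an assumption of this sketch).
  }
  return { text, final };
}

// The wire-format sample from above, already parsed.
const sample = [
  { event: "chunk", data: { delta: "Hello, " } },
  { event: "chunk", data: { delta: "friend! " } },
  { event: "chunk", data: { delta: "How was your day?" } },
  { event: "done", data: { reply: "Hello, friend! How was your day?", conversation_id: 1 } },
];
const result = foldStreamEvents(sample);
console.log(result.text === result.final.reply); // true
```

Comparing the concatenated deltas with the reply field of the done event is a cheap sanity check while integrating.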
Browser example
Native EventSource can't send custom headers (so no X-API-Key) and only issues GET requests. Use fetch + a stream reader instead:
```javascript
async function streamChat(externalId, characterId, message) {
  const r = await fetch(
    `https://api.vilow.dev/v1/chat/${externalId}/${characterId}/send-stream`,
    {
      method: "POST",
      headers: {
        "X-API-Key": "ck_live_...",
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
      },
      body: JSON.stringify({ message }),
    },
  );
  // Pre-flight failures (quota, auth, ...) arrive as a regular HTTP error.
  if (!r.ok) {
    throw new Error(`HTTP ${r.status}`);
  }

  const reader = r.body.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";
  let finalPayload = null;

  const handleBlock = (block) => {
    const ev = parseSSE(block);
    if (!ev) return;
    if (ev.event === "chunk") {
      appendToUI(ev.data.delta); // your typewriter renderer
    } else if (ev.event === "done") {
      finalPayload = ev.data; // trust, friendship, events, ...
    } else if (ev.event === "error") {
      throw new Error(ev.data.detail);
    }
  };

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    const blocks = buffer.split("\n\n"); // each event ends with a blank line
    buffer = blocks.pop(); // keep the trailing partial block
    blocks.forEach(handleBlock);
  }
  if (buffer.trim()) handleBlock(buffer); // flush an unterminated last event
  return finalPayload;
}

// Parse one SSE block ("event: ..." and "data: ..." lines) into {event, data}.
function parseSSE(block) {
  let event = null;
  let data = null;
  for (const line of block.split("\n")) {
    if (line.startsWith("event:")) event = line.slice(6).trim();
    else if (line.startsWith("data:")) data = JSON.parse(line.slice(5));
  }
  return event ? { event, data } : null;
}
```
Node example
```javascript
// Node 18+ ships a global fetch, so custom headers work without extra
// packages. Run as an ES module (.mjs) for top-level await.
const response = await fetch(
  "https://api.vilow.dev/v1/chat/alice-42/10/send-stream",
  {
    method: "POST",
    headers: {
      "X-API-Key": process.env.VILOW_API_KEY,
      "Content-Type": "application/json",
      "Accept": "text/event-stream",
    },
    body: JSON.stringify({ message: "hi" }),
  },
);
if (!response.ok) {
  throw new Error(`HTTP ${response.status}`);
}

// response.body is async-iterable in Node and yields Uint8Array chunks.
const decoder = new TextDecoder();
for await (const chunk of response.body) {
  // raw SSE; parse like in the browser example
  process.stdout.write(decoder.decode(chunk, { stream: true }));
}
```
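Network chunks do not line up with event boundaries, so a Node consumer needs to buffer text until a blank line closes an event. The following is a minimal sketch of such an incremental parser, not a full SSE implementation. `createSseParser` is a hypothetical name, and the sketch assumes one `event:` line and one single-line JSON `data:` line per event, as in the wire format shown earlier.

```javascript
// Hypothetical incremental parser: feed it decoded text chunks of any size.
// Assumes one "event:" line and one single-line JSON "data:" line per event.
function createSseParser(onEvent) {
  let buffer = "";
  const emit = (block) => {
    let event = null;
    let data = null;
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data = JSON.parse(line.slice(5));
    }
    if (event) onEvent({ event, data });
  };
  return {
    // Add more text; emit every event completed by a blank line.
    push(text) {
      buffer += text;
      const blocks = buffer.split("\n\n");
      buffer = blocks.pop(); // the incomplete tail stays buffered
      blocks.forEach(emit);
    },
    // Call once the stream has closed to flush an unterminated last event.
    end() {
      if (buffer.trim()) emit(buffer);
      buffer = "";
    },
  };
}

// Usage: the chunk boundaries fall mid-event on purpose.
const seen = [];
const parser = createSseParser((ev) => seen.push(ev));
parser.push('event: chunk\ndata: {"del');
parser.push('ta": "Hello, "}\n\nevent: do');
parser.push('ne\ndata: {"reply": "Hello, "}\n');
parser.end();
console.log(seen.map((ev) => ev.event)); // [ 'chunk', 'done' ]
```

In the Node loop, each chunk would be decoded to text (for example with a TextDecoder in streaming mode) and passed to `parser.push`, with `parser.end()` called after the loop finishes.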
Quotas & errors
Quota gates apply before the stream opens. If you're out of quota or your balance is empty, you get a regular HTTP 402 with a JSON body — no event stream at all:
```http
HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "detail": {
    "code": "monthly_quota_exhausted",
    "message": "You've used all 200 messages this period."
  }
}
```
Once the stream is open, errors mid-flight come back as event: error. Disconnecting the client mid-stream is fine — the chat already happened on the server side and is fully persisted. The next GET .../memory or GET .../relationship reflects the turn.
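Because pre-flight failures arrive as plain JSON rather than as events, a client can map them to a typed error before it tries to read the stream. The helper below is a sketch under assumptions: `preflightError` and the `quotaExhausted` flag are hypothetical names, and the body shape is taken from the 402 sample above. The string fallback for `detail` is also an assumption, not documented behavior.

```javascript
// Hypothetical helper: turn a non-2xx pre-flight response into an Error.
// Assumes the body shape of the 402 sample: { detail: { code, message } }.
function preflightError(status, body) {
  const detail = body ? body.detail : undefined;
  let code = null;
  let message = `HTTP ${status}`;
  if (detail && typeof detail === "object") {
    code = detail.code ?? null;
    message = detail.message ?? message;
  } else if (typeof detail === "string") {
    message = detail; // assumption: some errors may carry a plain string
  }
  const err = new Error(message);
  err.status = status;
  err.code = code;
  err.quotaExhausted = status === 402; // 402 covers quota and empty balance
  return err;
}

const quotaErr = preflightError(402, {
  detail: {
    code: "monthly_quota_exhausted",
    message: "You've used all 200 messages this period.",
  },
});
console.log(quotaErr.code, quotaErr.quotaExhausted); // monthly_quota_exhausted true
```

In the browser example's `streamChat`, the `!r.ok` branch could throw `preflightError(r.status, await r.json().catch(() => null))` instead of a bare status message, so the UI can show an upgrade prompt on 402.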
Streaming vs regular vs voice
| Endpoint | UX | Time to first byte | Use it for |
|---|---|---|---|
| POST /send | full reply at once | ~2-3 sec | backend → backend, simple integrations |
| POST /send-stream | typewriter | ~2-3 sec (then ~40 ms/word) | chat UI, when users wait actively |
| POST /send-voice | full reply + optional mp3 | ~3-5 sec (incl. TTS render) | voice apps; Pro plan only for audio. With include_audio=true the turn costs 5 message credits (text + voice synthesis bundled), otherwise 1 credit. |
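The credit rule in the last row can be turned into a quick budget estimate. This is a rough sketch: `creditsForTurn` and `totalCredits` are hypothetical helpers, and the 1-credit cost for /send and /send-stream is an assumption, since the table only states the cost for /send-voice.

```javascript
// Credits per turn, following the /send-voice row of the table above.
// Assumption: /send and /send-stream cost 1 credit each.
function creditsForTurn(endpoint, includeAudio = false) {
  return endpoint === "send-voice" && includeAudio ? 5 : 1;
}

// Sum the credits for a planned list of turns.
function totalCredits(turns) {
  return turns.reduce(
    (sum, t) => sum + creditsForTurn(t.endpoint, t.includeAudio),
    0,
  );
}

const plan = [
  { endpoint: "send-stream" },
  { endpoint: "send-voice", includeAudio: true },
  { endpoint: "send-voice", includeAudio: false },
];
console.log(totalCredits(plan)); // 7
```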