Streaming chat

When to use it

Pick streaming when you want the user to see the reply appear progressively rather than all at once. Underneath, it is identical to the regular `/send` call: same memory updates, same relationship deltas, same token cost.

Note on latency. This endpoint streams for UX only; it does not reduce first-token latency. The server still waits for the full LLM response, then emits chunks, so time-to-first-byte ≈ a full LLM call (~2-3 sec). Real first-token streaming is on the roadmap.

Endpoint

POST /v1/chat/{external_id}/{character_id}/send-stream
Content-Type: application/json
X-API-Key: ck_live_...
Accept: text/event-stream

{ "message": "Hi, how was your day?" }

Response: text/event-stream body composed of multiple events.

Event types

Event	data	When
`chunk`	`{"delta": "next word "}`	Per word/chunk of the reply, in order. Concatenate all `delta`s to reconstruct the full reply.
`done`	full ChatSendOut JSON	Once. Carries `reply`, `conversation_id`, `trust`, `friendship`, `relationship_events`, `usage`, etc. — same fields as `/send`.
`error`	`{"detail": "...", "code": int}`	If something went sideways mid-stream. Rare — pre-flight errors come back as HTTP 4xx/5xx before the stream opens.

Wire format (raw):

event: chunk
data: {"delta": "Hello, "}

event: chunk
data: {"delta": "friend! "}

event: chunk
data: {"delta": "How was your day?"}

event: done
data: {"reply": "Hello, friend! How was your day?", "conversation_id": 1, ...}
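To check the claim that concatenating every `delta` reproduces the final `reply`, here is a minimal sketch. It assumes the stream looks exactly like the sample above (LF line endings, one `data` line per event). The helper name `collectReply` is hypothetical and not part of the API.

```javascript
// Hypothetical helper: rebuilds the reply from a complete raw SSE body.
function collectReply(rawSse) {
  let reply = "";
  let done = null;
  for (const block of rawSse.split("\n\n")) {
    const event = /^event: (.+)$/m.exec(block);
    const data = /^data: (.+)$/m.exec(block);
    if (!event || !data) continue; // skips the empty trailing block
    const payload = JSON.parse(data[1]);
    if (event[1] === "chunk") reply += payload.delta;
    else if (event[1] === "done") done = payload;
  }
  return {reply, done};
}

// The sample stream from above, with the elided done fields left out.
const raw = [
  'event: chunk\ndata: {"delta": "Hello, "}',
  'event: chunk\ndata: {"delta": "friend! "}',
  'event: chunk\ndata: {"delta": "How was your day?"}',
  'event: done\ndata: {"reply": "Hello, friend! How was your day?", "conversation_id": 1}',
].join("\n\n") + "\n\n";

const result = collectReply(raw);
console.log(result.reply === result.done.reply); // true
```

In practice the `done` payload is the authoritative copy of the reply; the deltas only drive the typewriter rendering.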

Browser example

Native EventSource can't send custom headers (so no API key) and only issues GET requests. Use fetch with a stream reader instead:

async function streamChat(externalId, characterId, message) {
  const r = await fetch(
    `https://api.vilow.dev/v1/chat/${externalId}/${characterId}/send-stream`,
    {
      method: "POST",
      headers: {
        "X-API-Key": "ck_live_...",
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
      },
      body: JSON.stringify({message}),
    },
  );

  if (!r.ok) {
    throw new Error(`HTTP ${r.status}`);
  }

  const reader = r.body.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";
  let finalPayload = null;

  while (true) {
    const {value, done} = await reader.read();
    if (done) break;
    buffer += value;
    const blocks = buffer.split(/\r?\n\r?\n/);   // blank line ends an event; tolerate CRLF
    buffer = blocks.pop();   // keep partial block
    for (const block of blocks) {
      const ev = parseSSE(block);
      if (!ev) continue;
      if (ev.event === "chunk") {
        appendToUI(ev.data.delta);            // your typewriter renderer
      } else if (ev.event === "done") {
        finalPayload = ev.data;               // trust, friendship, events, ...
      } else if (ev.event === "error") {
        throw new Error(ev.data.detail);
      }
    }
  }
  return finalPayload;
}

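function parseSSE(block) {
  let event = null;
  const dataLines = [];
  for (const line of block.split("\n")) {
    // In the SSE format the space after the colon is optional.
    if (line.startsWith("event:")) event = line.slice(6).trim();
    else if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
  }
  // Ignore blocks without an event name or payload (e.g. comments, keep-alives).
  if (!event || dataLines.length === 0) return null;
  // Several data: lines in one event are joined with newlines before parsing.
  return {event, data: JSON.parse(dataLines.join("\n"))};
}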

Node example

// Node 18+ ships a global fetch (built on undici), so no import is needed.
// EventSource is not used here: it only issues GET requests, and this endpoint is a POST.

const response = await fetch(
  "https://api.vilow.dev/v1/chat/alice-42/10/send-stream",
  {
    method: "POST",
    headers: {
      "X-API-Key": process.env.VILOW_API_KEY,
      "Content-Type": "application/json",
      "Accept": "text/event-stream",
    },
    body: JSON.stringify({message: "hi"}),
  },
);

if (!response.ok) {
  throw new Error(`HTTP ${response.status}`);
}

for await (const chunk of response.body) {
  process.stdout.write(chunk);   // raw SSE bytes; parse as in the browser example
}
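Network chunks are not guaranteed to line up with event boundaries, so parsing the raw bytes in Node needs a small buffer. The following is a sketch, not an official client: `createSseParser` and `printReply` are hypothetical names, and it assumes LF line endings and one `data` line per event, as in the wire format shown earlier.

```javascript
// Hypothetical incremental parser: feed it decoded text, get back complete events.
function createSseParser() {
  let buffer = "";
  return function feed(text) {
    buffer += text;
    const blocks = buffer.split("\n\n");
    buffer = blocks.pop(); // the last piece may be an incomplete event
    const events = [];
    for (const block of blocks) {
      let event = null;
      let data = null;
      for (const line of block.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) data = JSON.parse(line.slice(5));
      }
      if (event && data !== null) events.push({event, data});
    }
    return events;
  };
}

// Assumed usage with the fetch response from the example above.
async function printReply(response) {
  const feed = createSseParser();
  const decoder = new TextDecoder();
  for await (const chunk of response.body) {
    // {stream: true} keeps multi-byte characters intact across chunk boundaries.
    for (const ev of feed(decoder.decode(chunk, {stream: true}))) {
      if (ev.event === "chunk") process.stdout.write(ev.data.delta);
      else if (ev.event === "error") throw new Error(ev.data.detail);
      else if (ev.event === "done") return ev.data;
    }
  }
  return null; // stream closed without a done event
}
```

Keeping the parser separate from the I/O loop makes it easy to unit-test with plain strings.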

Quotas & errors

Quota gates apply before the stream opens. If you're out of quota or your balance is empty, you get a regular HTTP 402 with a JSON body — no event stream at all:

HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "detail": {
    "code": "monthly_quota_exhausted",
    "message": "You've used all 200 messages this period."
  }
}

Once the stream is open, errors mid-flight come back as event: error. Disconnecting the client mid-stream is fine — the chat already happened on the server side and is fully persisted. The next GET .../memory or GET .../relationship reflects the turn.
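Because pre-flight failures arrive as plain JSON rather than as events, it can help to turn them into a typed error before touching the stream. This is one possible sketch: `PreflightError`, `toPreflightError` and `assertStreamOpened` are hypothetical names, and the `{"detail": {"code", "message"}}` body shape is only confirmed above for the 402 case, so other statuses fall back to generic values.

```javascript
// Hypothetical error type for failures reported before the stream opens.
class PreflightError extends Error {
  constructor(status, code, message) {
    super(message);
    this.name = "PreflightError";
    this.status = status;
    this.code = code;
  }
}

// Builds a PreflightError from an HTTP status and the parsed JSON body.
// Falls back to generic values when the body lacks a detail object.
function toPreflightError(status, body) {
  const detail =
    body && body.detail && typeof body.detail === "object" ? body.detail : {};
  return new PreflightError(
    status,
    detail.code ?? "unknown",
    detail.message ?? `HTTP ${status}`,
  );
}

// Assumed usage: call right after fetch(), before reading response.body.
async function assertStreamOpened(response) {
  if (response.ok) return;
  const body = await response.json().catch(() => null); // body may not be JSON
  throw toPreflightError(response.status, body);
}
```

A caller can then branch on `err.code === "monthly_quota_exhausted"` to show an upgrade prompt instead of a generic failure message.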

Streaming vs regular vs voice

Endpoint	UX	Time to first byte	Use it for
`POST /send`	full reply at once	~2-3 sec	backend → backend, simple integrations
`POST /send-stream`	typewriter	~2-3 sec (then ~40ms/word)	chat UI, when users wait actively
`POST /send-voice`	full reply + optional mp3	~3-5 sec (incl. TTS render)	voice apps; Pro plan only for audio. With `include_audio=true` the turn costs 5 message credits (text + voice synthesis bundled), otherwise 1 credit.
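The table implies that a streamed reply finishes later than the same reply from `/send`, since the typewriter pacing comes on top of the same wait. A rough back-of-the-envelope sketch, assuming one chunk per word and the approximate figures from the table (the function name and the 2500 ms default are illustrative choices, not API values):

```javascript
// Rough estimate (ms) of when the last chunk arrives.
// Assumes one chunk per word, ~2.5 s to first byte, ~40 ms between chunks.
function estimateStreamDurationMs(wordCount, firstByteMs = 2500, perWordMs = 40) {
  return firstByteMs + Math.max(0, wordCount - 1) * perWordMs;
}

console.log(estimateStreamDurationMs(50)); // 4460
```

For a 50-word reply that is roughly 4.5 seconds until the last word, against about 2.5 seconds for the full reply from `/send`; the benefit of streaming is perceived responsiveness, not total time.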