Streaming chat

Same chat semantics as /send, but the reply arrives as a stream of word-level deltas via Server-Sent Events. Use it for typewriter UX.

When to use it

Pick streaming when you want the user to see the reply appear progressively rather than all at once. Under the hood it is identical to /send: same memory updates, same relationship deltas, same token cost.

Note on latency. This endpoint streams for UX; it does not reduce first-token latency. The server still waits for the full LLM response and only then emits chunks, so time to first byte is roughly the duration of the full LLM call (~2-3 sec). Real first-token streaming is on the roadmap.

Endpoint

POST /v1/chat/{external_id}/{character_id}/send-stream
Content-Type: application/json
X-API-Key: ck_live_...
Accept: text/event-stream

{ "message": "Hi, how was your day?" }

Response: text/event-stream body composed of multiple events.

Event types

Event | data | When
chunk | {"delta": "next word "} | Once per word or chunk of the reply, in order. Concatenate all deltas to reconstruct the full reply.
done | Full ChatSendOut JSON | Once, after the last chunk. Carries reply, conversation_id, trust, friendship, relationship_events, usage, etc.: the same fields as /send.
error | {"detail": "...", "code": int} | If something fails mid-stream. Rare: pre-flight errors come back as HTTP 4xx/5xx before the stream opens.
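The event semantics in the table map naturally onto a small reducer. The sketch below is one way to fold already-parsed events into client state; reduceStreamEvent, initialStreamState, and the { text, final, error } state shape are hypothetical names, not part of any Vilow SDK, and it assumes each event has already been parsed into { event, data }.

```javascript
// Hypothetical reducer: folds /send-stream events into client-side state.
// Event names and payload shapes follow the event table; the state shape is our own.
function initialStreamState() {
  return { text: "", final: null, error: null };
}

function reduceStreamEvent(state, { event, data }) {
  switch (event) {
    case "chunk":
      // Deltas arrive in order, so plain concatenation rebuilds the reply.
      return { ...state, text: state.text + data.delta };
    case "done":
      // Sent once; carries the same fields as a /send response.
      return { ...state, final: data };
    case "error":
      // Mid-stream failure; pre-flight failures never reach the stream.
      return { ...state, error: { detail: data.detail, code: data.code } };
    default:
      // Ignore unknown event types instead of failing.
      return state;
  }
}

const state = [
  { event: "chunk", data: { delta: "Hello, " } },
  { event: "chunk", data: { delta: "friend!" } },
  { event: "done", data: { reply: "Hello, friend!", conversation_id: 1 } },
].reduce(reduceStreamEvent, initialStreamState());

console.log(state.text === state.final.reply); // true
```

Returning a fresh object on every event keeps the reducer easy to plug into UI state libraries that expect immutable updates.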

Wire format (raw):

event: chunk
data: {"delta": "Hello, "}

event: chunk
data: {"delta": "friend! "}

event: chunk
data: {"delta": "How was your day?"}

event: done
data: {"reply": "Hello, friend! How was your day?", "conversation_id": 1, ...}
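If you buffer the whole response instead of rendering it live (for example in a test or a backend job), the raw format above can be parsed in one pass. This is a minimal sketch assuming the layout shown: events separated by a blank line, each with one JSON data: line. parseSSEText is a hypothetical helper, and the sample drops the elided fields of the done payload so that it is valid JSON. It also checks the rule from the event table that the concatenated deltas equal the reply in done.

```javascript
// Hypothetical helper: parses a complete, already-buffered SSE body into events.
function parseSSEText(raw) {
  const events = [];
  // Normalise CRLF so both line-ending styles split the same way.
  for (const block of raw.replace(/\r\n/g, "\n").split("\n\n")) {
    let event = "message"; // SSE default when no `event:` field is present
    const dataLines = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
    }
    // Blocks without data (e.g. the empty tail after the last blank line) are skipped.
    if (dataLines.length > 0) {
      events.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return events;
}

const raw = [
  "event: chunk",
  'data: {"delta": "Hello, "}',
  "",
  "event: chunk",
  'data: {"delta": "friend! "}',
  "",
  "event: chunk",
  'data: {"delta": "How was your day?"}',
  "",
  "event: done",
  'data: {"reply": "Hello, friend! How was your day?", "conversation_id": 1}',
  "",
  "",
].join("\n");

const events = parseSSEText(raw);
const rebuilt = events
  .filter((e) => e.event === "chunk")
  .map((e) => e.data.delta)
  .join("");
const done = events.find((e) => e.event === "done");
console.log(rebuilt === done.data.reply); // true
```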

Browser example

The native EventSource API only issues GET requests and cannot send custom headers (so no X-API-Key). Use fetch with a stream reader instead:

async function streamChat(externalId, characterId, message) {
  const r = await fetch(
    `https://api.vilow.dev/v1/chat/${encodeURIComponent(externalId)}/${characterId}/send-stream`,
    {
      method: "POST",
      headers: {
        "X-API-Key": "ck_live_...",
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
      },
      body: JSON.stringify({message}),
    },
  );

  if (!r.ok) {
    // Pre-flight errors (for example, quota) arrive as a regular JSON response, not a stream.
    throw new Error(`HTTP ${r.status}: ${await r.text()}`);
  }

  const reader = r.body.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";
  let finalPayload = null;

  const handleBlock = (block) => {
    const ev = parseSSE(block);
    if (!ev) return;
    if (ev.event === "chunk") {
      appendToUI(ev.data.delta);            // your typewriter renderer
    } else if (ev.event === "done") {
      finalPayload = ev.data;               // trust, friendship, events, ...
    } else if (ev.event === "error") {
      throw new Error(ev.data.detail);
    }
  };

  while (true) {
    const {value, done} = await reader.read();
    if (done) break;
    // Normalise CRLF so blank-line splitting works for either line ending.
    buffer = (buffer + value).replace(/\r\n/g, "\n");
    const blocks = buffer.split("\n\n");
    buffer = blocks.pop();   // keep the trailing partial block for the next read
    blocks.forEach(handleBlock);
  }
  handleBlock(buffer);       // flush a final block that had no trailing blank line
  return finalPayload;
}

function parseSSE(block) {
  let event = "message";     // SSE default when no event: field is present
  const dataLines = [];
  for (const line of block.split("\n")) {
    // The space after the colon is optional in SSE, so match "event:" / "data:".
    if (line.startsWith("event:")) event = line.slice(6).trim();
    else if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
  }
  if (dataLines.length === 0) return null;   // e.g. empty or keep-alive blocks
  return {event, data: JSON.parse(dataLines.join("\n"))};
}

Node example

// Node 18+ ships a global fetch, so no extra package is needed.
// Run as an ES module (e.g. a .mjs file) to use top-level await.

const response = await fetch(
  "https://api.vilow.dev/v1/chat/alice-42/10/send-stream",
  {
    method: "POST",
    headers: {
      "X-API-Key": process.env.VILOW_API_KEY,
      "Content-Type": "application/json",
      "Accept": "text/event-stream",
    },
    body: JSON.stringify({message: "hi"}),
  },
);

if (!response.ok) {
  throw new Error(`HTTP ${response.status}: ${await response.text()}`);
}

for await (const chunk of response.body) {
  process.stdout.write(chunk);   // raw SSE bytes; parse them as in the browser example
}
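The loop above prints raw SSE text. To get parsed events in Node instead, the buffering from the browser example can be wrapped in an async generator. This is a sketch, not an official client: sseEvents and parseBlock are hypothetical helpers, and it assumes Node 18+ (global fetch and TextDecoder, async-iterable response.body). Decoding with { stream: true } keeps a multi-byte UTF-8 character intact when the network splits it across two chunks.

```javascript
// Hypothetical helper (Node 18+): turns response.body, an async iterable of
// Uint8Array chunks, into parsed { event, data } objects as they arrive.
async function* sseEvents(body) {
  const decoder = new TextDecoder();
  let buffer = "";
  for await (const bytes of body) {
    // stream: true holds back a multi-byte character that was split across chunks.
    // CRLF is normalised on the whole buffer, so a "\r" | "\n" split is handled too.
    buffer = (buffer + decoder.decode(bytes, { stream: true })).replace(/\r\n/g, "\n");
    let sep;
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      const ev = parseBlock(buffer.slice(0, sep));
      buffer = buffer.slice(sep + 2);
      if (ev) yield ev;
    }
  }
  // Flush any bytes the decoder held back, then emit a final unterminated block.
  const last = parseBlock((buffer + decoder.decode()).replace(/\r\n/g, "\n"));
  if (last) yield last;
}

// Parses one blank-line-delimited SSE block; returns null if it carries no data.
function parseBlock(block) {
  let event = "message"; // SSE default event type
  const dataLines = [];
  for (const line of block.split("\n")) {
    if (line.startsWith("event:")) event = line.slice(6).trim();
    else if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
  }
  return dataLines.length > 0 ? { event, data: JSON.parse(dataLines.join("\n")) } : null;
}

// Usage with the response from the example above:
// for await (const ev of sseEvents(response.body)) {
//   if (ev.event === "chunk") process.stdout.write(ev.data.delta);
// }
```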

Quotas & errors

Quota gates apply before the stream opens. If you're out of quota or your balance is empty, you get a regular HTTP 402 with a JSON body — no event stream at all:

HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "detail": {
    "code": "monthly_quota_exhausted",
    "message": "You've used all 200 messages this period."
  }
}
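Because pre-flight failures are ordinary JSON responses, a client can surface the machine-readable code before it ever reads a stream. A minimal sketch: ChatApiError and toApiError are hypothetical names, and the code assumes detail is an object with code and message, as in the 402 example above, falling back to a generic error when the body is shaped differently. Only monthly_quota_exhausted is documented here, so avoid hard-coding other codes until you have actually observed them.

```javascript
// Hypothetical error type that keeps the HTTP status and the API's `code`.
class ChatApiError extends Error {
  constructor(status, code, message) {
    super(message);
    this.name = "ChatApiError";
    this.status = status;
    this.code = code;
  }
}

// Hypothetical helper: builds an error from a non-2xx status and its body text.
function toApiError(status, bodyText) {
  let detail = null;
  try {
    detail = JSON.parse(bodyText).detail;
  } catch {
    // Not JSON (e.g. a proxy error page): fall through to a generic error.
  }
  if (detail && typeof detail === "object" && "code" in detail) {
    return new ChatApiError(status, detail.code, detail.message ?? String(detail.code));
  }
  // `detail` may also be a plain string or missing entirely.
  return new ChatApiError(status, null, typeof detail === "string" ? detail : `HTTP ${status}`);
}

const err = toApiError(402, JSON.stringify({
  detail: { code: "monthly_quota_exhausted", message: "You've used all 200 messages this period." },
}));
console.log(err.code); // monthly_quota_exhausted
```

In streamChat-style code, call it inside the !r.ok branch as throw toApiError(r.status, await r.text()), then branch on err.code in your UI.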

Once the stream is open, errors that occur mid-flight arrive as event: error. Disconnecting the client mid-stream is safe: the chat turn has already completed on the server and is fully persisted, so the next GET .../memory or GET .../relationship reflects it.

Streaming vs regular vs voice

Endpoint | UX | Time to first byte | Use it for
POST /send | Full reply at once | ~2-3 sec | Backend-to-backend calls, simple integrations
POST /send-stream | Typewriter | ~2-3 sec (then ~40 ms/word) | Chat UIs where users are actively waiting
POST /send-voice | Full reply + optional MP3 | ~3-5 sec (incl. TTS render) | Voice apps. Audio requires the Pro plan. With include_audio=true the turn costs 5 message credits (text and voice synthesis bundled); otherwise 1 credit.
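The credit rule in the /send-voice row is simple enough to budget against directly. The helpers below are hypothetical and encode only that row (5 credits with include_audio=true, 1 otherwise); the 200 remaining credits used in the examples is an arbitrary sample figure, not a plan limit.

```javascript
// Hypothetical planning helpers based on the /send-voice row:
// a turn with include_audio=true costs 5 message credits, otherwise 1.
function voiceTurnCost(includeAudio) {
  return includeAudio ? 5 : 1;
}

// Number of whole /send-voice turns that fit into the remaining credits.
function voiceTurnsAffordable(creditsLeft, includeAudio) {
  return Math.floor(creditsLeft / voiceTurnCost(includeAudio));
}

console.log(voiceTurnsAffordable(200, true));  // 40
console.log(voiceTurnsAffordable(200, false)); // 200
console.log(voiceTurnsAffordable(12, true));   // 2
```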