Proxy mode — keep control of your LLM stack

For teams that need to own their inference layer: regulated industries, fine-tuned models, on-prem deployments, regional data residency. Vilow returns the styling prompt and tracks character state — you run the LLM on your own infrastructure with your own key.

When to choose proxy over full mode

Proxy and full mode are different products, not different prices. Pick proxy when at least one of these applies:

If none of the above apply, full mode is simpler and faster to integrate.

 | Full mode | Proxy mode
Best for | Indie devs, startups, fastest path to working bot | Enterprise, compliance, custom or on-prem inference
Who runs the LLM? | Vilow (Grok) | You (any provider, any model)
Where does the LLM key live? | With us | Only with you
Inference observability / budgets | Surfaced via Vilow dashboard | Stays in your existing tooling
Latency | One round-trip via Vilow | Direct to your LLM
Streaming output | Built in (/send-stream) | Use your provider's streaming, then absorb
Adult / intimate features | Available with consent | Not available (third-party providers ban it)

How it works

1. POST /v1/proxy/chat/{external_id}/{character_id}/prepare: you send the user message; we return a session_id and a system_prompt tailored to the character's current mood, memory, and relationship.
2. Your code calls your LLM with system_prompt + the user message. Your provider, your key, your costs.
3. POST /v1/proxy/chat/{session_id}/absorb: you post the LLM's reply back. We update memory and emotional state, ready for the next turn.

Always close the loop. Without absorb, the character's memory and emotions don't advance, and replies will start to feel detached after a few turns. The official SDKs handle absorb for you.
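
The three steps can be sketched as a single function. This is only an illustration under stated assumptions, not how the SDK is implemented: prepare, call_llm, and absorb are hypothetical callables you would supply yourself (for example, thin wrappers around the two endpoints and your provider's client). Only the session_id and system_prompt field names come from the steps above.

```python
from typing import Callable, Dict

# Hypothetical callables standing in for the two Vilow endpoints and your LLM.
Prepare = Callable[[str], Dict[str, str]]  # user_message -> {"session_id", "system_prompt"}
CallLLM = Callable[[str, str], str]        # (system_prompt, user_message) -> reply text
Absorb = Callable[[str, str], None]        # (session_id, llm_reply) -> None


def run_turn(user_message: str, prepare: Prepare, call_llm: CallLLM, absorb: Absorb) -> str:
    """Run one proxy-mode turn: prepare, call your own LLM, then absorb."""
    prep = prepare(user_message)                           # step 1: get the styling prompt
    reply = call_llm(prep["system_prompt"], user_message)  # step 2: your provider, your key
    absorb(prep["session_id"], reply)                      # step 3: close the loop
    return reply
```

Passing the three steps in as callables keeps the flow easy to test with fakes and makes it hard to forget the absorb call.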

Quick start (Python)

pip install vilow-sdk openai

from vilow_sdk import VilowClient
from openai import OpenAI

vilow = VilowClient(api_key="vk_…")
oai = OpenAI()

def call_openai(system, user):
    r = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system},
                  {"role": "user",   "content": user}],
    )
    return r.choices[0].message.content

reply = vilow.chat.send(
    external_id="alice",
    character_id=42,
    user_message="how are you?",
    llm=call_openai,
    user_local_time="20:30",
)
print(reply)

Quick start (TypeScript)

npm i @vilow/sdk openai

import { VilowClient, type LLMCallable } from '@vilow/sdk';
import OpenAI from 'openai';

const vilow = new VilowClient({ apiKey: 'vk_…' });
const openai = new OpenAI();

const callOpenAI: LLMCallable = async (system, user) => {
  const r = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: system },
      { role: 'user',   content: user },
    ],
  });
  return r.choices[0].message.content ?? '';
};

const reply = await vilow.chat.send({
  externalId: 'alice',
  characterId: 42,
  userMessage: 'how are you?',
  llm: callOpenAI,
  userLocalTime: '20:30',
});

Manual control

Need to inject your own logic between prepare and absorb (logging, streaming, retries)? Call the two steps yourself:

prep = vilow.chat.prepare(
    external_id="alice", character_id=42,
    user_message="how are you?",
)
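# … your LLM call here, however you like.
# call_my_llm is a placeholder for your own function: (system_prompt, user_message) -> reply text.
reply = call_my_llm(prep.system_prompt, prep.user_message)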

vilow.chat.absorb(session_id=prep.session_id, llm_response=reply)
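
For streaming, one option is to forward chunks to the user as they arrive and absorb the concatenated text once the stream ends. The sketch below assumes your provider's streaming API can be reduced to an iterator of text pieces; collect_stream and on_chunk are hypothetical names, not part of the SDK.

```python
from typing import Callable, Iterable


def collect_stream(chunks: Iterable[str], on_chunk: Callable[[str], None]) -> str:
    """Forward each text chunk as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        if not chunk:        # skip empty keep-alive pieces
            continue
        on_chunk(chunk)      # e.g. write to your websocket or SSE response
        parts.append(chunk)
    return "".join(parts)    # this full text is what you post to absorb
```

The returned string is then what you would pass as llm_response to vilow.chat.absorb, using the session_id from the matching prepare call.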

Using OpenAI tool calling (function calling)

If your assistant calls tools — flight search, weather lookup, RAG, your own DB — you run that loop yourself: your code, your OpenAI key, your tools. Vilow doesn't see or control mid-flight tool calls. We just need the final assistant text at the end so we can update memory and emotions.

By default, prepare bakes a "respond with this JSON envelope" instruction into the styling prompt. That works well when the LLM only outputs prose, but it conflicts with OpenAI's tools mechanism: the model dumps tool arguments into content as JSON instead of using the tool_calls field, so the tools never execute. The fix is one extra parameter on prepare:

POST /v1/proxy/chat/{external_id}/{character_id}/prepare
{
  "user_message": "what's the weather in Barcelona tomorrow?",
  "envelope":     false       // ← turn off the JSON-output instruction
}

With envelope: false, the styling prompt no longer asks for JSON, so you can run a normal tools loop and post the final plain-prose reply to absorb. Vilow then runs an extraction pass server-side, using the same persona context, and updates memory, relationship, and emotions automatically. The absorb response includes "extraction": "server_extracted" so you can confirm it ran:

// 1. prepare with envelope=false
const prep = await fetch(`${VILOW}/v1/proxy/chat/${user}/${char}/prepare`, {
  method: 'POST',
  headers: { 'X-API-Key': VK, 'Content-Type': 'application/json' },
  body: JSON.stringify({ user_message: msg, envelope: false }),
}).then(r => r.json());

// 2. your tools loop — your OpenAI key, your tools
const messages = [
  { role: 'system', content: prep.system_prompt },
  { role: 'user',   content: prep.user_message  },
];
let finalText = '';
for (let i = 0; i < 5; i++) {
  const r = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages,
    tools: MY_TOOLS,
    tool_choice: 'auto',
  });
  const m = r.choices[0].message;
  if (m.tool_calls?.length) {
    messages.push(m);
    for (const tc of m.tool_calls) {
      const result = await runTool(tc.function.name, JSON.parse(tc.function.arguments));
      messages.push({ role: 'tool', tool_call_id: tc.id, content: JSON.stringify(result) });
    }
    continue;
  }
  finalText = m.content || '';
  break;
}

// 3. absorb the plain prose — Vilow extracts envelope server-side
const ab = await fetch(`${VILOW}/v1/proxy/chat/${prep.session_id}/absorb`, {
  method: 'POST',
  headers: { 'X-API-Key': VK, 'Content-Type': 'application/json' },
  body: JSON.stringify({ llm_response: finalText }),
}).then(r => r.json());
// ab.extraction === "server_extracted"
// ab.facts_extracted, ab.relationship: populated as in envelope mode

Don't roll your own extraction layer. Wrapping the reply in a hand-built JSON shell with zero deltas (just to make absorb "happy") silently disables the personality engine: trust, friendship, and emotions stop changing, no facts get extracted, and the character freezes. Either stick with the default envelope: true for a non-tools LLM, or use envelope: false with plain prose and let Vilow extract.

Cost note: server-side extraction runs one LLM call on Vilow's side per tool-using turn (a fast Grok call, not your provider). It's billed under your normal proxy-mode message quota — no surprise charges.
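
After an envelope: false turn, it can be worth checking the absorb response rather than assuming extraction ran. A minimal sketch, assuming you have the parsed JSON body as a dict; confirm_server_extraction and the logger name are hypothetical, and only the "extraction" field and its "server_extracted" value come from this page.

```python
import logging

log = logging.getLogger("vilow.proxy")  # hypothetical logger name


def confirm_server_extraction(absorb_response: dict) -> bool:
    """Return True if the absorb response reports server-side extraction."""
    mode = absorb_response.get("extraction")
    if mode != "server_extracted":
        # Worth surfacing in your own monitoring: this turn did not report
        # the server-side extraction pass.
        log.warning("expected extraction=server_extracted, got %r", mode)
        return False
    return True
```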

What we see — and what we don't

Item | Vilow sees it?
User message (you send it in prepare) | Yes
LLM's reply (you send it in absorb) | Yes
Character / user IDs | Yes (they're ours)
Your OpenAI / Anthropic / Grok / local key | No, never
Which model / provider you used | No
Your LLM cost or token counts | No
Mid-flight tool calls / function calls in your stack | No

Vilow API key location. The vk_… token you pass to the SDK is for our API. Keep it on your backend; never embed it in browser JS or mobile apps. If you build a web client, route requests through your own server that holds the key.
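
One way to keep the key server-side is to have your backend build the Vilow request itself, so the browser only ever sends the user's message to your server. The sketch below makes several assumptions: the base URL is a placeholder, VILOW_API_KEY is a hypothetical environment variable name, and build_prepare_request is an illustrative helper. The path and the X-API-Key header follow the examples on this page.

```python
import json
import os
import urllib.request
from urllib.parse import quote

VILOW_BASE_URL = "https://api.vilow.example"  # placeholder, replace with your real base URL


def build_prepare_request(external_id, character_id, user_message,
                          api_key=None, base_url=VILOW_BASE_URL):
    """Build (but do not send) the prepare request on your backend."""
    # The key is read on the server only; a missing variable fails loudly.
    key = api_key or os.environ["VILOW_API_KEY"]  # hypothetical variable name
    path = (f"/v1/proxy/chat/{quote(str(external_id), safe='')}"
            f"/{quote(str(character_id), safe='')}/prepare")
    body = json.dumps({"user_message": user_message}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        method="POST",
        headers={"X-API-Key": key, "Content-Type": "application/json"},
    )
```

Your route handler would pass the result to urllib.request.urlopen (or the equivalent in your HTTP client) and return only the LLM reply to the browser, never the key or the raw headers.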

What the styling prompt contains

Each prepare returns a system_prompt assembled from the character's stored state. The blocks are stable across calls so you can cache or post-process if needed:

Endpoint reference

POST /v1/proxy/chat/{external_id}/{character_id}/prepare

{
  "user_message":     "как дела?",      // Russian for "how are you?"
  "user_local_time":  "20:30",         // optional
  "language":         "ru",             // optional override
  "disclose_ai":      true,             // optional, default true
  "envelope":         true              // optional, default true. set false
                                        // when running OpenAI tool calling
                                        // (see "Using OpenAI tool calling")
}

→ 200 OK
{
  "session_id":     "fd1a8c...",
  "system_prompt":  "# Character\nYou are Anna, a woman.\n…",
  "user_message":   "как дела?",
  "expires_at":     "2026-04-30T20:34:11Z",
  "state_version":  1
}
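
The prepare response carries an expires_at timestamp. This page does not say what absorb does once that time has passed, so the sketch below simply assumes a slow turn should check the deadline and start again with a fresh prepare rather than absorb into a stale session. session_expired is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def session_expired(expires_at: str, now: Optional[datetime] = None) -> bool:
    """Return True if the prepare session's expires_at is in the past.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalise it.
    deadline = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    current = now or datetime.now(timezone.utc)
    return current >= deadline
```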

POST /v1/proxy/chat/{session_id}/absorb

{
  "llm_response":     "Привет... день был тяжёлый. Сама как?",  // "Hi... it's been a rough day. How about you?"
  "idempotency_key":  "your-uuid"      // optional, prevents double-absorb on retry
}

→ 200 OK
{
  "session_id":      "fd1a8c...",
  "status":          "absorbed",
  "extraction":      "envelope",       // "envelope" | "server_extracted" | "plain"
  "facts_extracted": 2,
  "relationship":    { "trust": 0.32, "friendship": 0.41, "stage": "warming" }
}
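
Because idempotency_key prevents a double absorb on retry, a retry wrapper should generate the key once and reuse it for every attempt. A minimal sketch, assuming post_absorb is your own function that posts the JSON body to the absorb endpoint and raises ConnectionError on a network failure. Both that function and absorb_with_retry are hypothetical names.

```python
import time
import uuid
from typing import Callable, Dict


def absorb_with_retry(post_absorb: Callable[[Dict[str, str]], dict],
                      llm_response: str,
                      attempts: int = 3,
                      base_delay: float = 0.5) -> dict:
    """Post to absorb, retrying network failures with one shared idempotency key."""
    body = {
        "llm_response": llm_response,
        "idempotency_key": str(uuid.uuid4()),  # generated once, reused on every retry
    }
    for attempt in range(attempts):
        try:
            return post_absorb(body)
        except ConnectionError:
            if attempt == attempts - 1:
                raise                              # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)  # simple exponential backoff
    raise ValueError("attempts must be at least 1")
```

Generating a new key per attempt would defeat the purpose: if the first request actually reached the server before the connection dropped, the retry would look like a new reply.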

Extraction modes:

Status codes

Intimate mode is NOT available in proxy

Proxy mode is for work-grade assistants and product chatbots. Intimate / 18+ features are intentionally not exposed here, for three reasons:

If a user explicitly asks for an intimate conversation, switch to full mode (POST /v1/chat/{user}/{character}/send) on our LLM. The same character keeps its memory, emotions, and relationship — see "Switching modes" below.

Switching modes — does the bot remember?

Yes. A character is a single row in our database — memory, emotions, trust/friendship, life events, promises, shared memories all live there. Both endpoints (full /v1/chat/…/send and proxy /v1/proxy/chat/…) write to the same row.
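
Since both modes address the same character, switching is a matter of sending the same user and character identifiers to a different path. The helper below only builds the two paths named on this page; chat_endpoint is a hypothetical name, and the request bodies (in particular for full mode) are not shown here.

```python
def chat_endpoint(mode: str, external_id: str, character_id: int) -> str:
    """Return the first endpoint to call for a turn in the given mode."""
    if mode == "full":
        return f"/v1/chat/{external_id}/{character_id}/send"
    if mode == "proxy":
        return f"/v1/proxy/chat/{external_id}/{character_id}/prepare"
    raise ValueError(f"unknown mode: {mode!r}")
```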

Best practices