When to choose proxy over full mode
Proxy and full mode are different products, not different prices. Pick proxy when at least one of these applies:
- Compliance / data residency. Internal policy or regulation says user data must not flow through a third-party LLM proxy. With proxy mode, the LLM call happens entirely inside your perimeter; Vilow receives only the user message (in prepare) and the final reply (in absorb), never the provider traffic itself.
- You've invested in a custom model. Fine-tuned GPT-4, an in-house Mistral, or a Llama-on-A100 cluster — proxy mode lets you keep using it while plugging in our personality, memory, and relationship engine.
- Provider-key isolation. Your security team doesn't want OpenAI / Anthropic credentials sitting in a third-party vendor's environment. With proxy you never share them with us.
- Your stack already does inference. Many teams have a centralised LLM gateway with budgeting, observability, and prompt-caching baked in. Proxy mode plugs the character intelligence layer into your gateway instead of duplicating those concerns on our side.
- Regional or sovereign clouds. Run on Azure EU, AWS GovCloud, or a local provider — Vilow's brain talks to your gateway over HTTPS, no matter where your inference runs.
If none of the above apply, full mode is simpler and faster to integrate.
| | Full mode | Proxy mode |
|---|---|---|
| Best for | Indie devs, startups, fastest path to working bot | Enterprise, compliance, custom or on-prem inference |
| Who runs the LLM? | Vilow (Grok) | You (any provider, any model) |
| Where does the LLM key live? | With us | Only with you |
| Inference observability / budgets | Surfaced via Vilow dashboard | Stays in your existing tooling |
| Latency | One round-trip via Vilow | Direct to your LLM |
| Streaming output | Built in (/send-stream) | Use your provider's streaming, then absorb |
| Adult / intimate features | Available with consent | Not available — third-party providers ban it |
How it works
1. Prepare. POST /v1/proxy/chat/{external_id}/{character_id}/prepare: you send the user message; we return a session_id and a system_prompt tailored to the character's current mood, memory, and relationship.
2. Call your LLM with the system_prompt + the user message. Your provider, your key, your costs.
3. Absorb. POST /v1/proxy/chat/{session_id}/absorb: you post the LLM's reply back. We update memory and emotional state, ready for the next turn.

If you skip absorb, the character's memory and emotions don't advance, and replies will start to feel detached after a few turns. The official SDKs handle absorb for you.
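The three steps can be sketched without the SDK as follows. This is a minimal illustration, not the official client: `proxy_turn` and the injected `post` transport are hypothetical names, and only the two endpoint paths and the `session_id`, `system_prompt`, `user_message`, and `llm_response` fields come from this page.

```python
from typing import Callable

# Hypothetical transport: takes a path and a JSON-able body and returns the
# parsed JSON response. In real code it would wrap your HTTP client and add
# the X-API-Key header.
Post = Callable[[str, dict], dict]
# Your LLM call: (system_prompt, user_message) -> reply text.
LLM = Callable[[str, str], str]


def proxy_turn(post: Post, call_llm: LLM, external_id: str,
               character_id: int, user_message: str) -> str:
    """Run one prepare -> LLM -> absorb turn and return the reply text."""
    # 1. prepare: Vilow returns a session and a styling prompt.
    prep = post(
        f"/v1/proxy/chat/{external_id}/{character_id}/prepare",
        {"user_message": user_message},
    )
    # 2. your LLM, your key: Vilow is not involved in this call.
    reply = call_llm(prep["system_prompt"], prep["user_message"])
    # 3. absorb: post the reply back so memory and emotions advance.
    post(f"/v1/proxy/chat/{prep['session_id']}/absorb",
         {"llm_response": reply})
    return reply
```

Injecting the transport keeps the flow testable without network access; in production `post` would be a thin wrapper over whatever HTTP client you already use.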
Quick start (Python)
pip install vilow-sdk openai
from vilow_sdk import VilowClient
from openai import OpenAI

vilow = VilowClient(api_key="vk_…")
oai = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_openai(system, user):
    # Your LLM call: Vilow's styling prompt goes in as the system message.
    r = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return r.choices[0].message.content

# send() runs prepare -> llm -> absorb in one call.
reply = vilow.chat.send(
    external_id="alice",
    character_id=42,
    user_message="how are you?",
    llm=call_openai,
    user_local_time="20:30",
)
print(reply)
Quick start (TypeScript)
npm i @vilow/sdk openai
import { VilowClient, type LLMCallable } from '@vilow/sdk';
import OpenAI from 'openai';
const vilow = new VilowClient({ apiKey: 'vk_…' });
const openai = new OpenAI();
const callOpenAI: LLMCallable = async (system, user) => {
  const r = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: system },
      { role: 'user', content: user },
    ],
  });
  return r.choices[0].message.content ?? '';
};

const reply = await vilow.chat.send({
  externalId: 'alice',
  characterId: 42,
  userMessage: 'how are you?',
  llm: callOpenAI,
  userLocalTime: '20:30',
});
Manual control
Need to inject your own logic between prepare and absorb (logging, streaming, retries)?
prep = vilow.chat.prepare(
    external_id="alice", character_id=42,
    user_message="how are you?",
)
# … your LLM call here, however you like
reply = call_my_llm(prep.system_prompt, prep.user_message)
vilow.chat.absorb(session_id=prep.session_id, llm_response=reply)
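If you stream from your provider, one way to fit that between prepare and absorb is to forward each chunk to the client while accumulating the full text, then absorb once at the end. The sketch below assumes your provider's stream is exposed as an iterable of text chunks; `stream_and_collect` and `on_chunk` are hypothetical names, not part of the SDK.

```python
from typing import Callable, Iterable


def stream_and_collect(chunks: Iterable[str],
                       on_chunk: Callable[[str], None]) -> str:
    """Forward each streamed chunk to the client and return the full reply."""
    parts = []
    for chunk in chunks:
        if not chunk:        # skip empty deltas that carry no text
            continue
        on_chunk(chunk)      # e.g. write to an SSE or WebSocket connection
        parts.append(chunk)
    return "".join(parts)
```

The returned string is what you would then pass as `llm_response` to absorb, so Vilow sees the complete reply exactly once per turn.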
Using OpenAI tool calling (function calling)
If your assistant calls tools — flight search, weather lookup, RAG, your own DB — you run that loop yourself: your code, your OpenAI key, your tools. Vilow doesn't see or control mid-flight tool calls. We just need the final assistant text at the end so we can update memory and emotions.
By default prepare bakes a "respond with this JSON envelope" instruction into the styling prompt, which works great when the LLM only outputs prose — but it conflicts with OpenAI's tools mechanism (the model dumps tool arguments into content as JSON instead of using the tool_calls field, and tools never execute). The fix is one extra parameter on prepare:
POST /v1/proxy/chat/{external_id}/{character_id}/prepare
{
  "user_message": "what's the weather in Barcelona tomorrow?",
  "envelope": false   // ← turn off the JSON-output instruction
}
With envelope: false the styling prompt no longer asks for JSON, so you can run a normal tools loop and post the final plain-prose reply to absorb. We then run an internal extraction pass on our side using the same persona context, and update memory + relationship + emotions automatically. The response includes "extraction": "server_extracted" so you can confirm it ran:
// 1. prepare with envelope=false
const prep = await fetch(`${VILOW}/v1/proxy/chat/${user}/${char}/prepare`, {
  method: 'POST',
  headers: { 'X-API-Key': VK, 'Content-Type': 'application/json' },
  body: JSON.stringify({ user_message: msg, envelope: false }),
}).then(r => r.json());

// 2. your tools loop: your OpenAI key, your tools
const messages = [
  { role: 'system', content: prep.system_prompt },
  { role: 'user', content: prep.user_message },
];
let finalText = '';
for (let i = 0; i < 5; i++) {
  const r = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages,
    tools: MY_TOOLS,
    tool_choice: 'auto',
  });
  const m = r.choices[0].message;
  if (m.tool_calls?.length) {
    messages.push(m);
    for (const tc of m.tool_calls) {
      const result = await runTool(tc.function.name, JSON.parse(tc.function.arguments));
      messages.push({ role: 'tool', tool_call_id: tc.id, content: JSON.stringify(result) });
    }
    continue;
  }
  finalText = m.content || '';
  break;
}

// 3. absorb the plain prose; Vilow extracts the envelope server-side
const ab = await fetch(`${VILOW}/v1/proxy/chat/${prep.session_id}/absorb`, {
  method: 'POST',
  headers: { 'X-API-Key': VK, 'Content-Type': 'application/json' },
  body: JSON.stringify({ llm_response: finalText }),
}).then(r => r.json());
// ab.extraction === "server_extracted"
// ab.facts_extracted, ab.relationship: populated as in envelope mode
Use either envelope: true for a non-tools LLM, or envelope: false + plain prose and let Vilow extract.
Cost note: server-side extraction runs one LLM call on Vilow's side per tool-using turn (a fast Grok call, not your provider). It's billed under your normal proxy-mode message quota — no surprise charges.
What we see — and what we don't
| Item | Vilow sees it? |
|---|---|
| User message (you send it in prepare) | Yes |
| LLM's reply (you send it in absorb) | Yes |
| Character / user IDs | Yes (they're ours) |
| Your OpenAI / Anthropic / Grok / local key | No — never |
| Which model / provider you used | No |
| Your LLM cost or token counts | No |
| Mid-flight tool calls / function calls in your stack | No |
The vk_… token you pass to the SDK is for our API. Keep it on your backend; never embed it in browser JS or mobile apps. If you build a web client, route requests through your own server that holds the key.
What the styling prompt contains
Each prepare returns a system_prompt assembled from the character's stored state. The blocks are stable across calls so you can cache or post-process if needed:
- Character — name, gender (in natural prose), persona, backstory, custom traits.
- Personality — a 4–6 line description, written in tendency language ("tends to", "generally"). Generated once from the Big Five vector; not the raw scores.
- How personality interacts with current state — the precedence rule so your LLM knows that an optimist who is sad is sad.
- Current state — local time bucket, life event in progress, dominant emotions, dominant needs. All in prose, no numbers.
- What you know about this user — up to 3 cherry-picked facts from memory.
- Recent dialogue — last few turns of conversation.
- Relationship — duration of contact and warmth, in prose.
- Style for this reply — language, length, tone hints.
- Don'ts — boundaries (don't invent shared past, don't pile up questions, AI-disclosure rule).
- User: … — the user's actual message, ready to feed your LLM.
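Because the blocks are stable, post-processing can be as simple as splitting the prompt on its block headings. The sketch below assumes every block starts with a line of the form `# Title`, as the `# Character` heading in the sample prepare response suggests; that format is an inference rather than a documented contract, and `split_blocks` is a hypothetical helper.

```python
from typing import Dict


def split_blocks(system_prompt: str) -> Dict[str, str]:
    """Split a styling prompt into {heading: body}, assuming '# Heading' lines.

    Any text before the first heading is ignored.
    """
    blocks: Dict[str, str] = {}
    title = None
    body = []
    for line in system_prompt.splitlines():
        if line.startswith("# "):
            if title is not None:
                blocks[title] = "\n".join(body).strip()
            title, body = line[2:].strip(), []
        elif title is not None:
            body.append(line)
    if title is not None:
        blocks[title] = "\n".join(body).strip()
    return blocks
```

With the blocks in a dict you can, for example, log only the "Current state" block or cache the slow-changing ones separately. If the real heading format differs, only the `startswith` check needs to change.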
Endpoint reference
POST /v1/proxy/chat/{external_id}/{character_id}/prepare
{
  "user_message": "как дела?",        // "how are you?"
  "user_local_time": "20:30",         // optional
  "language": "ru",                   // optional override
  "disclose_ai": true,                // optional, default true
  "envelope": true                    // optional, default true. Set false
                                      // when running OpenAI tool calling
                                      // (see "Using OpenAI tool calling")
}
→ 200 OK
{
  "session_id": "fd1a8c...",
  "system_prompt": "# Character\nYou are Anna, a woman.\n…",
  "user_message": "как дела?",
  "expires_at": "2026-04-30T20:34:11Z",
  "state_version": 1
}

POST /v1/proxy/chat/{session_id}/absorb
{
  "llm_response": "Привет... день был тяжёлый. Сама как?",  // "Hi... it was a rough day. How about you?"
  "idempotency_key": "your-uuid"      // optional, prevents double-absorb on retry
}
→ 200 OK
{
  "session_id": "fd1a8c...",
  "status": "absorbed",
  "extraction": "envelope",           // "envelope" | "server_extracted" | "plain"
  "facts_extracted": 2,
  "relationship": { "trust": 0.32, "friendship": 0.41, "stage": "warming" }
}
Extraction modes:
- "envelope": your LLM returned a JSON envelope (default flow with envelope: true on prepare).
- "server_extracted": you used envelope: false on prepare and posted plain prose; Vilow ran an internal extraction call to derive deltas + facts.
- "plain": extraction couldn't be performed (envelope disabled and the server-side extraction call failed). The visible reply is still saved, but no memory/relationship updates happened for this turn.
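A client can use the `extraction` field to notice turns where memory did not advance. A minimal sketch, assuming the absorb response has already been parsed into a dict; `memory_updated` and the logger name are hypothetical.

```python
import logging

log = logging.getLogger("vilow.proxy")  # hypothetical logger name


def memory_updated(absorb_response: dict) -> bool:
    """Return True if this turn updated memory/relationship state."""
    mode = absorb_response.get("extraction")
    if mode in ("envelope", "server_extracted"):
        return True
    # "plain" (or anything unexpected): the reply was saved, but no memory
    # or relationship update ran for this turn.
    log.warning("session %s: extraction=%r, state not updated",
                absorb_response.get("session_id"), mode)
    return False
```

Counting how often this returns False gives an early signal that server-side extraction is failing for your traffic.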
Status codes
- 401: bad/missing API key.
- 402: quota or balance limit (body.code says which).
- 404: character or session not found.
- 409: session already absorbed (use a different session_id or send the original idempotency_key for a safe retry).
- 410: session expired (TTL is 24 h; just call prepare again).
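One way to act on these codes is a small retry wrapper around absorb that generates the `idempotency_key` once and reuses it on every attempt. This is a sketch under assumptions: `send_absorb` is a hypothetical callable returning `(status_code, body)`, the `SessionExpired` class is invented here, and retrying only on connection errors is a design choice rather than documented behaviour.

```python
import uuid
from typing import Callable, Tuple


class SessionExpired(Exception):
    """Raised on 410: the caller should call prepare again."""


def absorb_with_retry(send_absorb: Callable[[str, dict], Tuple[int, dict]],
                      session_id: str, llm_response: str,
                      attempts: int = 3) -> dict:
    # One key for all attempts, so a retry is recognised as the same absorb.
    body = {"llm_response": llm_response,
            "idempotency_key": str(uuid.uuid4())}
    last_error = None
    for _ in range(attempts):
        try:
            status, payload = send_absorb(session_id, body)
        except ConnectionError as exc:  # network blip: retry with the same key
            last_error = exc
            continue
        if status == 200:
            return payload
        if status == 410:
            raise SessionExpired(session_id)
        # 401, 402, 404, 409: retrying will not help; surface to the caller.
        raise RuntimeError(f"absorb failed with HTTP {status}: {payload}")
    raise RuntimeError("absorb failed after retries") from last_error
```

On `SessionExpired` the calling code would run prepare again and repeat the turn; the other errors point at configuration or quota problems that a retry cannot fix.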
Intimate mode is NOT available in proxy
Proxy mode is for work-grade assistants and product chatbots. Intimate / 18+ features are intentionally not exposed here, for three reasons:
- Most public LLM providers (OpenAI, Anthropic) ban explicit content — using them via proxy with intimate flows would risk your account.
- Personal beats / NSFW are a different product surface (consent gating, age verification) and are guaranteed only inside full mode where Vilow runs the LLM.
- The intimate persona is generated by our shaping rules and stays on our side; it doesn't appear in a styling prompt regardless of consent state.
If a user explicitly asks for an intimate conversation, switch to full mode (POST /v1/chat/{user}/{character}/send) on our LLM. The same character keeps its memory, emotions, and relationship — see "Switching modes" below.
Switching modes — does the bot remember?
Yes. A character is a single row in our database — memory, emotions, trust/friendship, life events, promises, shared memories all live there. Both endpoints (full /v1/chat/…/send and proxy /v1/proxy/chat/…) write to the same row.
- Use proxy mode in the morning → bot stores a new fact, mood updates.
- Switch to full mode in the evening → bot recalls the morning fact, continues with the same mood.
- Switch back to proxy → memory stays in sync.
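Since both modes write to the same character, routing can be decided per turn. A minimal sketch, assuming your application already knows whether the turn needs intimate features; the `needs_intimate` flag and `endpoint_for` are hypothetical, while the two paths are the ones named on this page.

```python
def endpoint_for(external_id: str, character_id: int,
                 needs_intimate: bool) -> str:
    """Pick the full-mode or proxy-mode path for the next turn."""
    if needs_intimate:
        # Intimate features exist only in full mode, where Vilow runs the LLM.
        return f"/v1/chat/{external_id}/{character_id}/send"
    return f"/v1/proxy/chat/{external_id}/{character_id}/prepare"
```

Only the endpoint changes between turns; the character ID stays the same, which is what keeps memory and mood continuous.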
Best practices
- Use the SDK. chat.send() handles prepare → llm → absorb in one call. Manual prepare + absorb is for special cases.
- Always pass user_local_time. Even rough HH:MM matters; the bot's behaviour shifts by time of day.
- Set idempotency_key on retries. Network blips happen; without it a retried absorb may collide with the original.
- Monitor unanswered prepares. If > 50% of your sessions never get an absorb, your integration is leaking: check error handling in the LLM call path.
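The unanswered-prepare check can be approximated with a counter over your own logs. A minimal sketch, assuming you record the `session_id` of every prepare and every successful absorb; `unanswered_ratio` and `is_leaking` are hypothetical helpers, and the 0.5 default mirrors the 50% figure above.

```python
from typing import Iterable


def unanswered_ratio(prepared: Iterable[str], absorbed: Iterable[str]) -> float:
    """Fraction of prepared sessions that never received an absorb."""
    prepared_ids = set(prepared)
    if not prepared_ids:
        return 0.0
    missing = prepared_ids - set(absorbed)
    return len(missing) / len(prepared_ids)


def is_leaking(prepared: Iterable[str], absorbed: Iterable[str],
               threshold: float = 0.5) -> bool:
    """True when more than `threshold` of prepares went unanswered."""
    return unanswered_ratio(prepared, absorbed) > threshold
```

Running this over a sliding window (say, the last hour of sessions) rather than all-time totals makes a newly introduced leak show up quickly.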