Snapshot import

Why this exists

Manual or auto character creation gives you a blank-slate persona. Snapshot import gives you a persona grounded in real material — chat history, written prose, mailbox archives. Three things drop in pre-filled:

Big Five — calibrated from communication patterns, not a default 0.5.
Seed memories — concrete facts (occupation, hobbies, family, current concerns) ready in the bot's memory store.
Relationship state (optional) — the bot already feels like a friend / family member / partner instead of a stranger.

When to use it

Memorial / digital twin — recreate someone close from preserved messages and writing.
Companion bootstrap — give a character a deeper baseline than a one-line hint.
Onboarding from existing data — clients with chat archives, support transcripts, journaling.

If you just need a fresh fictional character, use manual or auto creation. Snapshot import shines when there's real source material the bot should already know.

How it works

1. POST your data to /v1/users/{external_id}/characters/from_snapshot. The call returns 202 with a snapshot_job_id immediately.

2. Vilow runs the pipeline asynchronously: extract → analyze → map → apply. The analyze stage runs only for chat-export sources (slack_export, gmail_mbox); plain_text skips it.

3. Poll GET /v1/snapshot_jobs/{snapshot_job_id} every few seconds. Typical completion: 15–60 s.

4. When status is completed, the response carries character_id. Chat with it through the standard /v1/chat/{external_id}/{character_id}/send endpoint; the character is fully active.
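The polling step above can be sketched in Python. This is a minimal illustration, not an official client: `fetch_job` and `wait_for_snapshot` are hypothetical names, the 3-second interval and 180-second timeout are assumptions chosen to fit the "every few seconds" and "15–60 s" guidance, and the host is the one used in the curl examples.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.vilow.dev"  # host taken from the curl examples in this guide


def fetch_job(job_id, api_key):
    """GET /v1/snapshot_jobs/{job_id} and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/snapshot_jobs/{job_id}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_snapshot(job_id, fetch, interval=3.0, timeout=180.0, sleep=time.sleep):
    """Poll until the job is completed or failed and return the final job dict.

    `fetch` is any callable that takes a job id and returns the job JSON as a
    dict, which keeps the loop testable without network access.
    """
    waited = 0.0
    while True:
        job = fetch(job_id)
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"snapshot job failed: {job.get('error')}")
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)
        waited += interval
```

A typical call would be `wait_for_snapshot(job_id, lambda j: fetch_job(j, api_key))["character_id"]`.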

Three data sources

Source	Best for	Input	Notes
`plain_text`	Free-form description, journaling, written portrait	Up to 500 KB of text + an optional 8000-char description	No analyzer stage — the prose is fed directly to the mapper. Fastest path.
`slack_export`	Bringing in someone's voice from chat history	A standard Slack workspace export `.zip` + their Slack user id (e.g. `U02ABCD`)	Filters to messages the subject sent. Bot messages and join/leave noise are skipped. Slack-specific markup (`<@U…>`, link tags) is normalised.
`gmail_mbox`	Email archives — long-form prose voice	`.mbox` from Google Takeout (or a single `.eml`) + sender email	Filters to outbound messages from the target address. Quoted replies, signatures, HTML and auto-replies are stripped.

Quick start (curl, plain text)

curl -X POST https://api.vilow.dev/v1/users/u_alex/characters/from_snapshot \
  -H "X-API-Key: $VILOW_API_KEY" \
  -F "subject_name=Marek Sokol" \
  -F "data_source=plain_text" \
  -F "subject_description=Marek is a 41-year-old Czech architect..."

Response (202 Accepted):

{
  "snapshot_job_id": "snap_9ad6eb14a6f04a9c...",
  "status": "pending",
  "estimated_seconds": 60
}

Poll until completion:

curl https://api.vilow.dev/v1/snapshot_jobs/snap_9ad6eb14a6f04a9c... \
  -H "X-API-Key: $VILOW_API_KEY"

# {"snapshot_job_id":"snap_…","status":"completed",
#  "character_id": 42,"extraction_confidence": 0.9,"warnings": [],
#  "error": null,"created_at":"…","completed_at":"…"}

Then chat as usual:

curl -X POST https://api.vilow.dev/v1/chat/u_alex/42/send \
  -H "X-API-Key: $VILOW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message":"Hey, how is the Cascais project going?"}'
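The same chat call can be prepared from Python with only the standard library. This sketch builds the request without sending it; `build_chat_request` is a hypothetical helper name, and the default host is the one used in the curl examples.

```python
import json
import urllib.request


def build_chat_request(api_key, external_id, character_id, message,
                       base_url="https://api.vilow.dev"):
    """Build a POST request for /v1/chat/{external_id}/{character_id}/send."""
    url = f"{base_url}/v1/chat/{external_id}/{character_id}/send"
    return urllib.request.Request(
        url,
        data=json.dumps({"message": message}).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to send it. The response body shape is not described in this section, so the sketch does not parse it.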

Quick start (Slack export)

curl -X POST https://api.vilow.dev/v1/users/u_alex/characters/from_snapshot \
  -H "X-API-Key: $VILOW_API_KEY" \
  -F "subject_name=Lena" \
  -F "data_source=slack_export" \
  -F "target_identifier=U02ABCD" \
  -F "file=@workspace_export.zip"

Larger uploads take longer to extract, but the API call still returns 202 immediately and you poll the job for completion.
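Before uploading, it can be useful to check locally how many messages the subject actually sent in the export. The sketch below assumes the standard Slack export layout (per-channel folders holding daily `.json` files, each a JSON array of message objects). `count_user_messages` is a hypothetical helper, and skipping every message that has a `subtype` is a conservative approximation; it is not a description of how Vilow filters.

```python
import json
import zipfile


def count_user_messages(export_zip, user_id):
    """Count plain messages sent by `user_id` in a Slack export archive."""
    count = 0
    with zipfile.ZipFile(export_zip) as zf:
        for name in zf.namelist():
            # Daily logs live at <channel>/<YYYY-MM-DD>.json; top-level files
            # such as users.json and channels.json are metadata.
            if "/" not in name or not name.endswith(".json"):
                continue
            for msg in json.loads(zf.read(name)):
                if not isinstance(msg, dict):
                    continue
                # Messages carrying a subtype are joins, leaves, bot posts, etc.
                if msg.get("user") == user_id and "subtype" not in msg:
                    count += 1
    return count
```

For example, `count_user_messages("workspace_export.zip", "U02ABCD")` gives a rough idea of whether the export holds enough of the subject's own messages.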

Quick start (Gmail mbox)

curl -X POST https://api.vilow.dev/v1/users/u_alex/characters/from_snapshot \
  -H "X-API-Key: $VILOW_API_KEY" \
  -F "subject_name=Sofia" \
  -F "data_source=gmail_mbox" \
  -F "target_identifier=sofia@example.com" \
  -F "file=@Mail.mbox"

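Mbox files from Google Takeout often run several gigabytes, while uploads are capped at 200 MB (larger files are rejected with 413), so a full Takeout archive may need to be split or narrowed to fewer labels before upload. Within the uploaded file there is no need to pre-filter: only messages with From: sofia@example.com are kept and analysed.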
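To see how much usable material an archive holds, you can count the messages from the target sender locally with Python's standard `mailbox` module. This is a rough pre-flight check under the assumption that matching on the `From:` header approximates the server-side filter; `count_sent_messages` is a hypothetical helper.

```python
import mailbox
from email.utils import parseaddr


def count_sent_messages(mbox_path, sender):
    """Count messages in an mbox file whose From header matches `sender`."""
    target = sender.lower()
    # create=False raises an error instead of creating an empty file
    # when the path is wrong.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        count = 0
        for msg in box:
            # parseaddr splits "Sofia <sofia@example.com>" into name and address.
            _name, addr = parseaddr(str(msg.get("From", "")))
            if addr.lower() == target:
                count += 1
        return count
    finally:
        box.close()
```

The comparison is case-insensitive, so `SOFIA@example.com` and `sofia@example.com` count as the same sender.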

What lands on the character

After completion, the character row has these fields populated from the mapping:

Field	Source
`name`, `gender`, `persona`, `backstory`	Mapper output
`big_five` (0..1 × 5)	Calibrated from communication patterns / description
`custom_traits`	Communication style, quirks, decision tendencies in prose
`default_language`	Detected from source material
`trust`, `friendship`, `relationship_stage`	Set authoritatively from `relationship_to_subject` when provided. Otherwise inferred from cues in the source, defaulting to `strangers` when there is no sign of prior history.
`user_relationship_label`	Either the value of `relationship_to_subject` from the form, or a phrase the mapper inferred from the source. Shown to the bot at chat time as `"the person you're talking to is your X"` so the social bond feels real on turn one.
`signature_phrases` (0–14 strings)	Verbatim short phrases the source shows the subject actually uses (`"Right."`, `"Mm."`, `"слушай"`, `"короче лан"`). The chat system prompt injects them as concrete style anchors, so the bot's voice becomes recognisable without custom rules. The mapper extracts them: chat-export sources usually yield 8–12; plain-text imports yield fewer unless the description quotes the subject directly.
`emotions` (6-dim wheel, 0..10 each)	Seeded if the source describes a current emotional state ("currently tired", "thrilled about X")
Seed `Memory` rows (8–14)	Concrete specifics: tools, places, hobbies-with-objects, ongoing concerns, family details

The job result also returns:

extraction_confidence — 0..1, how well the source grounded the output. ≥ 0.7 on rich material.
warnings — list of specific gaps in the source (e.g. "no info on family situation"). Empty when the source covered everything.
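One way to act on these two fields is to grade a completed job before showing the character to a user. The sketch below is only an illustration: `review_job_result` and the grade names are hypothetical, the 0.7 threshold comes from the note above, and 0.5 is the level the Best practices section describes as likely to feel generic.

```python
def review_job_result(job, low=0.5, rich=0.7):
    """Classify a completed job by extraction_confidence and list its gaps."""
    confidence = job["extraction_confidence"]
    if confidence >= rich:
        grade = "well_grounded"
    elif confidence >= low:
        grade = "usable"
    else:
        grade = "generic"  # persona may need richer source material or edits
    return {"grade": grade, "gaps": list(job.get("warnings", []))}
```

A caller might prompt for more source material whenever the grade is `generic` or the `gaps` list is not empty.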

Signal floor

The import is rejected with 422 if the input is too thin to ground a character, that is, when all three of the following hold: fewer than 30 messages, no documents, and a description shorter than 200 characters. Add more material or use a richer source.
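The rule is a conjunction: an input is rejected only when every one of the three conditions holds. A minimal client-side sketch follows; `below_signal_floor` is a hypothetical helper, and how the server counts messages, documents and description characters is an assumption here.

```python
def below_signal_floor(message_count, document_count, description):
    """Return True when the input would be too thin: all three conditions hold."""
    return (
        message_count < 30
        and document_count == 0
        and len(description) < 200
    )
```

So a 250-character description passes on its own, and so does an export with 30 or more messages and no description at all.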

Endpoint reference

POST `/v1/users/{external_id}/characters/from_snapshot`

Multipart/form-data. Accepts an upload up to 200 MB, text up to 500 KB.

Field	Required	Type	Notes
`subject_name`	yes	string ≤ 120	Display name; stored on the character.
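`data_source`	yes	`plain_text` · `slack_export` · `gmail_mbox`	Selects one of the three data sources described above.
`subject_description`	no (recommended for plain_text)	string ≤ 8000	Importer's prose about the subject. For plain_text it is combined with `text_content`; for chat-export sources it is optional context.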
`text_content`	plain_text only	string ≤ 500 KB	Long-form text — pasted journal, transcript, etc. Combined with `subject_description`.
`target_identifier`	slack_export, gmail_mbox	string ≤ 200	Slack user id or sender email. Filter cue.
`file`	slack_export, gmail_mbox	binary ≤ 200 MB	`.zip` for Slack, `.mbox`/`.eml` for Gmail.
`relationship_to_subject`	no	string ≤ 120	Free-form social role of the user toward the subject, for example `"my husband"`, `"my late grandmother"`, `"old friend Sofia"`, `"hairdresser"`. When provided, seeds `trust`, `friendship` and `relationship_stage` authoritatively, and the bot is told to address the user in that role from turn one. When omitted, the mapper looks for cues in the description / messages, falling back to `strangers` when nothing indicates a prior bond.
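The field rules in the table can be checked on the client before uploading, which avoids a round trip for obvious 400 and 413 cases. This is a sketch under assumptions: `validate_snapshot_form` is a hypothetical helper, "KB" and "MB" are read as multiples of 1024 bytes, and string limits are counted in characters.

```python
SOURCES = {"plain_text", "slack_export", "gmail_mbox"}
MAX_TEXT_BYTES = 500 * 1024          # assumes 500 KB means 500 * 1024 bytes
MAX_FILE_BYTES = 200 * 1024 * 1024   # assumes 200 MB means 200 * 1024 * 1024 bytes


def validate_snapshot_form(fields, file_size=None):
    """Return a list of problems; an empty list means the form looks sendable."""
    problems = []
    name = fields.get("subject_name", "")
    if not name or len(name) > 120:
        problems.append("subject_name: required, at most 120 characters")
    source = fields.get("data_source")
    if source not in SOURCES:
        problems.append("data_source: must be plain_text, slack_export or gmail_mbox")
    if len(fields.get("subject_description", "")) > 8000:
        problems.append("subject_description: at most 8000 characters")
    if len(fields.get("relationship_to_subject", "")) > 120:
        problems.append("relationship_to_subject: at most 120 characters")
    if source == "plain_text":
        text = fields.get("text_content", "")
        if len(text.encode("utf-8")) > MAX_TEXT_BYTES:
            problems.append("text_content: at most 500 KB")
    elif source in ("slack_export", "gmail_mbox"):
        target = fields.get("target_identifier", "")
        if not target or len(target) > 200:
            problems.append("target_identifier: required, at most 200 characters")
        if file_size is None:
            problems.append("file: required for chat-export sources")
        elif file_size > MAX_FILE_BYTES:
            problems.append("file: at most 200 MB")
        if fields.get("text_content"):
            problems.append("text_content: plain_text only")
    return problems
```

Pass the form fields as a dict and the upload size in bytes, for example `validate_snapshot_form(fields, os.path.getsize("Mail.mbox"))`.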

Returns 202 Accepted:

{ "snapshot_job_id": "snap_…", "status": "pending", "estimated_seconds": 60 }

GET `/v1/snapshot_jobs/{snapshot_job_id}`

Poll for status. Returns:

{
  "snapshot_job_id":"snap_…",
  "status":"pending|running|completed|failed",
  "character_id": 42,
  "extraction_confidence": 0.9,
  "warnings": [],
  "error": null,
  "created_at": "...",
  "completed_at": "..."
}

Status codes

Code	Meaning
`202`	Job accepted, processing.
`400`	Missing required field or data_source/file mismatch.
`402`	Snapshot quota exhausted, card declined / authentication required for overage charge, or no payment method on file. Detail body has a specific `code`: `snapshot_lifetime_cap_reached`, `snapshot_quota_exhausted`, `card_declined`, `authentication_required`, `no_payment_method`, `subscription_inactive`.
`404`	External user not found.
`413`	Upload exceeds 200 MB or text exceeds 500 KB.
`422`	Input doesn't meet the signal floor. See the Signal floor section above.
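A client can turn these codes into short, actionable messages. The sketch below is illustrative only: `explain_error` and the hint texts are hypothetical, and the 402 body is assumed to look like `{"detail": {"code": "..."}}`, a shape this reference does not spell out.

```python
import json

STATUS_HINTS = {
    400: "Check required fields and that the file matches data_source.",
    404: "External user not found.",
    413: "Upload exceeds 200 MB or text exceeds 500 KB.",
    422: "Input is below the signal floor; add more source material.",
}

BILLING_HINTS = {
    "snapshot_lifetime_cap_reached": "Lifetime import cap reached; upgrade to import more.",
    "snapshot_quota_exhausted": "Included imports are used up.",
    "card_declined": "The saved card was declined.",
    "authentication_required": "The card needs authentication for the overage charge.",
    "no_payment_method": "No payment method on file.",
    "subscription_inactive": "The subscription is inactive.",
}


def explain_error(status, body_text):
    """Map an error status (and the 402 detail code, if present) to a hint."""
    if status == 402:
        try:
            # Assumed body shape: {"detail": {"code": "..."}}
            detail = json.loads(body_text).get("detail", {})
        except (ValueError, AttributeError):
            detail = {}
        code = detail.get("code") if isinstance(detail, dict) else None
        return BILLING_HINTS.get(code, "Billing problem; inspect the response body.")
    return STATUS_HINTS.get(status, f"Unexpected status {status}.")
```

Unknown or unparseable 402 bodies fall back to a generic billing message rather than raising.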

Quotas and pricing

Snapshot imports are counted independently from chat messages — they consume a separate counter on the tenant.

Plan	Included	Overage
Free	1 lifetime	Hard cap — upgrade to import more.
Hobby	3 / period	€7 per import, charged immediately to your saved card.
Pro	20 / period	€7 per import, charged immediately to your saved card.
Enterprise	Unlimited	—

Overage charges go through Stripe directly — no top-up required. We charge the same payment method that pays your subscription, off-session, before the pipeline runs. If the card is declined the API returns 402 and no character is created.

A failed job is not counted against quota; the counter is refunded automatically. If the failed job incurred an overage charge, the €7 is refunded to the same card immediately (via stripe.Refund.create). The SnapshotJob row keeps stripe_payment_intent_id and refunded_at for traceability.
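The pricing table reduces to a small calculation. The sketch below estimates what the next import would cost; `next_import_cost` is a hypothetical helper, and `used` is assumed to be the number of successful imports in the current period (or over the account lifetime on Free), since failed jobs are refunded.

```python
# Included imports per plan; None means unlimited.
INCLUDED = {"free": 1, "hobby": 3, "pro": 20, "enterprise": None}
OVERAGE_EUR = 7


def next_import_cost(plan, used):
    """Return the EUR charge for the next import, or None if it is blocked."""
    included = INCLUDED[plan]
    if included is None or used < included:
        return 0  # still inside the included allowance
    if plan == "free":
        return None  # hard cap: no overage available on Free
    return OVERAGE_EUR
```

For instance, the fourth import in a period on Hobby costs 7, while a second import on Free returns `None` because the plan has a hard cap.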

Privacy and retention

Uploaded files are read into memory, processed, and discarded. Only the SHA-256 hash is recorded for traceability.
Pipeline intermediate dumps (extracted bundle, analysis markdown, mapper output) are kept for 90 days in an admin-only off-web directory, then auto-deleted.
DELETE /v1/users/{external_id}/characters/{character_id} cascades to all Memory rows and conversations. The on-disk pipeline dump is also removed.
Snapshot imports that produce a character generate seed memories with source = "snapshot_import" — distinguishable from chat-extracted memories if you ever need to inspect or filter.

Best practices

Plain text imports: use text_content for anything over a paragraph — the 8000-character cap on subject_description is for short summaries. text_content takes 500 KB.
Curl gotcha: when sending multi-line text, use -F "text_content=<path/to/file.txt" so curl reads the field value from disk. Passing the text through a shell variable can truncate it at the first paragraph break.
Chat-export sources: add a subject_description alongside the file. The importer's prose tells the analyzer how the subject fits into your life — useful when the chat alone is professional and reveals little personality.
Confidence below 0.5: the job still completes — but expect the persona to feel generic. Either feed richer source material or accept it as a starting point and edit through the dashboard.
Don't import the same person twice expecting different output — the mapper is mostly deterministic at low temperature. Fix the source instead.

What's not in scope (yet)

WhatsApp / Telegram / Discord / Signal exports.
Direct OAuth into Slack or Gmail. We accept exports/archives only.
Re-running a snapshot to update a character — re-import creates a new character. Updating an existing one through the standard PATCH endpoint is the supported path.
Conversational refinement of an imported persona ("he wouldn't say that"). Vilow's runtime memory and emotion system already absorbs that signal through normal chat — no special UI needed.