Agent: AI museum guide
dataland-agent is the visitor-facing chat backend. It streams Gemini responses over Server-Sent Events, calls into RAG for grounded answers, and pulls live biometrics from museum-api so the guide can react to what the visitor is actually doing on the floor. The same FastAPI app also hosts a static test chat client and an admin dashboard, and its sibling process auth_server.py is the dataland-auth service.
| Property | Value |
|---|---|
| Container | dataland-agent |
| Image | dataland/agent:${IMAGE_TAG} (built from dataland-agent/Dockerfile) |
| Public port | 4141 (via Cloudflare → dataland.chat) |
| Internal URL | http://dataland-agent:4141 |
| Memory / CPU | mem_limit: 1g, mem_reservation: 256m, cpus: 1.0 |
| Healthcheck | GET /health (interval 30s, timeout 10s, 3 retries, 10s start period) |
| Framework | FastAPI + pydantic-ai over uvicorn (--workers ${UVICORN_WORKERS:-2}) |
| Model | google-gla:gemini-3.5-flash |
Recent changes
This page reflects the 2026-06-03 → 2026-06-04 change-set: DAT-269 (model standardized on gemini-3.5-flash), DAT-296 (empty-/museum instant welcome + welcome push, /register / /current removed), DAT-213 (silent complaint detection), DAT-284 (suggestions restored on reload), DAT-281 (real room names), DAT-279/280/285/261 (tool hardening + get_scene_flow), DAT-286 (local JWKS mirror), and the RAG /search read-timeout raise to 25s.
What it does
- Streams chat responses to the mobile app via SSE for two surfaces: museum mode (in-museum, telemetry-aware) and general mode (outside the museum, no live vitals), each with a multimodal (image-upload) variant.
- Searches artworks and museum knowledge through RAG (hybrid dense + BM25 + rerank for text; Qdrant `images` collection for visual search).
- Fetches the visitor's current vitals (`GET /api/tickets/{id}/vitals` on museum-api) to ground answers in real-time location, chapter, and biometric context.
- Initializes a visit instantly on an empty first `/museum` message with a static personalized welcome and fires the welcome push off-path (DAT-296).
- Runs a silent server-side complaint judge on every visitor turn, off the response path, and files an ops ticket when it fires (DAT-213).
- Receives notification-triggered chats from the worker via the internal `/v1/service/chat/museum` endpoint, gated by `AGENT_SERVICE_TOKEN`.
- Persists conversation history (provider-native message blobs) and serves a mobile-friendly timeline, plus message-level like/dislike feedback.
- Exposes an admin dashboard at `/admin` for service health, DB stats, and config inspection.
Architecture & dependencies
graph LR
APP[Mobile app] -- "RS256 JWT" --> A[agent :4141]
WK[notification-worker] -- "AGENT_SERVICE_TOKEN" --> A
A --> R[rag :4143]
A --> M[museum-api :5001]
A -- "complaint / welcome push" --> N[notification-api :8080]
A --> P[(postgres :5432)]
A --> RD[(redis :6379)]
A -.JWKS.-> AU[dataland-auth :9000]
A -.JWKS.-> CMS["CMS / bilet.io JWKS"]
| Upstream | Why |
|---|---|
| `dataland-rag:4143` | Hybrid text retrieval (`/search`, collection `knowledge`) + image search (vision service) |
| `dataland-museum:5001` | Live visitor vitals + room/chapter catalog |
| `dataland-notification-api:8080` | Off-path complaint tickets + welcome push (`/v1/ops/*`) |
| `dataland-postgres:5432` | Conversation history, users, ticket↔user mapping |
| `dataland-redis:6379` | Ephemeral state, active-ticket mirror reader |
| `dataland-auth:9000` + CMS JWKS | RS256 verification of mobile JWTs |
The agent has hard depends_on with condition: service_healthy on postgres, redis, rag, and museum-api — it will not start until all four pass their healthchecks. The downstream notification calls are best-effort and never block a chat turn.
Chat endpoints
All chat lives under /v1 and requires a mobile RS256 JWT in Authorization: Bearer <token>. Every response is text/event-stream with Cache-Control: no-cache, Connection: keep-alive, X-Accel-Buffering: no (the last disables nginx/Cloudflare buffering so deltas flush immediately).
| Method | Path | Purpose |
|---|---|---|
| POST | `/v1/chat/museum` | In-museum SSE chat bound to a `ticket_id` |
| POST | `/v1/chat/museum/multimodal` | Same, multipart with an optional image |
| POST | `/v1/chat/general` | Outside-museum SSE chat (no live vitals) |
| POST | `/v1/chat/general/multimodal` | Same, multipart with an optional image |
| POST | `/v1/service/chat/museum` | Internal, notification-triggered museum chat (service token) |
Museum mode
POST /v1/chat/museum takes a JSON body of { "message": "...", "ticket_id": "..." }.
The ticket_id is the RDC museum ticket id for the visit and it is load-bearing: the museum conversation is permanently bound to it. register_ticket() upserts the ticket↔user mapping and guarantees a conversation row whose id == ticket_id, so the mobile client can use the ticket id directly as the conversation_id with no extra lookup. Registration is implicit and idempotent — the first message with a ticket registers it; subsequent messages resume the same chat.
/register and /current are gone (DAT-296)
The explicit POST /v1/tickets/register and /v1/tickets/current endpoints were removed. The /v1/tickets router is now empty. Registration happens implicitly on the first POST /v1/chat/museum, and conversation_id == ticket_id is the contract. Do not call a register endpoint — there isn't one.
General mode
POST /v1/chat/general takes { "message": "...", "conversation_id": null }. Omit conversation_id (or send null / "None" / "null" / "undefined", which the schema coerces to None) to start a new session; pass an existing id to continue. A conversation_id that doesn't belong to the caller returns 404. General mode has no ticket_id and therefore no vitals / room tools.
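The null-like coercion described above can be sketched in a few lines. This is a minimal illustration, not the service's schema code: the function name `coerce_conversation_id` is hypothetical, and treating an empty string as null and matching case-insensitively are assumptions beyond the three sentinel strings the text lists.

```python
from typing import Optional

# Sentinel strings clients sometimes send instead of a real JSON null.
# The empty string is an assumption; the other three come from the text above.
_NULLISH = {"", "none", "null", "undefined"}


def coerce_conversation_id(value: Optional[str]) -> Optional[str]:
    """Return None for missing or null-like ids, otherwise the trimmed id."""
    if value is None:
        return None
    cleaned = value.strip()
    if cleaned.lower() in _NULLISH:
        return None
    return cleaned
```

A `None` result would start a new session; any other value is treated as an existing conversation id that still has to pass the ownership check.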
Multimodal
The multipart variants accept message (form field, may be empty), ticket_id (museum) or conversation_id (general), and an optional image file. At least one of message / image must be present (else 400). When an image is attached, process_multimodal_image() runs before the agent:
- Validates the upload against `CHAT_IMAGE_MAX_BYTES` (10 MB default) and `CHAT_IMAGE_ALLOWED_MIMES` (`image/jpeg`, `png`, `webp`, `gif`, `heic`, `heif`).
- Runs visual search against RAG's `images` collection (`top_k=5`).
- Injects an `[IMAGE CONTEXT]` block of the top candidate artworks into the user message so the agent answers "what is this piece?" without re-asking.
- Resolves a public URL for the upload (RAG query URL → RAG upload → public GCS fallback when `GCS_USER_UPLOAD_ENABLED`) and appends `[USER_IMAGE]https://...` for UI persistence only.
The system prompt tells the model to trust [IMAGE CONTEXT] and never read the [USER_IMAGE] line aloud.
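The upload validation step can be sketched as below. It is an illustration only: `validate_image` and `ImageRejected` are hypothetical names, the MIME list expands the short forms above to full `image/*` types as an assumption, and the real `process_multimodal_image()` may report failures differently (for example as an HTTP error).

```python
CHAT_IMAGE_MAX_BYTES = 10 * 1024 * 1024  # 10 MB default (10485760 bytes)
CHAT_IMAGE_ALLOWED_MIMES = frozenset({
    "image/jpeg", "image/png", "image/webp",
    "image/gif", "image/heic", "image/heif",
})


class ImageRejected(ValueError):
    """Hypothetical error type used only in this sketch."""


def validate_image(data: bytes, mime: str) -> None:
    """Raise ImageRejected if the upload is empty, too large, or the wrong type."""
    if not data:
        raise ImageRejected("empty upload")
    if len(data) > CHAT_IMAGE_MAX_BYTES:
        raise ImageRejected(f"image exceeds {CHAT_IMAGE_MAX_BYTES} bytes")
    if mime.lower() not in CHAT_IMAGE_ALLOWED_MIMES:
        raise ImageRejected(f"unsupported MIME type: {mime}")
```

Checking size before MIME type keeps the cheapest rejection first; the order is a design choice of the sketch, not something the source specifies.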
SSE event frames
Every frame is a single data: {json}\n\n line. The pipeline (app/agent/streaming.py) uses pydantic-ai's agent.iter() to walk the execution graph, streaming text deltas from ModelRequestNode and tool events from CallToolsNode before each tool runs, so the client can show live status.
sequenceDiagram
participant C as Client
participant A as agent stream_response
participant G as Gemini (gemini-3.5-flash)
participant T as Tools (vitals/room/knowledge/...)
C->>A: POST /v1/chat/museum
A-->>C: {conversation_id, mode}
Note over A: check_input regex → check_input_model_armor
A->>G: agent.iter(message + inline date, history)
G-->>A: tool call
A-->>C: {tool, query}
A->>T: execute tool
T-->>A: result (+ sources/images on ctx)
G-->>A: text deltas
A-->>C: {content: "..."} (many)
A-->>C: {generating_suggestions: true}
A->>G: generate_suggestions (2nd LLM call)
A-->>C: {sources, images}
A-->>C: {suggestions: [...]}
A-->>C: {done: true, message_id}
Frame shapes (see app/schemas/sse.py):
| Frame | When | Example payload |
|---|---|---|
| meta | first, always | {"conversation_id": "...", "mode": "museum"} |
| content | per text delta | {"content": "Welcome to the "} |
| tool | before each tool runs | {"tool": "get_visitor_vitals", "query": ""} |
| generating_suggestions | after text, before suggestion call | {"generating_suggestions": true} |
| sources / images | if RAG sources or images were collected | {"sources": [...], "images": [...]} |
| suggestions | if follow-ups were generated | {"suggestions": ["...", "...", "..."]} |
| done | final | {"done": true, "message_id": "..."} |
Special cases:
- Guardrail block (DAT-263): `{"content": <bilingual refusal>}` then `{"done": true, "blocked": true, "category": "..."}`. The LLM is never called.
- Timeout (DAT-148): `{"error": "agent_timeout", "partial": true}`. Partial text is still persisted and the suggestion call is skipped.
- Static welcome (DAT-296): meta → `{"content": <welcome>}` → `{"done": true, "static": true, "message_id"?}`, with no `tool` / `suggestions` frames.
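On the client side, the frames above can be consumed with a small parser. The sketch below assumes the whole stream body is already available as a string and that every frame is a single `data: {json}` line as described; the helper names `iter_sse_frames` and `collect_reply` are hypothetical.

```python
import json
from typing import Iterator


def iter_sse_frames(raw: str) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` frame in an SSE body."""
    for block in raw.split("\n\n"):
        line = block.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


def collect_reply(raw: str) -> dict:
    """Fold a finished stream into its text plus terminal metadata."""
    result = {"text": "", "done": False, "error": None, "message_id": None}
    for frame in iter_sse_frames(raw):
        if "content" in frame:
            result["text"] += frame["content"]  # text delta
        elif "error" in frame:
            result["error"] = frame["error"]  # e.g. "agent_timeout"
        elif frame.get("done"):
            result["done"] = True
            result["message_id"] = frame.get("message_id")
    return result
```

A real client would parse incrementally as bytes arrive rather than after the stream closes; the folding logic stays the same.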
The empty-/museum init welcome (DAT-296)
An empty first message on /v1/chat/museum is an initialization signal, not a question. The visitor's app opens the chat and the visitor hasn't typed anything yet, so the agent answers instantly with a fixed, personalized greeting — no LLM, no RAG, no tools, no complaint check:
welcome = welcome_message(current_user.full_name)
gen = (
persist_static_response(welcome, conversation_id) # (1)!
if created
else stream_static_text(welcome, conversation_id) # (2)!
)
schedule_welcome_push(ticket.id, current_user.id) # (3)!
1. New ticket path. `persist_static_response()` appends the welcome as an assistant turn and commits before the stream closes, so the notification worker (which reads the SSE stream to its end before pushing) always pushes strictly after the chat write lands. This prevents a push-before-persist race.
2. Re-init path. When the ticket already exists, `stream_static_text()` re-shows the welcome without appending a duplicate turn: the visitor reopened the chat, so nothing new should be persisted.
3. DAT-296 off-path welcome push. `schedule_welcome_push()` POSTs `{ticket_id, user_id}` to notification's `/v1/ops/welcome` off the response path. The notification side ticket-dedups it against the RDC `visit_started` welcome, so it is safe to call on every init.

- `welcome_message()` produces `"Welcome to Dataland, {first_name}! I'm your AI guide..."`. The copy is kept in sync with dataland-notification's `visit_started` `WELCOME_MESSAGE` so the mobile-init greeting and the RDC-driven welcome read identically.
- Anonymous visitors are first-class here: `first_name_of()` is anonymous-safe (`""` → `"Welcome to Dataland!"`) and the push is keyed on the ticket, not an email account.
Anonymous-safe by design
Delivery is ticket_id ↔ external_id ↔ OneSignal. The welcome and welcome push never require a registered/email account — full_name and email may be empty and the flow still completes.
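The anonymous-safe greeting can be illustrated as follows. This is a sketch under assumptions: it builds only the opening sentence (the rest of the welcome copy is not reproduced here), `welcome_greeting` is a hypothetical name, and the real `first_name_of()` may split names differently.

```python
from typing import Optional


def first_name_of(full_name: Optional[str]) -> str:
    """Return the first whitespace-separated token, or "" for anonymous visitors."""
    parts = (full_name or "").split()
    return parts[0] if parts else ""


def welcome_greeting(full_name: Optional[str]) -> str:
    """Build the opening sentence of the welcome, with or without a name."""
    first = first_name_of(full_name)
    return f"Welcome to Dataland, {first}!" if first else "Welcome to Dataland!"
```

Because a missing or blank name collapses to the unnamed greeting, the init flow never needs a registered account to complete.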
Tools
The museum agent registers five tools; the general agent registers only the last two, `search_knowledge` and `search_artwork_images` (it has no live floor context). Tools are registered in app/agent/factory.py and implemented under app/agent/tools/. Tool results may attach sources / images to the ConversationContext, which the streamer emits as sources / images SSE frames and persists onto the assistant message.
| Tool | Mode | What it does |
|---|---|---|
| `get_visitor_vitals` | museum | Real-time location, chapter context, and biometrics for the active ticket |
| `get_room_info` | museum | Lists all chapters/artworks in a given room code |
| `get_scene_flow` | museum | The ordered gallery flow and where the visitor is in it (DAT-261) |
| `search_knowledge` | both | Hybrid RAG retrieval over the `knowledge` collection |
| `search_artwork_images` | both | Text-driven artwork image discovery (Qdrant `images`) |
get_visitor_vitals
Calls museum-api /api/tickets/{ticket_id}/vitals and renders room, chapter, scent, and biometric context. Two hardening details:
- Physiological sanity bounds (DAT-285): out-of-range readings are dropped, never relayed. Heart rate `30–220` BPM, body temperature `30–43` °C, SpO₂ `50–100`%. A bad sensor value is omitted rather than narrated. Heart rate also drives a mood label via `interpret_excitement()` (`<60` very relaxed … `≥110` very excited).
- Reference images: when the current chapter has reference images, they're cleared-then-set on the context so the UI shows the current chapter's cards, deduped.
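The sanity bounds above amount to a filter over the raw readings. The sketch below uses the three ranges quoted in the text; the field names (`heart_rate_bpm`, `body_temp_c`, `spo2_pct`), the function name, and treating the bounds as inclusive are assumptions for illustration.

```python
# Plausibility ranges quoted above (DAT-285); inclusive bounds are an assumption.
BOUNDS = {
    "heart_rate_bpm": (30.0, 220.0),
    "body_temp_c": (30.0, 43.0),
    "spo2_pct": (50.0, 100.0),
}


def sanitize_vitals(raw: dict) -> dict:
    """Keep only readings that are numeric and inside their range."""
    clean = {}
    for key, (low, high) in BOUNDS.items():
        value = raw.get(key)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue  # missing or non-numeric: omit rather than narrate
        if low <= value <= high:
            clean[key] = value
    return clean
```

Dropping a bad value instead of clamping it means the model never sees a fabricated reading, which matches the "omitted rather than narrated" behavior.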
get_room_info
Calls /api/rooms/{room_code}/chapters. Hardening:
- Image cap + dedup (DAT-279/280): at most `_MAX_ROOM_IMAGES = 3` reference images, skipping any already present on the context.
- Empty-room flail fix (DAT-280): if a room has no catalog entries, it returns an explicit instruction not to probe other room codes to compensate, and to describe the space generally from `get_visitor_vitals`. This stopped the model from speculatively hunting across rooms for an empty transition area.
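The image cap and dedup rule can be sketched as a single pass over the candidates. `select_room_images` is a hypothetical helper, and representing images as URL strings is an assumption; only the cap of 3 and the skip-if-already-present rule come from the text.

```python
_MAX_ROOM_IMAGES = 3


def select_room_images(candidates: list, already_on_context: list) -> list:
    """Pick up to _MAX_ROOM_IMAGES new images, skipping ones already shown."""
    seen = set(already_on_context)
    picked = []
    for url in candidates:
        if url in seen:
            continue  # already on the context or a repeat within this room
        seen.add(url)
        picked.append(url)
        if len(picked) == _MAX_ROOM_IMAGES:
            break
    return picked
```

For example, with `b` already on the context, candidates `a, b, a, c, d, e` yield `a, c, d`.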
get_scene_flow (DAT-261)
Reads app/data/scene_flow.json (cached via lru_cache) and returns the ordered gallery experience plus, when given the visitor's current_room_code, their position ("stop 2 of 5") and the next gallery. Answers "where do I go next?", "what's the order?", "what haven't I seen yet?". The current room still comes from get_visitor_vitals at runtime; this tool only supplies the ordered map around it.
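The position lookup can be sketched as an index into an ordered list. The flow order is passed in rather than hard-coded because the contents of `scene_flow.json` are not shown here; `locate_in_flow` and the returned dict shape are hypothetical.

```python
from typing import Optional


def locate_in_flow(flow: list, current_room_code: Optional[str]) -> dict:
    """Return the visitor's 1-based stop label and the next room code, if known."""
    if current_room_code not in flow:
        return {"position": None, "next": None}
    index = flow.index(current_room_code)
    next_code = flow[index + 1] if index + 1 < len(flow) else None
    return {"position": f"stop {index + 1} of {len(flow)}", "next": next_code}
```

With an illustrative five-stop flow, a visitor in the second room gets `"stop 2 of 5"` plus the code of the third room; an unknown or missing room code yields no position rather than a guess.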
search_knowledge & search_artwork_images
search_knowledge proxies retrieval_service.retrieve_context() → RAG /search (collection: "knowledge", top_k=10, rerank: true) and appends each result as a source (title, uri, confidence). search_artwork_images proxies the vision service (top_k=3) and appends each hit as an image card. Both swallow upstream failures into a graceful "currently unavailable" string so a RAG blip degrades the answer rather than 500-ing the turn.
Real room names (DAT-281)
The wearable / RDC report bare codes (GA, GB, GC, GD, ON, LO). Visitors must never hear codes. app/agent/rooms.py is the single source of truth:
| Code | Display name |
|---|---|
| `ON` | Discovery Portal (entrance) |
| `GA` | Data Pavilion |
| `GB` | Latent Gallery |
| `GC` | Infinity Room |
| `GD` | The Sanctuary |
| `LO` | Lobby |
room_display_name() is used in all visitor-facing tool output. room_label() renders Data Pavilion (GA) so the model knows the code↔name mapping internally; the system prompt instructs it to speak only the name aloud and treat any parenthesized code as reference-only.
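A minimal sketch of the mapping and the two helpers follows. The table values are taken from above; the function bodies are assumptions, and in particular the `"this area"` fallback for unknown codes is hypothetical (the source does not say what the real helper returns in that case).

```python
ROOM_NAMES = {
    "ON": "Discovery Portal (entrance)",
    "GA": "Data Pavilion",
    "GB": "Latent Gallery",
    "GC": "Infinity Room",
    "GD": "The Sanctuary",
    "LO": "Lobby",
}


def room_display_name(code: str) -> str:
    """Visitor-facing name; never leaks the bare code."""
    return ROOM_NAMES.get(code.strip().upper(), "this area")  # fallback is an assumption


def room_label(code: str) -> str:
    """Name plus parenthesized code, for the model's internal reference only."""
    normalized = code.strip().upper()
    return f"{room_display_name(normalized)} ({normalized})"
```

Keeping both helpers on one dictionary is what makes the module a single source of truth: a renamed gallery changes in exactly one place.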
RAG retrieval & the 25s search timeout
retrieval_service keeps a pooled httpx.AsyncClient with retry (3 attempts, exponential backoff with jitter, retrying on timeout/transport errors and 5xx). The read timeout is 25 s (RAG_SEARCH_TIMEOUT_SECONDS, connect=3.0).
Why 25s, not 10s
RAG /search round-trips in roughly 10 s (query embedding + rerank). A 10 s client read timeout caused a ReadTimeout → 3 retries (~30 s) → a second search → the 60 s agent wall-clock (DAT-148) → agent_timeout, surfacing on museum-knowledge queries after the 20-section re-ingest. Raising the read timeout to 25 s lets a single search complete on the first attempt.
This sits inside two other deadlines (app/config.py): AGENT_RUN_TIMEOUT_SECONDS = 60.0 caps the whole agent.iter() loop, and AGENT_SUGGESTION_TIMEOUT_SECONDS = 15.0 caps the follow-up suggestion call. On a run timeout, partial text is persisted and the suggestion call is skipped so a stuck stream can't pin a worker indefinitely.
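The retry policy described at the top of this section (3 attempts, exponential backoff with jitter) can be sketched with the standard library alone. This is not the `retrieval_service` code: the built-in `TimeoutError` and `ConnectionError` stand in for httpx's timeout and transport errors, the 5xx retry branch is omitted, and the delay values and the `with_retries` name are hypothetical.

```python
import asyncio
import random


async def with_retries(call, attempts=3, base_delay=0.5, max_delay=4.0,
                       sleep=asyncio.sleep):
    """Await call() up to `attempts` times, backing off exponentially with jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except (TimeoutError, ConnectionError):
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Randomize the wait so concurrent workers do not retry in lockstep.
            await sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists only so tests can skip real waiting. Each attempt can consume a full read timeout, which is why the read timeout has to comfortably exceed a normal search round-trip.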
Suggestions (DAT-284)
After the main response streams, the agent makes a second, lightweight LLM call (`generate_suggestions()`) to produce three short follow-up questions, wrapped in its own 15 s timeout. The parser is defensive: it strips Markdown `json` code fences, falls back to scanning for the first **balanced** `[...]` span, and tracks string/escape state so brackets inside a suggestion (`"What is [Satellite Simulations]?"`) don't truncate the array (DAT-267).
The suggestions are persisted onto the assistant message and re-attached on conversation reload (DAT-284): stream_response() carries suggestions (alongside images and sources) forward from prior-run metadata into the saved message blob, and the mobile projection (message_projection.normalize_for_mobile) surfaces them — so reopening a conversation restores its follow-up chips instead of dropping them.
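The defensive parsing strategy (strip fences, then fall back to the first balanced array while tracking string and escape state) can be sketched as below. It illustrates the technique rather than reproducing the service's parser; `first_balanced_array` and `parse_suggestions` are hypothetical names.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker


def first_balanced_array(text: str):
    """Return the first balanced [...] span, ignoring brackets inside strings."""
    start = text.find("[")
    if start == -1:
        return None
    depth, in_string, escaped = 0, False, False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None


def parse_suggestions(raw: str) -> list:
    """Parse a JSON array of strings from a possibly fenced, chatty LLM reply."""
    cleaned = raw.strip()
    if cleaned.startswith(FENCE):
        cleaned = cleaned[len(FENCE):]
        if cleaned.lower().startswith("json"):
            cleaned = cleaned[4:]
        if cleaned.rstrip().endswith(FENCE):
            cleaned = cleaned.rstrip()[:-len(FENCE)]
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        span = first_balanced_array(cleaned)
        if span is None:
            return []
        try:
            data = json.loads(span)
        except json.JSONDecodeError:
            return []
    if not isinstance(data, list):
        return []
    return [s.strip() for s in data if isinstance(s, str) and s.strip()]
```

A naive "find the first `]`" scan would cut the array at the bracket inside `[Satellite Simulations]`; tracking whether the scanner is inside a JSON string avoids that.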
Silent complaint detection (DAT-213)
A server-side LLM judge inspects each visitor message off the response path and, on a genuine complaint, files a structured ops ticket. It is not an agent tool: it emits no SSE event and never alters the assistant's reply, so the visitor sees nothing.
flowchart LR
M[Visitor message] --> S[schedule_complaint_check]
S -. background task .-> CL[classify_complaint<br/>LLM judge, 6s cap]
CL -->|is_complaint=false| X[drop]
CL -->|is_complaint=true| RP[post_complaint]
RP -->|Bearer NOTIFICATION_OPS_TOKEN| OPS[notification /v1/ops/complaint]
OPS --> D[Discord / Slack fan-out]
- Off-path & invisible: `schedule_complaint_check()` spawns the pipeline as a background `asyncio` task that runs concurrently with the streamed response, adding zero latency. Every failure mode is swallowed and logged; nothing here can break a chat turn. Strong refs to in-flight tasks are held so the loop doesn't GC them mid-run.
- What fires: explicit dissatisfaction ("this is broken", "I want a refund"), a request for staff, or harassment/unsafe content. Casual venting ("ugh I'm tired"), neutral questions, and mild remarks do not fire. The classifier returns a structured `ComplaintVerdict` (`is_complaint`, `severity` low/medium/high, `category` dissatisfaction/staff_request/harassment/other, one-line `reason`). On any error or timeout it returns a non-complaint verdict, so a classifier hiccup never produces a false alert.
- Visitor identity on the ticket: the payload carries `user_name`, `user_email`, `user_id`, `ticket_id`, `session_id` (== conversation_id), the classified verdict, the visitor's complaint message only (not the surrounding transcript), and a URL-encoded `session_link` built from `COMPLAINT_SESSION_LINK_BASE`.
- Swappable, multi-provider: the notification side owns channel selection, dedup, audit log, and fan-out to Discord/Slack (comma-separated `OPS_NOTIFIER_PROVIDER`). The agent just POSTs the classified event to `/v1/ops/complaint` with `Authorization: Bearer <NOTIFICATION_OPS_TOKEN>`.
- Multimodal: the visitor's own text is classified, not the `[IMAGE CONTEXT]`-augmented prompt.
- Graceful shutdown: on shutdown `drain_inflight_complaints()` waits (best-effort) for in-flight checks so a complaint mid-classify still fires across a deploy.
Disabled by default (COMPLAINT_DETECTION_ENABLED=false); enable per env. Without NOTIFICATION_OPS_TOKEN set, classification still runs but the POST is skipped with a warning.
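The off-path pattern (background task, strong references, timeout that degrades to a non-complaint verdict, best-effort drain) can be sketched with plain `asyncio`. Everything named here is illustrative: `Verdict`, `schedule_check`, and `drain_inflight` are hypothetical stand-ins for `ComplaintVerdict`, `schedule_complaint_check()`, and `drain_inflight_complaints()`, and the classifier and poster are injected callables rather than real LLM or HTTP calls.

```python
import asyncio
from dataclasses import dataclass

CLASSIFIER_TIMEOUT_SECONDS = 6.0


@dataclass
class Verdict:
    is_complaint: bool = False
    severity: str = "low"
    category: str = "other"
    reason: str = ""


_inflight = set()  # strong refs so the event loop cannot GC running tasks


async def _check(message, classify, post):
    try:
        verdict = await asyncio.wait_for(classify(message), CLASSIFIER_TIMEOUT_SECONDS)
    except Exception:
        verdict = Verdict()  # timeout or classifier failure: treat as non-complaint
    if verdict.is_complaint:
        try:
            await post(message, verdict)
        except Exception:
            pass  # swallowed here; a real service would log the failure


def schedule_check(message, classify, post) -> asyncio.Task:
    """Start the check in the background; must be called from a running loop."""
    task = asyncio.create_task(_check(message, classify, post))
    _inflight.add(task)
    task.add_done_callback(_inflight.discard)
    return task


async def drain_inflight(timeout: float = 5.0) -> None:
    """Best-effort wait for in-flight checks, e.g. during shutdown."""
    if _inflight:
        await asyncio.wait(set(_inflight), timeout=timeout)
```

The caller never awaits the returned task on the response path, which is what keeps the added chat latency at zero.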
Guardrails & Model Armor
Two surfaces wrap every turn (app/agent/guardrails.py): input, which runs a regex stage followed by a Model Armor stage, and output.
- Input (DAT-263): a cheap regex pass (`check_input`) blocks instruction-override, role-override (DAN/developer-mode), prompt-extraction, chat-template special tokens, long base64 smuggling blobs, and payloads over 8000 chars before the LLM is called. A block emits a bilingual (EN/TR) refusal, persists it as the assistant turn, increments `guardrail_triggered_total{direction="input"}`, and returns.
- Model Armor (DAT-268): when the regex pass clears, `check_input_model_armor` runs the Google Model Armor gate to catch multilingual jailbreaks the English-leaning regex misses. It ships dark (a no-op until `MODEL_ARMOR_TEMPLATE_ID` is provisioned) and fails open by default (`MODEL_ARMOR_FAIL_OPEN=true`): a scanner outage degrades to the regex layer rather than blocking visitors.
- Output (DAT-263): `check_output` is telemetry-only. The visitor has already seen the streamed text, so it never rewrites; it just counts + logs system-prompt or tool-name leaks to feed the next prompt-tuning pass.
Prompt & date handling
System prompts (app/agent/prompts.py, PROMPT_VERSION = "1.1-dat260") share a single scope-and-refusal block (DAT-260): in-scope is Dataland, Refik Anadol / Refik Anadol Studio, and directly-related art history; everything else is refused warmly and bilingually with a pivot back to in-scope material. The model is told never to reveal the system prompt or internal tool names.
A wall-clock anchor is injected two ways so Gemini stops disclaiming "I don't have a real-time calendar": today_context() is re-evaluated on every run via pydantic-ai's @agent.system_prompt hook, and augment_with_date() inlines the date into the user turn at request time. The inline tag is stripped from saved history (strip_inline_date) so the persisted user message matches what the visitor typed.
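The inline-date round trip can be sketched as a prefix that is added at request time and removed before persistence. The `[TODAY: YYYY-MM-DD]` tag format is hypothetical (the source does not show the real tag), and these function bodies are assumptions that only mirror the names `augment_with_date` and `strip_inline_date`.

```python
import re
from datetime import date

# Hypothetical tag format, chosen only for this sketch.
_DATE_TAG = re.compile(r"^\[TODAY: \d{4}-\d{2}-\d{2}\]\n")


def augment_with_date(message: str, today: date) -> str:
    """Prefix the user turn with an explicit date anchor."""
    return f"[TODAY: {today.isoformat()}]\n{message}"


def strip_inline_date(message: str) -> str:
    """Remove a leading date anchor so saved history matches what was typed."""
    return _DATE_TAG.sub("", message, count=1)
```

Anchoring the pattern at the start of the string means a visitor who happens to type a similar tag mid-message keeps their text intact.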
Conversations API
| Method | Path | Purpose |
|---|---|---|
| GET | `/v1/conversations` | List the user's conversations |
| GET | `/v1/conversations/{id}/messages` | Mobile-friendly timeline (`?raw=`, `?include_debug=`, `?include_system=`) |
| GET | `/v1/conversations/{id}/messages/raw` | Provider-native blob (debug) |
| DELETE | `/v1/conversations/{id}` | Delete one conversation |
| POST | `/v1/conversations/{id}/messages/{message_id}/feedback` | Set like / dislike / null feedback |
| GET | `/v1/auth/me` | Resolve the JWT to the Dataland user record |
| POST | `/v1/auth/logout` | Stateless ack (client discards the token) |
The persisted message blob is provider-native (pydantic-ai ModelMessages); the timeline projection normalizes it for the app and carries images, sources, and suggestions (DAT-284) forward across reloads. DB writes are wrapped in asyncio.shield() so a client disconnect mid-stream can't abort a half-written INSERT.
Auth posture
- Mobile visitors: RS256 JWT verified against the JWKS endpoints in `JWKS_URL` + `JWKS_URLS` (the combined, de-duped list is `auth_jwks_urls`). Keys are cached per URL with a `JWKS_CACHE_TTL` (3600 s) lifespan; on a cache miss the client refetches. `aud` verification is off by design: new-format CMS tokens carry an `aud` scoped to the mobile client UUID and the agent has no notion of expected audiences. `token_type` must be `access` (or absent); the user id comes from `user_id` or `sub`.
Local JWKS mirror (DAT-286)
dataland-auth now mirrors the CMS signing key (kid: dataland-rs256-1) into its local JWKS via data/extra_jwks.json, so the local endpoint can validate CMS-signed tokens — removing the chat-auth single point of failure. The verifier tries JWKS URLs in order; if a fallback (non-primary) provider validates a token, it logs a WARN because that means the local JWKS is missing the key and chat auth is depending on the external endpoint. That warning is the alertable signal to re-run the JWKS mirror provisioning (e.g. after a volume wipe or CMS key rotation).
- Service-to-service: `/v1/service/*` is gated by `AGENT_SERVICE_TOKEN` (bearer). With the token unset the surface returns `503` (it must not silently open). notification-worker uses it to open museum chats on a visitor's behalf, and to resolve `ticket_id → user / conversation` via `GET /v1/service/tickets/{ticket_id}/user`. The service chat endpoint validates the user actually owns the ticket before streaming, and supports both generated (`message`) and static (`assistant_text`) replies.
- Admin dashboard: `/admin` is gated by `ADMIN_PASSWORD` (+ session secret). Empty disables protection (legacy local-dev). A half-configured pair refuses rather than degrading to no-auth.
- `AUTH_SKIP`: decodes without verifying signature/expiry; dev only. The production boot guard (DAT-140/159) treats `AUTH_SKIP=true` as a fatal misconfig and refuses to start.
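The way `JWKS_URL` and `JWKS_URLS` combine into an ordered, de-duplicated list can be sketched as follows. The function bodies are assumptions that follow the description above (primary first, extras as comma-separated or JSON); `parse_url_list` is a hypothetical helper and the CMS URL used in the usage note is a placeholder.

```python
import json


def parse_url_list(raw: str) -> list:
    """Accept either a JSON array or a comma-separated string of URLs."""
    raw = (raw or "").strip()
    if not raw:
        return []
    if raw.startswith("["):
        return [str(u).strip() for u in json.loads(raw) if str(u).strip()]
    return [u.strip() for u in raw.split(",") if u.strip()]


def auth_jwks_urls(jwks_url: str, jwks_urls: str) -> list:
    """Primary URL first, then extras, de-duplicated while preserving order."""
    merged = []
    for url in [(jwks_url or "").strip(), *parse_url_list(jwks_urls)]:
        if url and url not in merged:
            merged.append(url)
    return merged
```

Order matters because the verifier tries URLs in sequence: with the local mirror first, a token validated by a later entry (for instance a placeholder `https://cms.example/jwks.json`) is the case that should raise the fallback warning.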
Model & config
Standardized on gemini-3.5-flash everywhere (DAT-269). The chat/suggestion/complaint agents use AGENT_MODEL=google-gla:gemini-3.5-flash; Gemini captioning uses GEMINI_MODEL=gemini-3.5-flash. RAG vectors use gemini-embedding (see RAG).
APP_ENV=production # (1)!
DATABASE_URL=postgresql+asyncpg://dataland:***@dataland-postgres:5432/dataland
AGENT_MODEL=google-gla:gemini-3.5-flash # (2)!
GEMINI_MODEL=gemini-3.5-flash
GEMINI_API_KEY=***
AGENT_RUN_TIMEOUT_SECONDS=60.0 # (3)!
AGENT_SUGGESTION_TIMEOUT_SECONDS=15.0 # (4)!
RAG_BASE_URL=http://dataland-rag:4143
RAG_API_KEY=***
RAG_SEARCH_TOP_K=10
RAG_SEARCH_TIMEOUT_SECONDS=25.0 # (5)!
VISION_BASE_URL=http://dataland-rag:4143
MUSEUM_API_URL=http://dataland-museum:5001
NOTIFICATION_BASE_URL=http://dataland-notification-api:8080
REDIS_HOST=dataland-redis
REDIS_PORT=6379
REDIS_PASSWORD=*** # (6)!
REDIS_STREAM_KEY=museum:telemetry
AGENT_SERVICE_TOKEN=*** # (7)!
JWKS_URL=http://dataland-auth:9000/.well-known/jwks.json # (8)!
JWKS_URLS= # (9)!
JWKS_CACHE_TTL=3600 # (10)!
AUTH_SKIP=false # (11)!
COMPLAINT_DETECTION_ENABLED=false # (12)!
NOTIFICATION_OPS_TOKEN=*** # (13)!
COMPLAINT_CLASSIFIER_TIMEOUT_SECONDS=6.0 # (14)!
COMPLAINT_SESSION_LINK_BASE=https://dataland.chat
MODEL_ARMOR_ENABLED=true # (15)!
MODEL_ARMOR_TEMPLATE_ID= # (16)!
MODEL_ARMOR_LOCATION=us-central1
MODEL_ARMOR_FAIL_OPEN=true # (17)!
CHAT_IMAGE_MAX_BYTES=10485760 # (18)!
ADMIN_PASSWORD= # (19)!
1. Setting this to `production` arms the boot guard (DAT-140/159/146) and unmounts the unauthenticated mobile-simulator router. Any other value leaves the synthetic-telemetry surface exposed, so this must be `production` on a prod deploy.
2. DAT-269: model standardized on `gemini-3.5-flash` everywhere. The pydantic-ai `google-gla:` prefix selects the Google Generative Language backend; `GEMINI_MODEL` (no prefix) is the same model used by Gemini captioning.
3. Caps the whole `agent.iter()` loop. On a run timeout, partial text is still persisted and the suggestion call is skipped so a stuck stream can't pin a worker indefinitely (DAT-148).
4. Caps the follow-up suggestion call (the second, lightweight LLM call). Independent of the main run budget so a slow suggestion can't eat into the answer.
5. Raised from 10s. RAG `/search` round-trips in ~10s; a 10s read timeout caused retry storms that tripped the 60s agent wall-clock (DAT-148). 25s lets a single search finish on the first attempt. `connect` stays at 3.0s.
6. Required (DAT-76). The agent refuses to boot without a Redis password: an unauthenticated Redis on the telemetry path is a hard fail, not a warning.
7. Service-to-service bearer for `/v1/service/*`. With it unset that surface returns `503` (it must not silently open to the network).
8. Primary JWKS endpoint: the local dataland-auth mirror. Tried first; a fallback provider validating a token logs a `WARN` because it means the local mirror is missing the key (DAT-286).
9. DAT-143: extra issuers, comma-separated or JSON. Merged and de-duped with `JWKS_URL` into `auth_jwks_urls`. Empty in the common single-issuer case.
10. Per-URL key cache lifespan in seconds (3600 = 1h). On a cache miss the client refetches the JWKS.
11. Dev only. Decodes JWTs without verifying signature or expiry. The production boot guard (DAT-140/159) treats `true` as a fatal misconfig and refuses to start.
12. DAT-213: silent complaint detection is off by default; enable per env. With it on but `NOTIFICATION_OPS_TOKEN` unset, classification still runs but the POST is skipped with a warning.
13. Bearer for `/v1/ops/*` on notification (both complaint tickets and the welcome push). Without it the off-path POSTs are skipped, not retried.
14. Hard cap on the complaint LLM judge. On timeout it returns a non-complaint verdict, so a slow classifier never produces a false alert and never adds latency to the chat turn.
15. DAT-268: Model Armor jailbreak gate. Enabled here but a no-op until `MODEL_ARMOR_TEMPLATE_ID` is set (ships dark).
16. Empty = Model Armor is dormant. Provision a template id to arm the multilingual jailbreak scanner that catches what the English-leaning regex misses.
17. Fails open by default. A Model Armor outage degrades to the regex layer rather than blocking visitors. Set `false` only if you want a scanner outage to hard-block.
18. 10 MB upload ceiling (DAT-142) enforced by `process_multimodal_image()` before the agent runs, alongside `CHAT_IMAGE_ALLOWED_MIMES`.
19. Empty disables `/admin` protection (legacy local-dev). A half-configured password/session-secret pair refuses rather than degrading to no-auth.
Reaching it
# Public (via Cloudflare):
curl -fsS https://dataland.chat/health
# On the host:
curl -fsS http://localhost:4141/health
# Start a museum visit (instant static welcome — empty message):
curl -N https://dataland.chat/v1/chat/museum \
-H "Authorization: Bearer $JWT" -H "Content-Type: application/json" \
-d '{"message":"","ticket_id":"at5e5e24-088876441c989a2554-87cdbb91"}' # (1)!
# Ask a question in-museum:
curl -N https://dataland.chat/v1/chat/museum \
-H "Authorization: Bearer $JWT" -H "Content-Type: application/json" \
-d '{"message":"What am I looking at right now?","ticket_id":"at5e..."}' # (2)!
# Admin dashboard:
open https://dataland.chat/admin
1. DAT-296 init contract. An empty `message` is the initialization signal, not a question: it returns the instant static welcome (no LLM/RAG/tools) and fires the off-path welcome push. The `ticket_id` is the RDC museum ticket id and doubles as the `conversation_id`; registration is implicit and idempotent on this first call. `curl -N` disables output buffering so the SSE deltas flush as they arrive.
2. Same `ticket_id` resumes the same conversation: the museum chat is permanently bound to it. A non-empty message runs the full pipeline (vitals/room/knowledge tools, then suggestions). Keep `-N` so you see the `text/event-stream` frames live rather than buffered.
Operational notes
- uvicorn runs `--workers ${UVICORN_WORKERS:-2}`. Prometheus client runs in multiprocess mode against a tmpfs at `/tmp/prom-multiproc` (`PROMETHEUS_MULTIPROC_DIR`) so `/metrics` aggregates across workers (DAT-82).
- The mobile-simulator router (synthetic telemetry + arbitrary ticket claims, unauthenticated) is only mounted when `APP_ENV != production`; it must never be exposed on a prod deploy.
- Production boot is guarded (DAT-140/159/146): a missing/placeholder required secret, or `AUTH_SKIP=true`, raises at lifespan startup and stops the worker rather than silently degrading. Runtime issues are logged in the first lines at boot and exposed at `GET /health/full`.
- Two identity providers coexist by design (CMS / bilet.io for visitors, local `auth-server` for staff) and the two user tables are not foreign-keyed against each other.
See also
- RAG: retrieval the agent's `search_knowledge` / `search_artwork_images` tools call into
- Museum: vitals + room/chapter catalog behind `get_visitor_vitals` / `get_room_info`
- Notification: receives the off-path complaint tickets and welcome pushes
- Auth: JWKS the agent verifies mobile JWTs against (DAT-286 mirror)