Agent — AI museum guide

dataland-agent is the visitor-facing chat backend. It streams Gemini responses over Server-Sent Events, calls into RAG for grounded answers, and pulls live biometrics from museum-api so the guide can react to what the visitor is actually doing on the floor. The same FastAPI app also hosts a static test chat client and an admin dashboard, and its sibling process auth_server.py is the dataland-auth service.


Container	`dataland-agent`
Image	`dataland/agent:${IMAGE_TAG}` (built from `dataland-agent/Dockerfile`)
Public port	`4141` (via Cloudflare → `dataland.chat`)
Internal URL	`http://dataland-agent:4141`
Memory / CPU	`mem_limit: 1g`, `mem_reservation: 256m`, `cpus: 1.0`
Healthcheck	`GET /health` (interval 30s, timeout 10s, 3 retries, 10s start period)
Framework	FastAPI + pydantic-ai over uvicorn (`--workers ${UVICORN_WORKERS:-2}`)
Model	`google-gla:gemini-3.5-flash`

Recent changes

This page reflects the 2026-06-03 → 2026-06-04 change-set: DAT-269 (model standardized on gemini-3.5-flash), DAT-296 (instant welcome on an empty /museum message plus welcome push; /register and /current removed), DAT-213 (silent complaint detection), DAT-284 (suggestions restored on reload), DAT-281 (real room names), DAT-279/280/285/261 (tool hardening + get_scene_flow), DAT-286 (local JWKS mirror), and the RAG /search read-timeout raise to 25s.

What it does

Streams chat responses to the mobile app via SSE for two surfaces: museum mode (in-museum, telemetry-aware) and general mode (outside the museum, no live vitals), each with a multimodal (image-upload) variant.
Searches artworks and museum knowledge through RAG (hybrid dense + BM25 + rerank for text; Qdrant images collection for visual search).
Fetches the visitor's current vitals (GET /api/tickets/{id}/vitals on museum-api) to ground answers in real-time location, chapter, and biometric context.
Initializes a visit instantly on an empty first /museum message with a static personalized welcome and fires the welcome push off-path (DAT-296).
Runs a silent server-side complaint judge on every visitor turn, off the response path, and files an ops ticket when it fires (DAT-213).
Receives notification-triggered chats from the worker via the internal /v1/service/chat/museum endpoint, gated by AGENT_SERVICE_TOKEN.
Persists conversation history (provider-native message blobs) and serves a mobile-friendly timeline, plus message-level like/dislike feedback.
Exposes an admin dashboard at /admin for service health, DB stats, and config inspection.

Architecture & dependencies

graph LR
  APP[Mobile app] -- "RS256 JWT" --> A[agent :4141]
  WK[notification-worker] -- "AGENT_SERVICE_TOKEN" --> A
  A --> R[rag :4143]
  A --> M[museum-api :5001]
  A -- "complaint / welcome push" --> N[notification-api :8080]
  A --> P[(postgres :5432)]
  A --> RD[(redis :6379)]
  A -.JWKS.-> AU[dataland-auth :9000]
  A -.JWKS.-> CMS["CMS / bilet.io JWKS"]

Upstream	Why
`dataland-rag:4143`	Hybrid text retrieval (`/search`, collection `knowledge`) + image search (vision service)
`dataland-museum:5001`	Live visitor vitals + room/chapter catalog
`dataland-notification-api:8080`	Off-path complaint tickets + welcome push (`/v1/ops/*`)
`dataland-postgres:5432`	Conversation history, users, ticket↔user mapping
`dataland-redis:6379`	Ephemeral state, active-ticket mirror reader
`dataland-auth:9000` + CMS JWKS	RS256 verification of mobile JWTs

The agent declares hard depends_on entries with condition: service_healthy for postgres, redis, rag, and museum-api, so it will not start until all four pass their healthchecks. The downstream notification calls are best-effort and never block a chat turn.

Chat endpoints

All chat lives under /v1 and requires a mobile RS256 JWT in Authorization: Bearer <token>. Every response is text/event-stream with Cache-Control: no-cache, Connection: keep-alive, X-Accel-Buffering: no (the last disables nginx/Cloudflare buffering so deltas flush immediately).

Method	Path	Purpose
`POST`	`/v1/chat/museum`	In-museum SSE chat bound to a `ticket_id`
`POST`	`/v1/chat/museum/multimodal`	Same, multipart with an optional `image`
`POST`	`/v1/chat/general`	Outside-museum SSE chat (no live vitals)
`POST`	`/v1/chat/general/multimodal`	Same, multipart with an optional `image`
`POST`	`/v1/service/chat/museum`	Internal, notification-triggered museum chat (service token)

Museum mode

POST /v1/chat/museum takes a JSON body of { "message": "...", "ticket_id": "..." }.

The ticket_id is the RDC museum ticket id for the visit and it is load-bearing: the museum conversation is permanently bound to it. register_ticket() upserts the ticket↔user mapping and guarantees a conversation row whose id == ticket_id, so the mobile client can use the ticket id directly as the conversation_id with no extra lookup. Registration is implicit and idempotent — the first message with a ticket registers it; subsequent messages resume the same chat.
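A minimal in-memory sketch of the implicit, idempotent registration contract, assuming dict-backed stores in place of the Postgres tables. TicketStore and its fields are hypothetical; only the register_ticket name and the conversation_id == ticket_id rule come from this page. The returned created flag is an assumption about how a caller would tell a first registration from a resume.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TicketStore:
    """Hypothetical in-memory stand-in for the ticket-mapping and conversation tables."""

    ticket_owner: Dict[str, str] = field(default_factory=dict)   # ticket_id -> user_id
    conversations: Dict[str, str] = field(default_factory=dict)  # conversation_id -> user_id

    def register_ticket(self, ticket_id: str, user_id: str) -> Tuple[str, bool]:
        """Upsert ticket -> user and guarantee a conversation whose id == ticket_id.

        Returns (conversation_id, created); created is True only on the first call,
        so repeated calls resume the same chat instead of creating a new one.
        """
        created = ticket_id not in self.conversations
        self.ticket_owner[ticket_id] = user_id             # upsert the mapping
        self.conversations.setdefault(ticket_id, user_id)  # create the row at most once
        return ticket_id, created


store = TicketStore()
first = store.register_ticket("t-1", "u-1")
again = store.register_ticket("t-1", "u-1")
```

Because the conversation id is the ticket id, the client never needs a lookup call between "open chat" and "send message".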

/register and /current are gone (DAT-296)

The explicit POST /v1/tickets/register and /v1/tickets/current endpoints were removed. The /v1/tickets router is now empty. Registration happens implicitly on the first POST /v1/chat/museum, and conversation_id == ticket_id is the contract. Do not call a register endpoint — there isn't one.

General mode

POST /v1/chat/general takes { "message": "...", "conversation_id": null }. Omit conversation_id (or send null / "None" / "null" / "undefined", which the schema coerces to None) to start a new session; pass an existing id to continue. A conversation_id that doesn't belong to the caller returns 404. General mode has no ticket_id and therefore no vitals / room tools.
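The coercion rule can be sketched as a plain function. In a pydantic v2 schema this logic would typically sit in a `@field_validator("conversation_id", mode="before")`; the stdlib version below is a hypothetical illustration that matches only the exact sentinel strings listed above.

```python
from typing import Optional

# Sentinel strings that the general-mode schema treats as "no conversation yet".
_NULLISH = {"None", "null", "undefined"}


def coerce_conversation_id(value: Optional[str]) -> Optional[str]:
    """Return None to start a new session, otherwise the id to continue."""
    if value is None or value in _NULLISH:
        return None
    return value
```

Accepting these strings keeps JavaScript clients that serialize a missing value as "undefined" or "null" from accidentally hitting the 404 ownership check.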

Multimodal

The multipart variants accept message (form field, may be empty), ticket_id (museum) or conversation_id (general), and an optional image file. At least one of message / image must be present (else 400). When an image is attached, process_multimodal_image() runs before the agent:

Validates the upload against CHAT_IMAGE_MAX_BYTES (10 MB default) and CHAT_IMAGE_ALLOWED_MIMES (image/jpeg,png,webp,gif,heic,heif).
Runs visual search against RAG's images collection (top_k=5).
Injects an [IMAGE CONTEXT] block of the top candidate artworks into the user message so the agent answers "what is this piece?" without re-asking.
Resolves a public URL for the upload (RAG query URL → RAG upload → public GCS fallback when GCS_USER_UPLOAD_ENABLED) and appends [USER_IMAGE]https://... for UI persistence only.

The system prompt tells the model to trust [IMAGE CONTEXT] and never read the [USER_IMAGE] line aloud.
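A sketch of the validation and message-augmentation steps, assuming the default limits above. validate_upload, augment_message and UploadRejected are hypothetical names, and the exact layout of the [IMAGE CONTEXT] block (one candidate per line, prepended to the message) is an assumption; only the tag names and the [USER_IMAGE]https://... shape come from this page.

```python
from typing import List, Optional

CHAT_IMAGE_MAX_BYTES = 10 * 1024 * 1024  # 10 MB default
CHAT_IMAGE_ALLOWED_MIMES = frozenset(
    {"image/jpeg", "image/png", "image/webp", "image/gif", "image/heic", "image/heif"}
)


class UploadRejected(ValueError):
    """Hypothetical error; an endpoint would map it to an HTTP 4xx response."""


def validate_upload(data: bytes, mime: str) -> None:
    """Reject oversized or non-allow-listed uploads before any model call."""
    if len(data) > CHAT_IMAGE_MAX_BYTES:
        raise UploadRejected("image too large")
    if mime.lower() not in CHAT_IMAGE_ALLOWED_MIMES:
        raise UploadRejected(f"unsupported image type: {mime}")


def augment_message(message: str, candidates: List[str], public_url: Optional[str]) -> str:
    """Prepend visual-search candidates and append the UI-only image marker."""
    parts = []
    if candidates:
        parts.append("[IMAGE CONTEXT]\n" + "\n".join(f"- {c}" for c in candidates))
    parts.append(message)  # may be empty for image-only turns
    if public_url:
        parts.append(f"[USER_IMAGE]{public_url}")
    return "\n\n".join(p for p in parts if p)
```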

SSE event frames

Every frame is a single data: {json}\n\n line. The pipeline (app/agent/streaming.py) uses pydantic-ai's agent.iter() to walk the execution graph, streaming text deltas from ModelRequestNode and tool events from CallToolsNode before each tool runs, so the client can show live status.

sequenceDiagram
  participant C as Client
  participant A as agent stream_response
  participant G as Gemini (gemini-3.5-flash)
  participant T as Tools (vitals/room/knowledge/...)
  C->>A: POST /v1/chat/museum
  A-->>C: {conversation_id, mode}
  Note over A: check_input regex → check_input_model_armor
  A->>G: agent.iter(message + inline date, history)
  G-->>A: tool call
  A-->>C: {tool, query}
  A->>T: execute tool
  T-->>A: result (+ sources/images on ctx)
  G-->>A: text deltas
  A-->>C: {content: "..."} (many)
  A-->>C: {generating_suggestions: true}
  A->>G: generate_suggestions (2nd LLM call)
  A-->>C: {sources, images}
  A-->>C: {suggestions: [...]}
  A-->>C: {done: true, message_id}

Frame shapes (see app/schemas/sse.py):

Frame	When	Example payload
meta	first, always	`{"conversation_id": "...", "mode": "museum"}`
content	per text delta	`{"content": "Welcome to the "}`
tool	before each tool runs	`{"tool": "get_visitor_vitals", "query": ""}`
generating_suggestions	after text, before suggestion call	`{"generating_suggestions": true}`
sources / images	if RAG sources or images were collected	`{"sources": [...], "images": [...]}`
suggestions	if follow-ups were generated	`{"suggestions": ["...", "...", "..."]}`
done	final	`{"done": true, "message_id": "..."}`
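The frame format is simple enough to sketch directly. sse_frame and museum_turn are hypothetical helpers (the real schemas live in app/schemas/sse.py); the sketch emits the documented order for a turn without tool calls or sources.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional


def sse_frame(payload: Dict[str, Any]) -> str:
    """Encode one event as a single `data: {json}` line plus the blank-line terminator."""
    # json.dumps escapes newlines inside strings, so a frame never spans two lines.
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"


def museum_turn(
    conversation_id: str,
    deltas: Iterable[str],
    message_id: str,
    suggestions: Optional[List[str]] = None,
) -> Iterator[str]:
    """Yield frames in the documented order for a plain museum turn (no tools)."""
    yield sse_frame({"conversation_id": conversation_id, "mode": "museum"})
    for delta in deltas:
        yield sse_frame({"content": delta})
    yield sse_frame({"generating_suggestions": True})
    if suggestions:
        yield sse_frame({"suggestions": suggestions})
    yield sse_frame({"done": True, "message_id": message_id})
```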

Special cases:

Guardrail block (DAT-263): {"content": <bilingual refusal>} then {"done": true, "blocked": true, "category": "..."} — the LLM is never called.
Timeout (DAT-148): {"error": "agent_timeout", "partial": true}, partial text is still persisted, the suggestion call is skipped.
Static welcome (DAT-296): meta → {"content": <welcome>} → {"done": true, "static": true, "message_id"?} — no tool / suggestions frames.

The empty-`/museum` init welcome (DAT-296)

An empty first message on /v1/chat/museum is an initialization signal, not a question. The visitor's app opens the chat and the visitor hasn't typed anything yet, so the agent answers instantly with a fixed, personalized greeting — no LLM, no RAG, no tools, no complaint check:

welcome = welcome_message(current_user.full_name)
gen = (
    persist_static_response(welcome, conversation_id)  # (1)!
    if created
    else stream_static_text(welcome, conversation_id)  # (2)!
)
schedule_welcome_push(ticket.id, current_user.id)       # (3)!

New ticket path. Appends the welcome as an assistant turn and commits before the stream closes, so the notification worker (which reads the SSE stream to its end before pushing) always pushes strictly after the chat write lands. Prevents a push-before-persist race.
Re-init path. When the ticket already exists, re-show the welcome without appending a duplicate turn — the visitor reopened the chat, nothing new should be persisted.
DAT-296 off-path welcome push. POSTs {ticket_id, user_id} to notification's /v1/ops/welcome off the response path. The notification side ticket-dedups it against the RDC visit_started welcome, so it is safe to call on every init. Anonymous-safe: keyed on the ticket, not an email account.
welcome_message() produces "Welcome to Dataland, {first_name}! I'm your AI guide...". The copy is kept in sync with dataland-notification's visit_started WELCOME_MESSAGE so the mobile-init greeting and the RDC-driven welcome read identically.
Anonymous visitors are first-class here: first_name_of() is anonymous-safe ("" → "Welcome to Dataland!"), and the welcome push is keyed on the ticket, not an email account.
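The anonymous-safe name handling can be sketched as follows. Taking the first whitespace-separated token as the first name is an assumption, and welcome_opening is a hypothetical helper that builds only the opening sentence, since the rest of the welcome copy is elided on this page.

```python
from typing import Optional


def first_name_of(full_name: Optional[str]) -> str:
    """First whitespace-separated token of the name, or "" for anonymous visitors."""
    parts = (full_name or "").split()
    return parts[0] if parts else ""


def welcome_opening(full_name: Optional[str]) -> str:
    """Opening sentence only; the rest of the welcome copy is not reproduced here."""
    first = first_name_of(full_name)
    return f"Welcome to Dataland, {first}!" if first else "Welcome to Dataland!"
```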

Anonymous-safe by design

Delivery is ticket_id ↔ external_id ↔ OneSignal. The welcome and welcome push never require a registered/email account — full_name and email may be empty and the flow still completes.

Tools

The museum agent registers five tools; the general agent registers only the first two (it has no live floor context). Tools are registered in app/agent/factory.py and implemented under app/agent/tools/. Tool results may attach sources / images to the ConversationContext, which the streamer emits as sources / images SSE frames and persists onto the assistant message.

Tool	Mode	What it does
`get_visitor_vitals`	museum	Real-time location, chapter context, and biometrics for the active ticket
`get_room_info`	museum	Lists all chapters/artworks in a given room code
`get_scene_flow`	museum	The ordered gallery flow and where the visitor is in it (DAT-261)
`search_knowledge`	both	Hybrid RAG retrieval over the `knowledge` collection
`search_artwork_images`	both	Text-driven artwork image discovery (Qdrant `images`)

`get_visitor_vitals`

Calls museum-api /api/tickets/{ticket_id}/vitals and renders room, chapter, scent, and biometric context. Two hardening details:

Physiological sanity bounds (DAT-285): out-of-range readings are dropped, never relayed. Heart rate 30–220 BPM, body temperature 30–43 °C, SpO₂ 50–100%. A bad sensor value is omitted rather than narrated. Heart rate also drives a mood label via interpret_excitement() (<60 very relaxed … ≥110 very excited).
Reference images: when the current chapter has reference images, the context's image list is cleared and then set to them (deduped), so the UI shows only the current chapter's cards.
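A sketch of the sanity-bounds filter, assuming the bounds are inclusive. The reading keys are hypothetical; the numeric ranges are the DAT-285 values above. A NaN reading fails both comparisons and is therefore dropped too.

```python
from typing import Dict, Optional, Tuple

# Physiological sanity bounds (DAT-285), assumed inclusive.
SANITY_BOUNDS: Dict[str, Tuple[float, float]] = {
    "heart_rate_bpm": (30.0, 220.0),
    "body_temp_c": (30.0, 43.0),
    "spo2_pct": (50.0, 100.0),
}


def sane_vitals(raw: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Keep only readings inside their bounds; missing, unknown, or out-of-range values are omitted."""
    clean: Dict[str, float] = {}
    for key, value in raw.items():
        bounds = SANITY_BOUNDS.get(key)
        if bounds is None or value is None:
            continue
        low, high = bounds
        if low <= value <= high:
            clean[key] = value
    return clean
```

Dropping a value, rather than clamping it, means the model never narrates a number the sensor did not plausibly measure.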

`get_room_info`

Calls /api/rooms/{room_code}/chapters. Hardening:

Image cap + dedup (DAT-279/280): at most _MAX_ROOM_IMAGES = 3 reference images, skipping any already present on the context.
Empty-room flail fix (DAT-280): if a room has no catalog entries, it returns an explicit instruction not to probe other room codes to compensate, and to describe the space generally from get_visitor_vitals. This stopped the model from speculatively hunting across rooms for an empty transition area.
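A sketch of the image cap and dedup, assuming the cap applies to images added per call and that images are identified by URL. add_room_images is a hypothetical helper; _MAX_ROOM_IMAGES = 3 is the documented value.

```python
from typing import List

_MAX_ROOM_IMAGES = 3


def add_room_images(context_images: List[str], candidates: List[str]) -> List[str]:
    """Append up to _MAX_ROOM_IMAGES new images, skipping any already on the context."""
    seen = set(context_images)
    added: List[str] = []
    for url in candidates:
        if len(added) >= _MAX_ROOM_IMAGES:
            break
        if url in seen:
            continue  # already shown: don't repeat the card
        seen.add(url)
        added.append(url)
    context_images.extend(added)
    return added
```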

`get_scene_flow` (DAT-261)

Reads app/data/scene_flow.json (cached via lru_cache) and returns the ordered gallery experience plus, when given the visitor's current_room_code, their position ("stop 2 of 5") and the next gallery. Answers "where do I go next?", "what's the order?", "what haven't I seen yet?". The current room still comes from get_visitor_vitals at runtime; this tool only supplies the ordered map around it.
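A sketch of the cached load and the position lookup. The JSON schema ({"flow": [...]}) and the helper names are hypothetical, and the room codes R1 to R5 in the usage are placeholders, not the real gallery order.

```python
import json
from functools import lru_cache
from pathlib import Path
from typing import Dict, Optional, Tuple

SCENE_FLOW_PATH = "app/data/scene_flow.json"


def parse_scene_flow(text: str) -> Tuple[str, ...]:
    """Hypothetical schema: {"flow": ["<room code>", ...]} in visiting order."""
    return tuple(json.loads(text)["flow"])


@lru_cache(maxsize=None)
def load_scene_flow(path: str = SCENE_FLOW_PATH) -> Tuple[str, ...]:
    # Read once per process; edits to the JSON need a restart to take effect.
    return parse_scene_flow(Path(path).read_text(encoding="utf-8"))


def flow_position(flow: Tuple[str, ...], current_room_code: Optional[str]) -> Optional[Dict[str, object]]:
    """Return "stop N of M" plus the next room code, or None if the room is not in the flow."""
    if not current_room_code or current_room_code not in flow:
        return None
    index = flow.index(current_room_code)
    next_code = flow[index + 1] if index + 1 < len(flow) else None
    return {"position": f"stop {index + 1} of {len(flow)}", "next": next_code}
```

Returning an immutable tuple matters with lru_cache: every caller shares the cached object, so a mutable list could be corrupted by one tool call for all later ones.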

`search_knowledge` & `search_artwork_images`

search_knowledge proxies retrieval_service.retrieve_context() → RAG /search (collection: "knowledge", top_k=10, rerank: true) and appends each result as a source (title, uri, confidence). search_artwork_images proxies the vision service (top_k=3) and appends each hit as an image card. Both swallow upstream failures into a graceful "currently unavailable" string so a RAG blip degrades the answer rather than 500-ing the turn.
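The "degrade, don't 500" behaviour amounts to a catch-all around the upstream call. degrade_on_failure and the UNAVAILABLE wording are hypothetical; the page only says a "currently unavailable" string is returned.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("agent.tools")

# Hypothetical wording; the source only says a "currently unavailable" string is returned.
UNAVAILABLE = "Search is currently unavailable."


async def degrade_on_failure(call: Callable[[], Awaitable[str]]) -> str:
    """Run a tool's upstream call; turn any failure into a string the model can work with."""
    try:
        return await call()
    except Exception:
        # CancelledError derives from BaseException, so a cancelled turn still cancels.
        logger.warning("upstream tool call failed", exc_info=True)
        return UNAVAILABLE
```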

Real room names (DAT-281)

The wearable / RDC report bare codes (GA, GB, GC, GD, ON, LO). Visitors must never hear codes. app/agent/rooms.py is the single source of truth:

Code	Display name
`ON`	Discovery Portal (entrance)
`GA`	Data Pavilion
`GB`	Latent Gallery
`GC`	Infinity Room
`GD`	The Sanctuary
`LO`	Lobby

room_display_name() is used in all visitor-facing tool output. room_label() renders Data Pavilion (GA) so the model knows the code↔name mapping internally; the system prompt instructs it to speak only the name aloud and treat any parenthesized code as reference-only.

RAG retrieval & the 25s search timeout

retrieval_service keeps a pooled httpx.AsyncClient with retry (3 attempts, exponential backoff with jitter, retrying on timeout/transport errors and 5xx). The read timeout is 25 s (RAG_SEARCH_TIMEOUT_SECONDS, connect=3.0).
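A stdlib sketch of the retry policy. With httpx, the timeout itself would be expressed as httpx.Timeout(25.0, connect=3.0); the base and maximum backoff delays below, and the RetryableError stand-in for the retried exception classes, are assumptions.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Stand-in for the retried failure classes: timeouts, transport errors, 5xx."""


async def with_retry(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.25,
    max_delay: float = 4.0,
) -> T:
    """Retry `call` with exponential backoff and full jitter; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return await call()
        except RetryableError:
            if attempt == attempts - 1:
                raise
            # Full jitter: sleep a random time up to the exponential ceiling.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Jitter spreads retries from concurrent workers so they do not hit a recovering RAG instance in lockstep.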

Why 25s, not 10s

RAG /search round-trips in roughly 10 s (query embedding + rerank), so a 10 s client read timeout left no headroom. The failure chain was ReadTimeout → 3 attempts (~30 s) → a second search → the 60 s agent wall-clock (DAT-148) → agent_timeout, and it surfaced on museum-knowledge queries after the 20-section re-ingest. Raising the read timeout to 25 s lets a single search complete on the first attempt.
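A back-of-envelope check of that chain, ignoring backoff sleeps and model time, and assuming each of the two searches exhausts all 3 attempts under the old timeout but finishes on its first ~10 s attempt under the new one:

```latex
\begin{aligned}
t_{\text{old}} &\approx n_{\text{search}} \cdot n_{\text{attempt}} \cdot t_{\text{read}}
  = 2 \cdot 3 \cdot 10\,\text{s} = 60\,\text{s} \;\ge\; 60\,\text{s}\ (\text{run budget}) \\
t_{\text{new}} &\approx n_{\text{search}} \cdot 1 \cdot t_{\text{rag}}
  = 2 \cdot 1 \cdot 10\,\text{s} = 20\,\text{s}
\end{aligned}
```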

This sits inside two other deadlines (app/config.py): AGENT_RUN_TIMEOUT_SECONDS = 60.0 caps the whole agent.iter() loop, and AGENT_SUGGESTION_TIMEOUT_SECONDS = 15.0 caps the follow-up suggestion call. On a run timeout, partial text is persisted and the suggestion call is skipped so a stuck stream can't pin a worker indefinitely.
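A sketch of the run deadline that keeps partial text, assuming the stream is exposed as an async iterator of text deltas. run_with_deadline is a hypothetical helper, not the agent.iter() loop itself.

```python
import asyncio
from typing import AsyncIterator, List, Tuple

AGENT_RUN_TIMEOUT_SECONDS = 60.0


async def run_with_deadline(
    deltas: AsyncIterator[str], timeout: float = AGENT_RUN_TIMEOUT_SECONDS
) -> Tuple[str, bool]:
    """Collect streamed deltas until done or the deadline; return (text_so_far, timed_out)."""
    parts: List[str] = []

    async def consume() -> None:
        async for delta in deltas:
            parts.append(delta)  # a real stream would also forward each delta to the client

    try:
        await asyncio.wait_for(consume(), timeout)
        return "".join(parts), False
    except asyncio.TimeoutError:
        # Partial text is still returned so it can be persisted; callers skip suggestions.
        return "".join(parts), True
```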

Suggestions (DAT-284)

After the main response streams, the agent makes a second, lightweight LLM call (generate_suggestions()) to produce three short follow-up questions, wrapped in its own 15 s timeout. The parser is defensive: it strips ```json code fences, falls back to scanning for the first balanced [...] span, and tracks string/escape state so brackets inside a suggestion (e.g. "What is [Satellite Simulations]?") don't truncate the array (DAT-267).
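A sketch of that parsing strategy. The helper names are hypothetical and this is not the generate_suggestions() source; it shows fence stripping, a balanced-bracket scan that ignores brackets inside JSON strings, and a retry from the next `[` when a span is not a JSON list.

```python
import json
from typing import Any, List, Optional


def _strip_fences(text: str) -> str:
    """Drop a leading ``` or ```json line and a trailing ``` fence, if present."""
    body = text.strip()
    if body.startswith("```"):
        newline = body.find("\n")
        body = body[newline + 1:] if newline != -1 else ""
        body = body.rstrip()
        if body.endswith("```"):
            body = body[:-3]
    return body.strip()


def _balanced_end(text: str, start: int) -> int:
    """Index of the ] closing the [ at `start`, ignoring brackets inside strings; -1 if none."""
    depth, in_string, escaped = 0, False, False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i
    return -1


def _first_json_array(text: str) -> Optional[List[Any]]:
    """Try each [ in turn; return the first balanced span that parses as a JSON list."""
    start = text.find("[")
    while start != -1:
        end = _balanced_end(text, start)
        if end != -1:
            try:
                value = json.loads(text[start:end + 1])
            except json.JSONDecodeError:
                value = None
            if isinstance(value, list):
                return value
        start = text.find("[", start + 1)
    return None


def parse_suggestions(raw: str, limit: int = 3) -> List[str]:
    body = _strip_fences(raw)
    try:
        value = json.loads(body)
    except json.JSONDecodeError:
        value = None
    if not isinstance(value, list):
        value = _first_json_array(body)
    if not isinstance(value, list):
        return []
    return [s.strip() for s in value if isinstance(s, str) and s.strip()][:limit]
```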

The suggestions are persisted onto the assistant message and re-attached on conversation reload (DAT-284): stream_response() carries suggestions (alongside images and sources) forward from prior-run metadata into the saved message blob, and the mobile projection (message_projection.normalize_for_mobile) surfaces them — so reopening a conversation restores its follow-up chips instead of dropping them.

Silent complaint detection (DAT-213)

A server-side LLM judge inspects each visitor message off the response path and, on a genuine complaint, files a structured ops ticket. It is not an agent tool: it emits no SSE event and never alters the assistant's reply, so the visitor sees nothing.

flowchart LR
  M[Visitor message] --> S[schedule_complaint_check]
  S -. background task .-> CL[classify_complaint<br/>LLM judge, 6s cap]
  CL -->|is_complaint=false| X[drop]
  CL -->|is_complaint=true| RP[post_complaint]
  RP -->|Bearer NOTIFICATION_OPS_TOKEN| OPS[notification /v1/ops/complaint]
  OPS --> D[Discord / Slack fan-out]

Off-path & invisible: schedule_complaint_check() spawns the pipeline as a background asyncio task that runs concurrently with the streamed response — zero added latency. Every failure mode is swallowed and logged; nothing here can break a chat turn. Strong refs to in-flight tasks are held so the loop doesn't GC them mid-run.
What fires: explicit dissatisfaction ("this is broken", "I want a refund"), a request for staff, or harassment/unsafe content. Casual venting ("ugh I'm tired"), neutral questions, and mild remarks do not fire. The classifier returns a structured ComplaintVerdict (is_complaint, severity low/medium/high, category dissatisfaction/staff_request/harassment/other, one-line reason). On any error or timeout it returns a non-complaint verdict, so a classifier hiccup never produces a false alert.
Visitor identity on the ticket: the payload carries user_name, user_email, user_id, ticket_id, session_id (== conversation_id), the classified verdict, the visitor's complaint message only (not the surrounding transcript), and a URL-encoded session_link built from COMPLAINT_SESSION_LINK_BASE.
Swappable, multi-provider: the notification side owns channel selection, dedup, audit log, and fan-out to Discord/Slack (comma-separated OPS_NOTIFIER_PROVIDER). The agent just POSTs the classified event to /v1/ops/complaint with Authorization: Bearer <NOTIFICATION_OPS_TOKEN>.
Multimodal: the visitor's own text is classified, not the [IMAGE CONTEXT]-augmented prompt.
Graceful shutdown: on shutdown drain_inflight_complaints() waits (best-effort) for in-flight checks so a complaint mid-classify still fires across a deploy.

Disabled by default (COMPLAINT_DETECTION_ENABLED=false); enable per env. Without NOTIFICATION_OPS_TOKEN set, classification still runs but the POST is skipped with a warning.

Guardrails & Model Armor

Two surfaces wrap every turn (app/agent/guardrails.py):

Input (DAT-263): a cheap regex pass (check_input) blocks instruction-override, role-override (DAN/developer-mode), prompt-extraction, chat-template special tokens, long base64 smuggling blobs, and payloads over 8000 chars before the LLM is called. A block emits a bilingual (EN/TR) refusal, persists it as the assistant turn, increments guardrail_triggered_total{direction="input"}, and returns.
Model Armor (DAT-268): when the regex pass clears, check_input_model_armor runs the Google Model Armor gate to catch multilingual jailbreaks the English-leaning regex misses. It ships dark — a no-op until MODEL_ARMOR_TEMPLATE_ID is provisioned — and fails open by default (MODEL_ARMOR_FAIL_OPEN=true): a scanner outage degrades to the regex layer rather than blocking visitors.
Output (DAT-263): check_output is telemetry-only. The visitor has already seen the streamed text, so it never rewrites — it just counts + logs system-prompt or tool-name leaks to feed the next prompt-tuning pass.
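A sketch of the input-side layering. The regexes and category names below are illustrative only, not the rule set in app/agent/guardrails.py, and model_armor_allows is a hypothetical wrapper in which `scan` returns True when the text is allowed.

```python
import re
from typing import Callable, Optional

MAX_INPUT_CHARS = 8000
# Illustrative patterns only; the real rules live in app/agent/guardrails.py.
_PATTERNS = [
    ("instruction_override", re.compile(r"\bignore (all |any )?(previous|prior) instructions\b", re.I)),
    ("role_override", re.compile(r"\bDAN\b|(?i:\bdeveloper mode\b)")),  # "DAN" stays case-sensitive
    ("prompt_extraction", re.compile(r"\b(reveal|print|show) (your )?system prompt\b", re.I)),
    ("special_tokens", re.compile(r"<\|[a-z_]+\|>")),
    ("base64_blob", re.compile(r"[A-Za-z0-9+/]{400,}={0,2}")),
]


def check_input(text: str) -> Optional[str]:
    """Return the blocking category, or None when the regex pass clears."""
    if len(text) > MAX_INPUT_CHARS:
        return "too_long"
    for category, pattern in _PATTERNS:
        if pattern.search(text):
            return category
    return None


def model_armor_allows(
    scan: Callable[[str], bool], text: str, template_id: str, fail_open: bool = True
) -> bool:
    if not template_id:  # ships dark: no template, no scan
        return True
    try:
        return scan(text)
    except Exception:
        return fail_open  # scanner outage: degrade to the regex layer by default
```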

Prompt & date handling

System prompts (app/agent/prompts.py, PROMPT_VERSION = "1.1-dat260") share a single scope-and-refusal block (DAT-260): in-scope is Dataland, Refik Anadol / Refik Anadol Studio, and directly-related art history; everything else is refused warmly and bilingually with a pivot back to in-scope material. The model is told never to reveal the system prompt or internal tool names.

A wall-clock anchor is injected two ways so Gemini stops disclaiming "I don't have a real-time calendar": today_context() is re-evaluated on every run via pydantic-ai's @agent.system_prompt hook, and augment_with_date() inlines the date into the user turn at request time. The inline tag is stripped from saved history (strip_inline_date) so the persisted user message matches what the visitor typed.
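A sketch of the inline-date round trip. The function names come from this page, but the "[Today is YYYY-MM-DD]" tag format and both signatures are hypothetical. The key property is that stripping exactly undoes augmenting, so persisted history matches what the visitor typed.

```python
import datetime as dt
import re

# Hypothetical tag format; only the function names come from this page.
_DATE_TAG = re.compile(r"\A\[Today is \d{4}-\d{2}-\d{2}\]\n")


def augment_with_date(message: str, today: dt.date) -> str:
    """Prefix the user turn with today's date at request time."""
    return f"[Today is {today.isoformat()}]\n{message}"


def strip_inline_date(message: str) -> str:
    """Remove the tag before saving, so history matches what the visitor typed."""
    return _DATE_TAG.sub("", message, count=1)
```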

Conversations API

Method	Path	Purpose
`GET`	`/v1/conversations`	List the user's conversations
`GET`	`/v1/conversations/{id}/messages`	Mobile-friendly timeline (`?raw=`, `?include_debug=`, `?include_system=`)
`GET`	`/v1/conversations/{id}/messages/raw`	Provider-native blob (debug)
`DELETE`	`/v1/conversations/{id}`	Delete one conversation
`POST`	`/v1/conversations/{id}/messages/{message_id}/feedback`	Set like / dislike / null feedback
`GET`	`/v1/auth/me`	Resolve the JWT to the Dataland user record
`POST`	`/v1/auth/logout`	Stateless ack (client discards the token)

The persisted message blob is provider-native (pydantic-ai ModelMessages); the timeline projection normalizes it for the app and carries images, sources, and suggestions (DAT-284) forward across reloads. DB writes are wrapped in asyncio.shield() so a client disconnect mid-stream can't abort a half-written INSERT.

Auth posture

Mobile visitors: RS256 JWT verified against the JWKS endpoints in JWKS_URL + JWKS_URLS (the combined, de-duped list is auth_jwks_urls). Keys are cached per URL with a JWKS_CACHE_TTL (3600 s) lifespan; on a cache miss the client refetches. aud verification is off by design — new-format CMS tokens carry an aud scoped to the mobile client UUID and the agent has no notion of expected audiences. token_type must be access (or absent); the user id comes from user_id or sub.

Local JWKS mirror (DAT-286)

dataland-auth now mirrors the CMS signing key (kid: dataland-rs256-1) into its local JWKS via data/extra_jwks.json, so the local endpoint can validate CMS-signed tokens — removing the chat-auth single point of failure. The verifier tries JWKS URLs in order; if a fallback (non-primary) provider validates a token, it logs a WARN because that means the local JWKS is missing the key and chat auth is depending on the external endpoint. That warning is the alertable signal to re-run the JWKS mirror provisioning (e.g. after a volume wipe or CMS key rotation).

Service-to-service: /v1/service/* is gated by AGENT_SERVICE_TOKEN (bearer). With the token unset the surface returns 503 (it must not silently open). notification-worker uses it to open museum chats on a visitor's behalf, and to resolve ticket_id → user / conversation via GET /v1/service/tickets/{ticket_id}/user. The service chat endpoint validates the user actually owns the ticket before streaming, and supports both generated (message) and static (assistant_text) replies.
Admin dashboard: /admin is gated by ADMIN_PASSWORD (+ session secret). Empty disables protection (legacy local-dev). A half-configured pair refuses rather than degrading to no-auth.
AUTH_SKIP: decodes without verifying signature/expiry — dev only. The production boot guard (DAT-140/159) treats AUTH_SKIP=true as a fatal misconfig and refuses to start.
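A sketch of the verification plumbing described above, with the cryptographic check injected as `verify(token, url)`. In practice that callable might wrap a JWKS client such as PyJWT's PyJWKClient together with jwt.decode(..., options={"verify_aud": False}), but that is an assumption. The 401 status on a bad service token is also an assumption; the page only specifies 503 when the token is unset.

```python
import hmac
import json
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger("agent.auth")


def auth_jwks_urls(primary: str, extra: str = "") -> List[str]:
    """Merge JWKS_URL with JWKS_URLS (comma-separated or JSON list); order kept, duplicates dropped."""
    raw = extra.strip()
    extras = [str(u) for u in json.loads(raw)] if raw.startswith("[") else (raw.split(",") if raw else [])
    urls: List[str] = []
    for url in [primary, *extras]:
        url = url.strip()
        if url and url not in urls:
            urls.append(url)
    return urls


def verify_with_fallback(token: str, urls: List[str], verify: Callable[[str, str], Dict[str, Any]]) -> Dict[str, Any]:
    """Try each JWKS URL in order; warn when a non-primary URL is the one that validates."""
    last_error: Optional[Exception] = None
    for index, url in enumerate(urls):
        try:
            claims = verify(token, url)
        except Exception as exc:
            last_error = exc
            continue
        if index > 0:
            logger.warning("token validated by fallback JWKS %s; local mirror may be missing the key", url)
        return claims
    raise PermissionError("no JWKS endpoint validated the token") from last_error


def user_id_from_claims(claims: Dict[str, Any]) -> str:
    if claims.get("token_type", "access") != "access":  # absent counts as access
        raise PermissionError("not an access token")
    user_id = claims.get("user_id") or claims.get("sub")
    if not user_id:
        raise PermissionError("token carries no user id")
    return str(user_id)


def service_gate(authorization: Optional[str], configured_token: str) -> int:
    """HTTP status for a /v1/service/* request: 503 when unconfigured, never silently open."""
    if not configured_token:
        return 503
    prefix = "Bearer "
    presented = authorization[len(prefix):] if authorization and authorization.startswith(prefix) else ""
    return 200 if hmac.compare_digest(presented.encode(), configured_token.encode()) else 401
```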

Model & config

Standardized on gemini-3.5-flash everywhere (DAT-269). The chat/suggestion/complaint agents use AGENT_MODEL=google-gla:gemini-3.5-flash; Gemini captioning uses GEMINI_MODEL=gemini-3.5-flash. RAG vectors use gemini-embedding (see RAG).

APP_ENV=production                          # (1)!
DATABASE_URL=postgresql+asyncpg://dataland:***@dataland-postgres:5432/dataland
AGENT_MODEL=google-gla:gemini-3.5-flash    # (2)!
GEMINI_MODEL=gemini-3.5-flash
GEMINI_API_KEY=***
AGENT_RUN_TIMEOUT_SECONDS=60.0             # (3)!
AGENT_SUGGESTION_TIMEOUT_SECONDS=15.0      # (4)!

RAG_BASE_URL=http://dataland-rag:4143
RAG_API_KEY=***
RAG_SEARCH_TOP_K=10
RAG_SEARCH_TIMEOUT_SECONDS=25.0          # (5)!
VISION_BASE_URL=http://dataland-rag:4143
MUSEUM_API_URL=http://dataland-museum:5001
NOTIFICATION_BASE_URL=http://dataland-notification-api:8080

REDIS_HOST=dataland-redis
REDIS_PORT=6379
REDIS_PASSWORD=***                        # (6)!
REDIS_STREAM_KEY=museum:telemetry
AGENT_SERVICE_TOKEN=***                    # (7)!

JWKS_URL=http://dataland-auth:9000/.well-known/jwks.json   # (8)!
JWKS_URLS=                                 # (9)!
JWKS_CACHE_TTL=3600                        # (10)!
AUTH_SKIP=false                            # (11)!

COMPLAINT_DETECTION_ENABLED=false          # (12)!
NOTIFICATION_OPS_TOKEN=***                 # (13)!
COMPLAINT_CLASSIFIER_TIMEOUT_SECONDS=6.0   # (14)!
COMPLAINT_SESSION_LINK_BASE=https://dataland.chat

MODEL_ARMOR_ENABLED=true                   # (15)!
MODEL_ARMOR_TEMPLATE_ID=                   # (16)!
MODEL_ARMOR_LOCATION=us-central1
MODEL_ARMOR_FAIL_OPEN=true                 # (17)!

CHAT_IMAGE_MAX_BYTES=10485760              # (18)!
ADMIN_PASSWORD=                            # (19)!

Setting this to production arms the boot guard (DAT-140/159/146) and unmounts the unauthenticated mobile-simulator router. Any other value leaves the synthetic-telemetry surface exposed, so this must be production on a prod deploy.
DAT-269 — model standardized on gemini-3.5-flash everywhere. The pydantic-ai google-gla: prefix selects the Google Generative Language backend; GEMINI_MODEL (no prefix) is the same model used by Gemini captioning.
Caps the whole agent.iter() loop. On a run timeout, partial text is still persisted and the suggestion call is skipped so a stuck stream can't pin a worker indefinitely (DAT-148).
Caps the follow-up suggestion call (the second, lightweight LLM call). Independent of the main run budget so a slow suggestion can't eat into the answer.
Raised from 10s. RAG /search round-trips in ~10s; a 10s read timeout caused retry storms that tripped the 60s agent wall-clock (DAT-148). 25s lets a single search finish on the first attempt. connect stays at 3.0s.
Required (DAT-76). The agent refuses to boot without a Redis password — an unauthenticated Redis on the telemetry path is a hard fail, not a warning.
Service-to-service bearer for /v1/service/*. With it unset that surface returns 503 (it must not silently open to the network).
Primary JWKS endpoint — the local dataland-auth mirror. Tried first; a fallback provider validating a token logs a WARN because it means the local mirror is missing the key (DAT-286).
DAT-143 — extra issuers, comma-separated or JSON. Merged and de-duped with JWKS_URL into auth_jwks_urls. Empty in the common single-issuer case.
Per-URL key cache lifespan in seconds (3600 = 1h). On a cache miss the client refetches the JWKS.
Dev only. Decodes JWTs without verifying signature or expiry. The production boot guard (DAT-140/159) treats true as a fatal misconfig and refuses to start.
DAT-213 — silent complaint detection is off by default; enable per env. With it on but NOTIFICATION_OPS_TOKEN unset, classification still runs but the POST is skipped with a warning.
Bearer for /v1/ops/* on notification (both complaint tickets and the welcome push). Without it the off-path POSTs are skipped, not retried.
Hard cap on the complaint LLM judge. On timeout it returns a non-complaint verdict, so a slow classifier never produces a false alert and never adds latency to the chat turn.
DAT-268 — Model Armor jailbreak gate. Enabled here but a no-op until MODEL_ARMOR_TEMPLATE_ID is set (ships dark).
Empty = Model Armor is dormant. Provision a template id to arm the multilingual jailbreak scanner that catches what the English-leaning regex misses.
Fails open by default. A Model Armor outage degrades to the regex layer rather than blocking visitors. Set false only if you want a scanner outage to hard-block.
10 MB upload ceiling (DAT-142) enforced by process_multimodal_image() before the agent runs, alongside CHAT_IMAGE_ALLOWED_MIMES.
Empty disables /admin protection (legacy local-dev). A half-configured password/session-secret pair refuses rather than degrading to no-auth.

Reaching it

# Public (via Cloudflare):
curl -fsS https://dataland.chat/health

# On the host:
curl -fsS http://localhost:4141/health

# Start a museum visit (instant static welcome — empty message):
curl -N https://dataland.chat/v1/chat/museum \
  -H "Authorization: Bearer $JWT" -H "Content-Type: application/json" \
  -d '{"message":"","ticket_id":"at5e5e24-088876441c989a2554-87cdbb91"}'  # (1)!

# Ask a question in-museum:
curl -N https://dataland.chat/v1/chat/museum \
  -H "Authorization: Bearer $JWT" -H "Content-Type: application/json" \
  -d '{"message":"What am I looking at right now?","ticket_id":"at5e..."}'  # (2)!

# Admin dashboard:
open https://dataland.chat/admin

DAT-296 init contract. An empty message is the initialization signal, not a question — it returns the instant static welcome (no LLM/RAG/tools) and fires the off-path welcome push. The ticket_id is the RDC museum ticket id and doubles as the conversation_id; registration is implicit and idempotent on this first call. curl -N disables output buffering so the SSE deltas flush as they arrive.
Same ticket_id resumes the same conversation — the museum chat is permanently bound to it. A non-empty message runs the full pipeline (vitals/room/knowledge tools, then suggestions). Keep -N so you see the text/event-stream frames live rather than buffered.
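For a scripted client rather than curl, a fold over the SSE lines is enough to reconstruct a turn. TurnResult and fold_frames are hypothetical; the lines could come from something like httpx's Response.iter_lines() on a streaming request, and the sketch stops at the first done frame.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Iterable, List, Optional


@dataclass
class TurnResult:
    conversation_id: Optional[str] = None
    text: str = ""
    tools: List[str] = field(default_factory=list)
    sources: List[Any] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)
    message_id: Optional[str] = None
    blocked: bool = False
    static: bool = False
    error: Optional[str] = None


def fold_frames(lines: Iterable[str]) -> TurnResult:
    """Accumulate `data: {json}` frames into one result; blank separator lines are skipped."""
    result = TurnResult()
    for line in lines:
        if not line.startswith("data: "):
            continue
        frame = json.loads(line[len("data: "):])
        if "conversation_id" in frame:
            result.conversation_id = frame["conversation_id"]
        result.text += frame.get("content", "")
        if "tool" in frame:
            result.tools.append(frame["tool"])
        result.sources.extend(frame.get("sources", []))
        result.suggestions = frame.get("suggestions", result.suggestions)
        result.error = frame.get("error", result.error)
        if frame.get("done"):
            result.message_id = frame.get("message_id")
            result.blocked = bool(frame.get("blocked"))
            result.static = bool(frame.get("static"))
            break
    return result
```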

Operational notes

uvicorn runs --workers ${UVICORN_WORKERS:-2}. Prometheus client runs in multiprocess mode against a tmpfs at /tmp/prom-multiproc (PROMETHEUS_MULTIPROC_DIR) so /metrics aggregates across workers (DAT-82).
The mobile-simulator router (synthetic telemetry + arbitrary ticket claims, unauthenticated) is only mounted when APP_ENV != production — it must never be exposed on a prod deploy.
Production boot is guarded (DAT-140/159/146): a missing/placeholder required secret, or AUTH_SKIP=true, raises at lifespan startup and stops the worker rather than silently degrading. Runtime issues are logged in the first lines at boot and exposed at GET /health/full.
Two identity providers coexist by design — CMS / bilet.io for visitors, local auth-server for staff — and the two user tables are not foreign-keyed against each other.
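A sketch of the production boot guard's shape. The REQUIRED_SECRETS tuple is an illustrative subset (REDIS_PASSWORD is documented as required; GEMINI_API_KEY is assumed), the placeholder markers are hypothetical, and BootGuardError is a made-up name. The idea is to collect every problem and fail once at startup instead of degrading silently.

```python
from typing import List, Mapping

REQUIRED_SECRETS = ("REDIS_PASSWORD", "GEMINI_API_KEY")  # illustrative subset
PLACEHOLDERS = {"", "changeme", "change-me", "todo"}  # hypothetical placeholder markers


class BootGuardError(RuntimeError):
    """Raised at lifespan startup so the worker stops instead of running misconfigured."""


def check_production_config(env: Mapping[str, str]) -> None:
    if env.get("APP_ENV") != "production":
        return  # dev and staging boots are not guarded
    problems: List[str] = []
    for name in REQUIRED_SECRETS:
        if env.get(name, "").strip().lower() in PLACEHOLDERS:
            problems.append(f"{name} is missing or a placeholder")
    if env.get("AUTH_SKIP", "false").strip().lower() == "true":
        problems.append("AUTH_SKIP=true is not allowed in production")
    if problems:
        raise BootGuardError("; ".join(problems))
```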