Add backend skill
@@ -0,0 +1,128 @@
---
name: backend
description: Architecture and conventions for the Caravel backend — an Axum + SQLite (sqlx) Rust service (edition 2024) with two long-running reactors. Covers the flat module map and where new code goes; the free-function query/command data layer (no repository objects) over a OnceLock global pool; the commit-then-publish activity-broadcast model the relay-sync and billing reactors hang off; auth that is structural (the AuthedPubkey NIP-98 extractor authenticates, but each handler must call require_admin/require_tenant itself — there is NO router-level authz, so a forgotten check fails OPEN); the web.rs response envelope; the env OnceLock singleton (every var required, panics at boot); the leaf integration wrappers (Stripe/NWC/Coinbase/Nostr/zooid) that billing.rs is the primary orchestrator for (though route handlers also call Stripe/Robot directly); and the clippy+build verification gate (prefer the backend-only just recipes over `just check`, which also compiles the frontend). Use this whenever working anywhere in backend/ — adding an endpoint, query, write, model, migration, config var, integration, or reactor — to follow house conventions and avoid fail-open auth, double-billing, and publish-before-commit traps.
---
# Caravel backend
This is the map of the Caravel backend: an Axum HTTP service plus two long-running reactors, persisting to SQLite via sqlx (the crate is edition 2024). It explains how the backend is organized and *why*, and points you at the modules for the *how* — reach for this for orientation and conventions, and reach for the modules it names for implementation detail. The deep, lookup-style material lives in `references/`.
One warning up front, because it is the single most dangerous wrong assumption you can carry into this codebase: **there is no router-level authorization.** Adding the `AuthedPubkey` extractor to a handler only proves *identity* — that the caller signed a valid NIP-98 event. If a handler then forgets to call a `require_*` helper, it is authenticated-but-open to *any* signed-in pubkey. Auth here fails **open**, not closed, so authorization is each handler's own explicit responsibility (`api.rs:13-15,101-137`).
Two more silent traps the body expands on, named here so you carry them in. First, `publish()` must happen **after** the transaction commits (never inside `with_tx`), or the reactors observe rows that might roll back. Second, double-billing is prevented by atomic guards rather than naive read-then-write: per-activity charges use a conditional `UPDATE ... WHERE billed_at IS NULL` claim (checking `rows_affected`) backed by a `UNIQUE` index on `invoice_item.activity_id`, while monthly renewals (whose items have `activity_id = NULL`) instead re-read the tenant's `renewed_at` marker inside the same transaction that writes the renewal.
## It's free functions and a global pool, not services/repositories
Internalize the actual shape before reaching for defaults, because the wrong default here (a `Service` or `Repository` holding a connection handle) is exactly what an agent reaches for:
- **There are no service/repository objects holding a pool or connection.** Data access is two modules of free functions: `query.rs` (all reads/SELECTs) and `command.rs` (all writes). Reads call `db::pool()` directly; single-statement writes also call `pool()` directly, while anything multi-step runs through the `db::with_tx` helper and operates on a `&mut Transaction`. No handle is threaded through call sites (`query.rs:58-262`, `command.rs:14-705`, `db.rs:38-40,56-68`).
- **The pool and the activity broadcast channel are process-wide globals** in `OnceLock`s (`POOL`, `NOTIFY`), set once by `db::init()` at startup; reading before init or setting twice panics. This is deliberate — it is what lets `query`/`command` stay free functions instead of carrying a handle (`db.rs:15-54`).
- **The shared, application-scoped service container is `Api`** (it holds `billing`, `stripe`, `robot`). It is constructed once, wrapped in `Arc` in `Api::router()`, and installed as axum router state; handlers receive a cheaply-cloned reference as `State<Arc<Api>>` (the per-request cost is just a refcount bump, not a new instance). It is a thin authorization-and-orchestration surface, not a data handle (`api.rs:50-99`).
- **The crate is edition 2024**, which is required for the let-chains (`&&`-joined `let` patterns) in `infra.rs`. The `async |tx| { ... }` closures the `with_tx` callers use are *not* edition-gated — they were stabilized in Rust 1.85 and only need a recent toolchain, not edition 2024 specifically (`Cargo.toml:2-4`).
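The global-handle shape described in the bullets above can be sketched with the standard library alone. This is a minimal illustration, not the code in `db.rs`: the `Pool` struct is a hypothetical stand-in for the real sqlx pool, and `describe_connection` is an invented example of a free-function read.

```rust
use std::sync::OnceLock;

/// Stand-in for the real sqlx pool type (hypothetical; present only so the sketch compiles).
#[derive(Debug)]
pub struct Pool {
    pub url: String,
}

// Process-wide handle, written exactly once at startup.
static POOL: OnceLock<Pool> = OnceLock::new();

/// Installs the global pool. Panics if called a second time.
pub fn init(url: &str) {
    POOL.set(Pool { url: url.to_string() })
        .expect("db::init called twice");
}

/// Returns the global pool. Panics if `init` has not run yet.
pub fn pool() -> &'static Pool {
    POOL.get().expect("db::pool called before db::init")
}

/// A read written as a free function: it fetches the handle itself
/// instead of receiving it from a repository object.
pub fn describe_connection() -> String {
    format!("connected to {}", pool().url)
}
```

The point of the design is visible in `describe_connection`: the caller passes no handle, so call sites stay short, at the cost of a hard startup-order requirement.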
## The module map and where new code goes
The module map is flat under `backend/src` — one job per module: `api`, `billing`, `bitcoin`, `command`, `db`, `env`, `infra`, `models`, `query`, `robot`, `routes`, `stripe`, `wallet`, `web`. `backend` is a dual library+binary crate, so this same set of modules is declared in two roots: `lib.rs` declares them as `pub mod` (the library root, the public/canonical declaration) and `main.rs` re-declares them as private `mod` for the binary entry point (`lib.rs:1-14`, `main.rs:1-14`).
The layering, so you know the call direction: a route handler performs authorization via `Api` helpers (`require_admin` / `require_admin_or_tenant` / `require_tenant`) when needed, then calls `query` (reads) / `command` (writes) / `billing` (orchestration), which call `db`. The integration leaves (`stripe`/`wallet`/`bitcoin`/`robot`) are composed in two places: `billing.rs` holds its own `stripe`/`wallet`/`robot` for the reconciliation loop, while `Api` holds `stripe` and `robot` that route handlers invoke *directly* (e.g. `create_tenant` calls `api.robot.fetch_nostr_name` and `api.stripe.create_customer`; `create_stripe_session` calls `api.stripe.create_portal_session`) — so the leaves are not composed exclusively by billing (`routes/tenants.rs:76-94,263-280`, `billing.rs:25-33`).
Where a *new* thing goes:
- **An endpoint** → a handler fn in the matching `routes/*.rs` **and** a `.route(...)` line in `Api::router()`. Both files are required; "I added a handler but the route 404s" is the number-one gotcha here because the two live in different files (`api.rs:66-99`).
- **A read** → a free async fn in `query.rs`.
- **A write** → a free fn in `command.rs` (a single-statement write runs directly on `pool()`; a multi-step write that must be atomic is composed inside `db::with_tx` and publishes its `Activity` after commit).
- **A model or field** → `models.rs`, plus a schema change under `migrations/` (while the project is pre-release, the change is squashed into the current `0001_init.sql` rather than appended as a new numbered migration).
- **A config var** → `env.rs`.
- **A third-party call** → the matching leaf module (`stripe`/`wallet`/`bitcoin`/`robot`, or the zooid sync in `infra.rs`).
The full per-module responsibility table, the exact `main()` bootstrap order, and the `lib.rs`/tests note are in [references/module-map-and-layering.md](references/module-map-and-layering.md).
## Request lifecycle: authenticate structurally, authorize explicitly, return an envelope
**Authentication is structural, and there is no middleware.** Adding the `AuthedPubkey(auth)` param to a handler *is* the entire auth mechanism — it is a NIP-98 `FromRequestParts` extractor, and its mere presence makes the route require a signed-in caller. Omitting it makes the route public; the public routes (`GET /plans`, `GET /plans/:id`) simply omit it (`api.rs:206-223`, `routes/plans.rs:9-16`).
**Authorization is the handler's explicit job** via `Api` helpers: `require_admin` / `require_tenant` / `require_admin_or_tenant` (403 on failure) and `get_tenant_or_404` / `get_relay_or_404` (load-or-404). Restating the fail-open *why*: identity is not permission, and the router gates nothing, so a handler that authenticates but never authorizes is open to any signed-in pubkey (`api.rs:103-153`).
The handler shape is fixed: params ordered `State<Arc<Api>>` → `AuthedPubkey(auth)` → `Path`/`Query`/`Json`; the body returns `web::ApiResult`; wrap infra/db/external errors with `.map_err(internal)?`; let `require_*`/`get_*_or_404` propagate with a bare `?`; tail with `ok(..)`/`created(..)` (`routes/tenants.rs:61-71,121-141`).
One ordering rule with a security reason. For a path-by-id resource owned by a tenant (a relay, an invoice), **fetch first, then authorize** against the loaded resource's `tenant_pubkey` — you need the row to know whose it is, and this intentionally returns 403 (not 404) to a non-owner of an *existing* resource. For tenant routes keyed by the tenant's own pubkey, the `Path` *is* the `tenant_pubkey`, so authorize on it first (`routes/relays.rs:29-37`, `routes/invoices.rs:19-32`, `routes/tenants.rs:61-71`).
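The fetch-then-authorize ordering can be sketched without axum. Everything below is a simplified assumption: the real `get_relay_or_404` and `require_admin_or_tenant` are `Api` helpers whose exact signatures are not shown here, and the in-memory `relays` list stands in for the database.

```rust
/// Simplified error type: `Forbidden` maps to 403, `NotFound` to 404.
#[derive(Debug, PartialEq)]
pub enum ApiError {
    Forbidden,
    NotFound,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Relay {
    pub id: String,
    pub tenant_pubkey: String,
}

/// Hypothetical in-memory stand-in for the real `Api` state.
pub struct Api {
    pub admins: Vec<String>,
    pub relays: Vec<Relay>,
}

impl Api {
    /// Load-or-404: a missing row is the only way to get `NotFound`.
    pub fn get_relay_or_404(&self, id: &str) -> Result<Relay, ApiError> {
        self.relays
            .iter()
            .find(|r| r.id == id)
            .cloned()
            .ok_or(ApiError::NotFound)
    }

    /// Permission check against the owner recorded on the loaded row.
    pub fn require_admin_or_tenant(&self, caller: &str, tenant_pubkey: &str) -> Result<(), ApiError> {
        if caller == tenant_pubkey || self.admins.iter().any(|a| a == caller) {
            Ok(())
        } else {
            Err(ApiError::Forbidden)
        }
    }
}

/// Handler body: fetch first, then authorize against the row's owner.
pub fn get_relay(api: &Api, caller: &str, id: &str) -> Result<Relay, ApiError> {
    let relay = api.get_relay_or_404(id)?;
    api.require_admin_or_tenant(caller, &relay.tenant_pubkey)?;
    Ok(relay)
}
```

A non-owner asking for an existing relay gets `Forbidden`, while any caller asking for a missing id gets `NotFound`, which matches the 403-not-404 behavior the paragraph describes.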
The response envelope: success goes through `web::ok`/`created`/`res`, returning `{ data, code: "ok" }`; errors go through typed builders returning `{ error, code }`. Note that the keys differ: `data` on success, `error` on failure. `unauthorized`/`forbidden`/`not_found`/`internal` hardcode their code; `bad_request`/`unprocessable` take a caller-supplied kebab-case domain code. Translate SQLite `UNIQUE` violations to 422 via `map_unique_error` rather than letting them 500 (`web.rs:31-129`, `routes/relays.rs:309-316`).
Flag one deliberate weakness so nobody "fixes" it: the NIP-98 check here is a **session-style variant**. It verifies kind 27235, the signature, and that the last `u` tag equals `SERVER_URL`, but it does **not** bind HTTP method/URL/query, payload hash, timestamp freshness, or keep a replay cache — a valid header is effectively a ~10-minute bearer token. This is intentional (fewer signing prompts); do not add per-request binding (`api.rs:157-203`, `README.md:128-137`).
The exact decode steps, the envelope field shapes, and the full in-use domain-error-code list are in [references/request-lifecycle-and-web.md](references/request-lifecycle-and-web.md).
## The data layer: query/command split, transactions, and the activity log
Reads live in `query.rs` (mostly free async fns over `db::pool()`; `list_plans`/`get_plan` are synchronous). Writes live in `command.rs`: simple single-row writes are free async fns over `db::pool()`, but multi-step writes run inside `with_tx()` and delegate to private `_tx` helpers taking `&mut Transaction`. Tenant-scoped reads take a `tenant_pubkey` param and filter on it; some are suffixed `_for_tenant` (`list_relays_for_tenant`, `list_invoices_for_tenant`) but several are not (`list_open_invoices`, `list_unbilled_invoice_items`, `list_billable_activity`), so the suffix is not a reliable marker of tenant scoping (`query.rs:89-96,165-218`, `command.rs:14-87`).
`with_tx` is the only transaction primitive: it runs an async closure with a `&mut Transaction`, commits on `Ok`, and rolls back **only** via `Transaction`'s `Drop` on `Err` — there is no explicit rollback. The consequence to respect: a closure that swallows an error and returns `Ok` will commit a partial write. Multi-step atomic writes compose private `*_tx` helpers (each taking `&mut Transaction` as its first param) inside one `with_tx` closure (`db.rs:60-68`, `command.rs:466-704`).
The core idiom is the **activity log plus commit-then-publish**: a mutation records an `Activity` row inside the transaction, the `*_tx` helper *returns* that `Activity`, and the public command calls `db::publish(activity)` **after** `with_tx` commits — so reactors only ever observe durable rows. Publishing inside the transaction is the trap, because a subscriber could then act on a row that rolls back (`command.rs:179-182`, `db.rs:47-54`).
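A synchronous, in-memory sketch of commit-then-publish follows. It is an analogy only: the real `with_tx` is async over a sqlx `Transaction`, and `Db`, `Tx`, `create_relay`, and `create_relay_tx` here are hypothetical stand-ins, with a std `mpsc` channel playing the role of the activity broadcast.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical stand-in for the real `Activity` model.
#[derive(Debug, Clone, PartialEq)]
pub struct Activity {
    pub id: u64,
    pub kind: String,
}

/// Hypothetical stand-in for the database: committed rows only.
#[derive(Default)]
pub struct Db {
    pub rows: Vec<Activity>,
}

/// Writes staged here reach `Db` only when the transaction commits.
pub struct Tx {
    staged: Vec<Activity>,
}

/// Runs `f` in a "transaction": commit on Ok, discard staged rows on Err.
pub fn with_tx<T>(db: &mut Db, f: impl FnOnce(&mut Tx) -> Result<T, String>) -> Result<T, String> {
    let mut tx = Tx { staged: Vec::new() };
    let out = f(&mut tx)?; // on Err the staged rows are dropped along with `tx`
    db.rows.append(&mut tx.staged); // commit
    Ok(out)
}

/// `*_tx` helper: records the activity inside the tx and returns it.
fn create_relay_tx(tx: &mut Tx, id: u64) -> Result<Activity, String> {
    if id == 0 {
        return Err("invalid relay id".to_string());
    }
    let activity = Activity { id, kind: "create_relay".to_string() };
    tx.staged.push(activity.clone());
    Ok(activity)
}

/// Public command: commit first, publish second.
pub fn create_relay(db: &mut Db, notify: &Sender<Activity>, id: u64) -> Result<(), String> {
    let activity = with_tx(db, |tx| create_relay_tx(tx, id))?;
    // Reached only after commit, so subscribers never see a row that rolled back.
    let _ = notify.send(activity); // best-effort: a missing subscriber is not an error
    Ok(())
}
```

Because the `?` on `with_tx` returns early on failure, the `send` line is unreachable for a rolled-back write, which is the whole guarantee.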
**Idempotency is enforced, and double-billing prevented, by atomic guards rather than naive read-then-write checks,** and the guard differs by path. Per-activity charges use a conditional claim: `mark_activity_billed_tx` updates `WHERE billed_at IS NULL` and returns a bool (`rows_affected() > 0`) you must honor, backstopped by `UNIQUE(invoice_item.activity_id)`. Other monotonic flips guard on null markers: `mark_invoice_paid_tx` only flips while `paid_at IS NULL AND voided_at IS NULL`. Renewals are the exception: their line items have `activity_id = NULL`, so neither the WHERE-guard nor the UNIQUE index protects them. Their sole protection is a transaction-scoped read-then-write that re-reads `renewed_at` inside the same `with_tx` and only writes if the period hasn't been renewed (`command.rs:279-335,563-630`, `migrations/0001_init.sql:111-112`).
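The per-activity claim can be modeled in memory to show why the returned bool matters. This is a sketch under assumptions: `Ledger` is a hypothetical stand-in for the `activity` and `invoice_item` tables, and the functions mirror the described behavior of `mark_activity_billed_tx` and `insert_invoice_item_for_activity` rather than reproducing them.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the two tables involved.
#[derive(Default)]
pub struct Ledger {
    /// activity id -> `billed_at` marker (`None` means not billed yet)
    pub billed_at: HashMap<u64, Option<i64>>,
    /// activity ids that already have an invoice item (models the UNIQUE index)
    pub invoice_items: HashSet<u64>,
}

/// Mirrors `UPDATE ... SET billed_at = ? WHERE id = ? AND billed_at IS NULL`
/// and reports whether a row was affected.
pub fn mark_activity_billed(ledger: &mut Ledger, id: u64, now: i64) -> bool {
    match ledger.billed_at.get_mut(&id) {
        Some(slot) if slot.is_none() => {
            *slot = Some(now);
            true
        }
        _ => false, // unknown activity, or already claimed
    }
}

/// Claims the activity first and inserts the line item only if the claim won.
pub fn insert_invoice_item_for_activity(ledger: &mut Ledger, id: u64, now: i64) -> bool {
    if !mark_activity_billed(ledger, id, now) {
        return false; // another reconcile already billed it: skip, do not insert
    }
    ledger.invoice_items.insert(id)
}
```

Running the second function twice for the same activity inserts one line item and leaves the first `billed_at` timestamp untouched.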
Billing-lifecycle entities model state as **nullable timestamp markers, not status enums**: an invoice is open while `paid_at` and `voided_at` are both null, a tenant is churned once `churned_at` is set, a bolt11 is settled once `settled_at` is set. Filter on the timestamps; these billing tables have no status column. Relay status is the one exception — a free-form `TEXT` column (active/inactive/delinquent) with no `CHECK` and no Rust enum, guarded only by the `RELAY_STATUS_*` consts and filtered/branched on throughout the relay code (`models.rs:4-6,54-60,120-142`, `migrations/0001_init.sql:32,48-60`).
Two cross-cutting gotchas worth stating inline: boolean-ish columns are stored and typed as `i64` 0/1 (`policy_public_join`, the `*_enabled` flags, `synced`), not Rust `bool` — compare against 0/1; and plans are **not** a DB table (`list_plans`/`get_plan` are hardcoded, synchronous in-memory data), so adding a plan is a code edit, not a migration (`models.rs:86-94`, `query.rs:20-54`).
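As a small illustration of both modeling rules, the sketch below derives state from nullable markers and treats a flag column as `i64`. The struct shapes are assumptions: the real models in `models.rs` carry more fields, the timestamp type is assumed to be `i64`, and placing `synced` and `policy_public_join` on one `Relay` struct is a guess made for the example.

```rust
/// Hypothetical subset of the invoice model: state lives in nullable markers.
pub struct Invoice {
    pub paid_at: Option<i64>,
    pub voided_at: Option<i64>,
}

impl Invoice {
    /// Open means neither marker is set; there is no status column to read.
    pub fn is_open(&self) -> bool {
        self.paid_at.is_none() && self.voided_at.is_none()
    }
}

/// Hypothetical subset of a row with boolean-ish columns stored as 0/1.
pub struct Relay {
    pub synced: i64,
    pub policy_public_join: i64,
}

impl Relay {
    /// Compare against 0/1 explicitly; the column is not a Rust `bool`.
    pub fn is_synced(&self) -> bool {
        self.synced != 0
    }
}
```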
The per-table read helpers, the `Snapshot` enum, the strict-`<` historical lookups, the schema-squash migration rule, and the FK naming convention are in [references/data-layer-and-schema.md](references/data-layer-and-schema.md).
## Background reactors and the broadcast model
Two detached tokio tasks are launched from `main()` after `db::init()` and run for the life of the process alongside the axum server: `billing.start()` (time-driven) and `infra::start()` (event-driven) (`main.rs:56-67`).
**Billing is the hourly poller.** A tokio interval loop calls `reconcile_subscriptions()`, which sweeps all tenants and logs-and-continues per tenant so one failure never aborts the sweep. The same `reconcile_subscription(tenant, attempt_payment)` is shared by the worker (`true`) and the synchronous reconcile route (`false`) — parameterize shared reconciliation rather than duplicating it (`billing.rs:46-130`).
**Infra is the broadcast reactor, and the model for any new background reaction.** It calls `db::subscribe()` to the activity channel, runs a reconcile sweep on *startup* to recover work missed while the process was down, then loops on `recv()`. It must handle `RecvError::Lagged` by running a full reconcile sweep over the DB "pending" query — the channel is best-effort with capacity 64, so you cannot assume you saw every message; `Closed` ends the worker (`infra.rs:21-44`).
The two non-negotiable reactor rules, with their *why*:
- **The top-level reactor driver loops never crash the process on a failure.** The billing poll loop, the per-tenant sweep, and the infra `recv` loop each wrap their unit of work in a `tracing::error!`-logged guard with structured fields and continue (`sync_relay` is even infallible by design). Note this catch-and-continue is only at the driver level: inner batch loops (e.g. the per-activity loop, `reconcile_renewal`, `reconcile_relay_state`) propagate a per-item error via `?` and abandon the rest of the current batch — that error still bubbles up to the nearest wrapped driver, so the process stays alive, but the failing item aborts its enclosing batch rather than being skipped (`billing.rs:52-54,66-74,104-110`, `infra.rs:28-44,83-89`).
- **Correctness comes from the DB reconcile sweep, never from one-message-per-event.** `db::publish` silently drops when there are no subscribers, and the bounded channel drops on lag — the broadcast is a hint, the DB "pending" query is the source of truth (`db.rs:50-54`, `infra.rs:35-41`).
New background reactions should hook the publish/subscribe activity stream rather than adding a new poller; tuning knobs (poll interval, grace/DM windows, retry base/max/attempts) live as module-level consts at the top of the worker file (`db.rs:43-54`, `billing.rs:15-23`, `infra.rs:15-17`).
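The shape a new reaction would follow can be sketched as a plain loop. This is a synchronous model, not the code in `infra.rs`: the `Recv` enum imitates the message, lagged, and closed outcomes of a broadcast `recv()`, and `Recorder` is a hypothetical worker that only counts what the driver asks of it.

```rust
/// What one `recv()` on the activity channel can yield (a stand-in enum
/// modeled on the Lagged/Closed cases described above).
#[derive(Debug)]
pub enum Recv {
    Activity(u64),
    Lagged(u64),
    Closed,
}

/// Hypothetical worker that records what the driver asked it to do.
#[derive(Default)]
pub struct Recorder {
    pub sweeps: usize,
    pub handled: Vec<u64>,
}

impl Recorder {
    /// Full sweep over the DB "pending" query (modeled as a counter).
    fn reconcile_pending(&mut self) -> Result<(), String> {
        self.sweeps += 1;
        Ok(())
    }

    /// Reaction to a single activity; id 0 simulates a failure.
    fn handle(&mut self, relay_id: u64) -> Result<(), String> {
        if relay_id == 0 {
            return Err("sync failed".to_string());
        }
        self.handled.push(relay_id);
        Ok(())
    }
}

/// Driver loop: sweep on startup, sweep again on lag, log and continue on error.
pub fn run(worker: &mut Recorder, events: impl IntoIterator<Item = Recv>) {
    if let Err(e) = worker.reconcile_pending() {
        eprintln!("startup sweep failed: {e}");
    }
    for event in events {
        let result = match event {
            Recv::Activity(id) => worker.handle(id),
            // Messages were dropped, so fall back to the source of truth.
            Recv::Lagged(_missed) => worker.reconcile_pending(),
            Recv::Closed => break,
        };
        if let Err(e) = result {
            eprintln!("reactor step failed, continuing: {e}");
        }
    }
}
```

The failing item is logged and the loop keeps going, and the lagged case triggers a second sweep instead of trusting the channel.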
The relay-sync retry/backoff mechanics, the self-feeding `fail_relay_sync` loop, the POST-vs-PATCH `is_new` logic, the secret-never-stored detail, and the full billing dunning cascade are in [references/reactors-and-relay-sync.md](references/reactors-and-relay-sync.md).
## External integrations: Nostr, Stripe, Lightning/NWC, Coinbase, zooid
Every integration is a **leaf I/O wrapper** that speaks only to the third party, returns `anyhow::Result`, and knows nothing about the DB, routes, or domain (`stripe.rs`'s only `crate` import is `env`, for the API key). `stripe.rs` parses Stripe's JSON internally via a private `send_json -> Result<serde_json::Value>` helper, but its public methods hand back small typed results (e.g. `Result<String>`, `Result<Option<String>>`), not raw `serde_json::Value`. New external calls go in the matching leaf module (`stripe`/`wallet`/`bitcoin`/`robot`, or the zooid sync in `infra.rs`), never inline in a route or mixed with DB logic (`stripe.rs:1-5`, `wallet.rs:7-13`).
`billing.rs` is the **primary orchestrator** that composes integrations against the DB — notably the payment cascade (NWC auto-pay → out-of-band lightning check → Stripe card on file → manual DM), where a failing NWC or Stripe attempt records its error on the tenant but never aborts the cascade and the first success returns early. It is *not* the only place integrations are used: route handlers also call Stripe and the robot directly (e.g. `create_tenant` calls `api.robot.fetch_nostr_name` and `api.stripe.create_customer`; `create_stripe_session` calls `api.stripe.create_portal_session`). And a handler may invoke more than one billing method — `reconcile_tenant` calls both `sync_stripe_customer` and `reconcile_subscription`, `reconcile_invoice` calls both `ensure_bolt11_for_invoice` and `attempt_payment` — and those public billing methods are themselves orchestrators that fan out internally (`billing.rs:29-33,326-377`, `routes/tenants.rs:84-94,182-190`).
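The cascade's control flow (record the error, keep going, return on first success, fall back to a manual DM) can be sketched with closures. The `Attempt` and `Outcome` types and the `attempt_payment` signature below are invented for illustration; the real method lives on the billing struct and talks to the actual integrations.

```rust
/// Result of running the whole cascade (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    PaidBy(&'static str),
    ManualDm,
}

/// One step of the cascade. `Ok(true)` means paid, `Ok(false)` means
/// "not applicable or not paid", `Err` means the attempt itself failed.
pub struct Attempt {
    pub name: &'static str,
    pub run: Box<dyn Fn() -> Result<bool, String>>,
}

/// Tries each step in order. A failing step records its error and the
/// cascade continues; the first success returns early.
pub fn attempt_payment(attempts: &[Attempt], errors: &mut Vec<String>) -> Outcome {
    for attempt in attempts {
        match (attempt.run)() {
            Ok(true) => return Outcome::PaidBy(attempt.name),
            Ok(false) => {}
            Err(e) => errors.push(format!("{}: {e}", attempt.name)),
        }
    }
    // Nothing succeeded: fall through to the manual DM.
    Outcome::ManualDm
}
```

In the sketch the `errors` vector plays the role of the per-tenant error fields the paragraph mentions.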
Sensitive at-rest values (a tenant's `nwc_url`) are NIP-44 **self-encrypted** with the robot's own keypair via `env.encrypt`/`decrypt` — at-rest confidentiality for the service, *not* a DM to the tenant — encrypted at the write boundary (the route) and decrypted only at point of use (billing). Outbound zooid calls are NIP-98 signed via `env.make_auth` (`env.rs:86-107`, `routes/tenants.rs:130-137`).
Two design intentions not to "fix". First, an off-session Stripe `PaymentIntent` is treated as **failed unless `status == "succeeded"`**, and the cascade falls through via two distinct paths: an off-session 3DS/authentication demand returns an HTTP 402 error that `error_for_status` catches first, while a 2xx response whose status is merely not `"succeeded"` is caught by the explicit status check. Do not assume 3DS "comes back 2xx"; for off-session confirmed intents Stripe surfaces it as an HTTP error. Second, the zooid relay secret is generated fresh and sent **only on first sync** (`is_new`), so Caravel never stores relay secrets, which is why a re-sync must `PATCH`, not `POST` (`stripe.rs:104-106,135-143,194-227`, `infra.rs:168-243`).
The Stripe idempotency-key HMAC scheme, the currency-minor exponent table, Robot's publish-on-construct side effect, the relay-list cache TTL, and the per-integration error-string conventions are in [references/integrations.md](references/integrations.md).
## Config: the env singleton
All config is one process-wide `Env` struct in a `static OnceLock`, loaded once by `env::init()` in `main()` immediately after `dotenv`, *before* `db::init()` — the env → db → services order is load-bearing. Read config **only** through `crate::env::get()` (which returns `&'static Env`); never read `std::env::var` outside `env.rs` (`env.rs:8-20`, `main.rs:28-37`).
**Every variable is required** — there are no optional vars and no graceful degradation. `require_str`/`require_u16`/`require_csv` panic at boot on a missing, blank, or invalid value (and an invalid `ROBOT_SECRET` panics too). Adding an integration var without setting it crashes the process on boot rather than degrading (`env.rs:110-140`).
Adding a config var is **four coordinated edits**: a field on `Env`, a load line in `Env::load` with the right `require_*` helper, README docs, and `.env.template`. Do crypto/auth through `Env` methods (`encrypt`/`decrypt`, `make_auth`), not by reaching for the keys ad hoc (`env.rs:22-84`).
Two traps. NIP-98 host-affinity means `SERVER_URL` must *exactly* equal the client's `u` tag or every authenticated request 401s. And the README uses stale var names that don't exist in `env.rs`: its local-dev table lists `ADMINS` (real name `SERVER_ADMIN_PUBKEYS`) and `ZOOID_API_SECRET` (the backend has no such var — it consumes `ZOOID_API_URL` and signs zooid requests with `ROBOT_SECRET`), while the production `docker run` example sets `PLATFORM_NAME` (a frontend VITE var, not a backend `Env` field) — trust `env.rs` and `.env.template`, not the README (`api.rs:158-202`, `README.md:19,101-102` vs `env.rs:60,76`).
The full variable surface, the `DATABASE_URL`/`CARGO_MANIFEST_DIR` rewrite, and the CORS silent-drop are in [references/config-and-env.md](references/config-and-env.md).
## Building and verifying a change
The `justfile` is the canonical task runner; backend recipes `cd` into `backend/` and run one cargo command. The minimal diff-safe gate for a backend edit is `just build-backend` (`cargo build`) plus `just lint-backend` (`cargo clippy -- -D warnings`, where every warning is a hard error), plus `just test-backend` if the touched area has tests (`justfile:17-30`).
**For a backend-only change prefer the backend-scoped recipes (`just fmt-backend lint-backend build-backend test-backend`) over a full `just check`.** `just check` also runs against the frontend — `build` is `build-backend build-frontend` — so it compiles the frontend even for a backend-only edit. The backend crate is currently both fmt-clean (`cargo fmt --check` exits 0) and clippy-clean (`cargo clippy -- -D warnings` exits 0), so running fmt is fine; verify fmt state with `cargo fmt --check` rather than assuming drift (`justfile:39,43`).
There are currently **zero tests** in the backend — no `#[cfg(test)]`/`#[tokio::test]` under `backend/src`, no `tests/` dir — so `cargo test` and `cargo test api::tests::` both pass trivially with 0 tests run. A green `cargo test` does *not* mean your change is exercised. The scaffolding exists (the `tower`/`util` dev-dep for `ServiceExt::oneshot`, the `api` module path the `test-backend-api` filter expects), so new behavior should add tests under `api::tests::` and drive the `Router` via tower's `oneshot` (`justfile:26-27`, `Cargo.toml:29-30`).
Because `lint-backend` treats every clippy warning as a hard error and the crate currently lints clean, keep your additions warning-free rather than churning unrelated code to silence nits (`justfile:20-21`).
## House style (brief)
- **Comments are minimal**, one line where possible; a doc comment states a function's *purpose*, not its implementation. There is one canonical place for any fact — model/field semantics in `models.rs` doc comments only, DB index rationale in migration SQL comments — so don't duplicate across layers (root `AGENTS.md:3-12`).
- **Naming.** FK columns are `{model}_{pk}` (`relay.tenant_pubkey`, and in `models.rs` `invoice_item.activity_id`, `bolt11.invoice_id`); a tenant's pubkey is `tenant_pubkey` except in already-tenant-scoped contexts like `tenant.pubkey` or `get_tenant(pubkey)` (root `AGENTS.md:16,18`). Separately, some tenant-scoped query/command fns are suffixed `_for_tenant` (a codebase convention in `query.rs`/`command.rs`, not stated in `AGENTS.md`; see the data-layer reference for why it is not a reliable marker).
- **Rust idioms.** Prefer `&str` over `&String` params; avoid passing `&mut` into functions — return results and let the caller manage mutability; resist over-DRY — extract only for a distinct concern, 3+ repetitions, or genuine clarity (the inline zooid body and the longhand `update_relay` merge are deliberate) (root `AGENTS.md:30,32,34`).
- **Markdown.** Do not hard-wrap at a fixed column — write one logical line per paragraph (root `AGENTS.md:24-26`).
@@ -0,0 +1,59 @@
# Config and the env singleton (deep detail)
This is the lookup-depth companion to the SKILL.md "config" section: the full variable surface, the `require_*` helpers, the `DATABASE_URL` rewrite, and the stale-README traps. All of it lives in `env.rs`, `db.rs`, `main.rs`, `api.rs`, `.env.template`, and the README.
## The variable surface
`Env` is `#[derive(Clone)]` with 24 fields plus a parsed `keys: Keys`. The full grouped surface:
- **server** — `SERVER_URL`, `SERVER_PORT` (u16), `SERVER_ADMIN_PUBKEYS` (csv), `SERVER_ALLOW_ORIGINS` (csv), `APP_URL`, `DATABASE_URL`
- **robot identity** — `ROBOT_SECRET` (parsed into `Keys`), `ROBOT_NAME`, `ROBOT_DESCRIPTION`, `ROBOT_PICTURE`, `ROBOT_WALLET`, `ROBOT_OUTBOX_RELAYS`, `ROBOT_INDEXER_RELAYS`, `ROBOT_MESSAGING_RELAYS`
- **zooid/livekit** — `ZOOID_API_URL`, `RELAY_DOMAIN`, `LIVEKIT_URL`, `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET`
- **blossom S3** — `BLOSSOM_S3_ENDPOINT`, `BLOSSOM_S3_REGION`, `BLOSSOM_S3_BUCKET`, `BLOSSOM_S3_ACCESS_KEY`, `BLOSSOM_S3_SECRET_KEY`
- **billing** — `STRIPE_SECRET_KEY`
Source: `env.rs:22-82`, `.env.template:1-38`.
## The require_* helpers
- `require_str` reads the var, panics `"{key} is required"` if unset, trims it, and panics again if the trimmed value is empty.
- `require_u16` calls `require_str` then `.parse()`, panicking `"{key} is invalid"` on a parse failure.
- `require_csv` uses `std::env::var(key).unwrap_or_default()`, splits on `,`, trims each element, drops empties, and panics `"{key} is required"` if the result is empty — so an *unset* csv var and a *present-but-all-blank* one both say "required", which can mislead debugging.
`ROBOT_SECRET` is parsed via `Keys::parse(...).expect(...)`, so an invalid key panics too. Pick the helper by type: scalars use `require_str`, the port uses `require_u16`, lists use `require_csv`. Source: `env.rs:53-55,110-140`.
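The three helpers as described above can be approximated as follows. This is a reconstruction from the prose, not a copy of `env.rs`: the split into `required`/`parse_u16`/`parse_csv` cores (which take the raw value so they can be exercised without touching the process environment) is an assumption, as is the exact panic message for a blank `require_str` value.

```rust
use std::env;

/// Core of `require_str`: an unset value panics, and so does a blank one after trimming.
pub fn required(key: &str, raw: Option<String>) -> String {
    let value = raw.unwrap_or_else(|| panic!("{key} is required"));
    let trimmed = value.trim();
    if trimmed.is_empty() {
        // Assumption: the blank case reuses the "required" message.
        panic!("{key} is required");
    }
    trimmed.to_string()
}

/// Core of `require_u16`: required string first, then a strict parse.
pub fn parse_u16(key: &str, raw: Option<String>) -> u16 {
    required(key, raw)
        .parse()
        .unwrap_or_else(|_| panic!("{key} is invalid"))
}

/// Core of `require_csv`: unset is treated as empty, blanks are dropped,
/// and an empty result panics with the same "required" message.
pub fn parse_csv(key: &str, raw: Option<String>) -> Vec<String> {
    let items: Vec<String> = raw
        .unwrap_or_default()
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_string)
        .collect();
    if items.is_empty() {
        panic!("{key} is required");
    }
    items
}

/// Thin wrappers that read the process environment.
pub fn require_str(key: &str) -> String {
    required(key, env::var(key).ok())
}

pub fn require_u16(key: &str) -> u16 {
    parse_u16(key, env::var(key).ok())
}

pub fn require_csv(key: &str) -> Vec<String> {
    parse_csv(key, env::var(key).ok())
}
```

Note how `parse_csv` cannot distinguish an unset variable from one that is present but all blank, which is the debugging trap the list calls out.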
## Field semantics worth noting
- `SERVER_URL` is the NIP-98 host-affinity value — the `u` tag must equal it exactly or every authenticated request 401s.
- `SERVER_PORT` binds `127.0.0.1` only (localhost, not `0.0.0.0`).
- `SERVER_ADMIN_PUBKEYS` is checked by membership for `is_admin`.
- `SERVER_ALLOW_ORIGINS` is parsed into `CorsLayer` `HeaderValue`s, with unparseable origins **silently dropped** via `filter_map(...ok())` — a typo won't error at startup, it just won't allow that origin and surfaces later as a browser CORS block.
- `APP_URL` is the only var trailing-slash-trimmed at load (`zooid_api_url` is trimmed at use).
Source: `api.rs:104,158-202`, `main.rs:44-65`, `env.rs:62`.
## DATABASE_URL normalization
`normalize_sqlite_url` rewrites a relative `sqlite://<rel>` to `sqlite://{CARGO_MANIFEST_DIR}/<rel>`. Because `CARGO_MANIFEST_DIR` is baked in at **compile** time via `env!`, the path resolves relative to the build-time backend crate dir, not the process cwd. `:memory:`, absolute, and non-sqlite URLs pass through unchanged; the parent dir is `create_dir_all`'d; the connection uses `create_if_missing`, WAL, and `./migrations`. In the Docker image the backend is compiled with `WORKDIR /app`, so `CARGO_MANIFEST_DIR` is `/app`; the documented **relative** `DATABASE_URL=sqlite://data/caravel.db` therefore resolves to `/app/data/caravel.db`, lining up with the `-v my-caravel-data:/app/data` volume mount. Source: `db.rs:70-108`, `Dockerfile:6`, `README.md:18,23`.
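The rewrite rule can be sketched as a pure function. This version is an approximation: it takes the manifest dir as a parameter so it can be exercised directly (the real function reads `env!("CARGO_MANIFEST_DIR")`), and how the real code recognizes `:memory:` or handles query strings is not shown in this document, so those details are assumptions.

```rust
/// Rewrites a relative `sqlite://<rel>` URL against `manifest_dir`.
/// Absolute paths, in-memory databases, and non-sqlite URLs pass through.
pub fn normalize_sqlite_url(url: &str, manifest_dir: &str) -> String {
    let Some(rest) = url.strip_prefix("sqlite://") else {
        // Not a `sqlite://` URL (this also covers `sqlite::memory:`).
        return url.to_string();
    };
    if rest.starts_with('/') || rest.starts_with(":memory:") {
        // Already absolute, or an in-memory database (assumed spelling).
        return url.to_string();
    }
    format!("sqlite://{}/{}", manifest_dir.trim_end_matches('/'), rest)
}
```

With `manifest_dir` set to `/app`, the documented Docker URL `sqlite://data/caravel.db` becomes `sqlite:///app/data/caravel.db`, matching the volume mount described above.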
## Adding a config var, and doing crypto through Env
Adding a var is four coordinated edits: an `Env` field, a load line in `Env::load` with the right `require_*` helper, README docs, and `.env.template`. Do crypto/auth through `Env` methods (`encrypt`/`decrypt`, `make_auth`), not by reaching for `keys` ad hoc. Normalize URL-shaped config at the edge the way existing code does. Source: `env.rs:22-107`.
## Stale-README traps to ignore
- The README's local-dev table uses `ADMINS` (the real var is `SERVER_ADMIN_PUBKEYS`) and `ZOOID_API_SECRET` (no such var — zooid auth is `ROBOT_SECRET` via NIP-98).
- The README docker example sets `PLATFORM_NAME`, which is a frontend VITE var, not a backend `Env` field.
- Trust `env.rs` and `.env.template`, not the README.
- `.env` is gitignored (root `.env` and `**/.env`), as are `data/` and `target/`; there is no backend-level `.gitignore`.
Source: `README.md` vs `env.rs:54,60,76`, `.gitignore:5-8`.
## Sources
- variable surface — `backend/src/env.rs:22-82`, `backend/.env.template:1-38`
- require_* helpers — `backend/src/env.rs:53-55,110-140`
- field semantics — `backend/src/api.rs:104,158-202`, `backend/src/main.rs:44-65`, `backend/src/env.rs:62`
- DATABASE_URL normalization — `backend/src/db.rs:70-108`
- adding a var + crypto via Env — `backend/src/env.rs:22-107`
- stale-README traps — `README.md` vs `backend/src/env.rs:54,60,76`, `.gitignore:5-8`
@@ -0,0 +1,56 @@
# Data layer and schema (deep detail)
This is the lookup-depth companion to the SKILL.md "data layer" section: read-helper assembly, transaction-helper conventions, the idempotency idioms in full, the `Snapshot` enum, the i64/timestamp modeling, the strict-`<` historical reads, and the schema/migration rules. All of it lives in `query.rs`, `command.rs`, `models.rs`, `db.rs`, and `migrations/0001_init.sql`.
## Read assembly
Reads use `query_as::<_, T>` with `T` deriving `sqlx::FromRow`, returning typed structs (`Tenant`, `Relay`, `Activity`, `Invoice`, `InvoiceItem`, `Bolt11`). The `SELECT *` body is built by per-table `select_tenant`/`select_relay`/`select_activity` string helpers that append a trailing `WHERE`/`ORDER` clause; one-off reads inline the SQL but still go through `query_as`. Tenant-scoped reads take a `tenant_pubkey` param and filter on `tenant_pubkey`, but the `_for_tenant` suffix is **not** a reliable marker of tenant scoping: only two reads carry it (`list_relays_for_tenant`, `list_invoices_for_tenant`), while `list_open_invoices`, `list_unbilled_invoice_items`, and `list_billable_activity` are tenant-scoped without the suffix. (Note also that `list_plans`/`get_plan` are synchronous, not async.) Source: `query.rs:6-16,58-262`.
|
||||
|
||||
## Transaction conventions
|
||||
|
||||
`with_tx` is the only primitive: it begins a tx on the pool, runs the async closure with a `&mut Transaction`, commits on `Ok`, and relies on `Transaction`'s `Drop` to roll back on `Err` — there is no explicit `rollback()` call, so a closure that swallows an error and returns `Ok` will commit a partial write. Source: `db.rs:60-68`.
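
The commit-on-`Ok`, roll-back-on-drop behavior can be modeled with a small std-only sketch. This is not the real `db::with_tx`: `FakeDb`, `FakeTx`, and the synchronous closure are hypothetical stand-ins for the sqlx pool, `Transaction`, and the async closure.

```rust
use std::cell::RefCell;

/// Hypothetical in-memory "database" holding committed rows only.
#[derive(Default)]
struct FakeDb {
    rows: RefCell<Vec<String>>,
}

/// Hypothetical transaction: buffers writes and loses them unless committed.
struct FakeTx<'a> {
    db: &'a FakeDb,
    pending: Vec<String>,
}

impl<'a> FakeTx<'a> {
    fn insert(&mut self, row: &str) {
        self.pending.push(row.to_string());
    }

    fn commit(mut self) {
        self.db.rows.borrow_mut().append(&mut self.pending);
    }
}

/// Commit on Ok; on Err the `?` returns early and `tx` is dropped uncommitted.
fn with_tx<T, E>(
    db: &FakeDb,
    f: impl FnOnce(&mut FakeTx<'_>) -> Result<T, E>,
) -> Result<T, E> {
    let mut tx = FakeTx { db, pending: Vec::new() };
    let out = f(&mut tx)?;
    tx.commit();
    Ok(out)
}

/// The trap: an inner failure is swallowed, so the first write still commits.
fn swallow_demo(db: &FakeDb) -> Result<(), String> {
    with_tx(db, |tx| {
        tx.insert("invoice");
        let second: Result<(), String> = Err("item insert failed".into());
        let _ = second; // swallowed instead of propagated with `?`
        Ok(())
    })
}
```

Propagating the inner error with `?` instead of discarding it is what keeps the write all-or-nothing.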
|
||||
|
||||
The `*_tx` helpers are private, suffixed `_tx`, take `&mut Transaction` as their first param, and run via `.execute(&mut **tx)`. Public commands compose them inside one `with_tx` closure and never take a transaction. A `*_tx` that records a state change *returns* the constructed `Activity`, and the public command publishes it after commit. Some single-statement writes run directly on `pool()` with no transaction at all: `create_tenant`, `update_tenant`, the `set_tenant_*` setters, `mark_invoice_notified`, and `insert_bolt11`. Source: `command.rs:14-82,146-183,466-704`, `db.rs:60-68`.
|
||||
|
||||
## The idempotency idioms in full
|
||||
|
||||
- `mark_activity_billed_tx` runs `UPDATE ... WHERE id = ? AND billed_at IS NULL` and returns `rows_affected() > 0` — a bool you must honor, because ignoring a `false` and inserting the invoice item anyway hits the `UNIQUE(activity_id)` index and fails.
|
||||
- `insert_invoice_item_for_activity` claims the activity first and only inserts the line item when the claim won, so a concurrent reconcile never double-bills.
|
||||
- Conditional monotonic `UPDATE`s are guarded on null markers: `mark_invoice_paid_tx`, `mark_bolt11_settled_tx`, and `void_open_invoices_tx`.
|
||||
- `insert_intent_tx` records the Stripe `PaymentIntent` with `INSERT ... ON CONFLICT(id) DO NOTHING`, making settlement idempotent on retried webhooks.
|
||||
- Renewal re-reads `renewed_at` inside the tx and advances it, so it is idempotent per period.
|
||||
- `UNIQUE(invoice_item.activity_id)` is the database backstop behind the claim.
|
||||
- `insert_activity_tx` requires the relay row to already exist in the same tx — it fetches the relay's `tenant_pubkey` — so insert/update the relay before logging.
|
||||
|
||||
Source: `command.rs:279-335,563-704`, `migrations/0001_init.sql:111-112`.
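
The claim-then-insert idiom from the list above can be sketched with an in-memory model. The `Ledger` type and its fields are hypothetical stand-ins for the `activity` and `invoice_item` tables; only the shape of the logic is taken from the description.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the two tables involved.
#[derive(Default)]
struct Ledger {
    /// activity id -> billed_at (None while unbilled)
    billed_at: HashMap<u32, Option<i64>>,
    /// activity ids that already have a line item (models UNIQUE(activity_id))
    invoice_items: HashSet<u32>,
}

impl Ledger {
    /// Models `UPDATE ... WHERE id = ? AND billed_at IS NULL`.
    /// Returns true only when this call flipped the marker (rows_affected > 0).
    fn mark_activity_billed(&mut self, id: u32, now: i64) -> bool {
        match self.billed_at.get_mut(&id) {
            Some(slot) if slot.is_none() => {
                *slot = Some(now);
                true
            }
            _ => false,
        }
    }

    /// Claim first; insert the line item only if the claim won.
    fn insert_invoice_item_for_activity(&mut self, id: u32, now: i64) -> bool {
        if !self.mark_activity_billed(id, now) {
            return false; // already billed elsewhere: honor the `false`
        }
        self.invoice_items.insert(id)
    }
}
```

A second call for the same activity is a no-op rather than an error, which is the property a concurrent reconcile relies on.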
|
||||
|
||||
## The Snapshot type
|
||||
|
||||
`Snapshot` is a serde-tagged enum keyed on `resource_type`, with one variant per resource that logs activity, wrapped in `sqlx::types::Json` on insert. Each `Activity` carries a JSON snapshot of the resource's plan+status. Add a variant (and its `resource_type()`) when a new resource type starts logging activity. Source: `models.rs:8-24`, `command.rs:174-178`.
|
||||
|
||||
## Timestamp-vs-enum modeling and the i64 convention
|
||||
|
||||
Lifecycle is nullable timestamp markers, not status enums: an invoice is open while `paid_at` and `voided_at` are both null, a tenant is churned once `churned_at` is set, a bolt11 is settled once `settled_at` is set; `invoice.method` records provenance only when paid. Filter on the timestamps. Relay status is the exception — a free-form `TEXT` column with **no** `CHECK` constraint and no Rust enum, guarded only by the `RELAY_STATUS_*` consts (by contrast `invoice.method` *does* have a `CHECK`), so a typo'd status string would persist silently. Boolean-ish columns are `i64` 0/1, not Rust `bool` — `list_relays_pending_sync` uses `synced = 0 OR TRIM(sync_error) != ''`. Source: `models.rs:54-94,120-142`, `query.rs:81-121,206-240`, `migrations/0001_init.sql:32,48-60`.
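
As a minimal sketch of "filter on the timestamps", assuming trimmed-down row structs (the real ones in `models.rs` carry more fields), the lifecycle checks reduce to `Option` and integer tests:

```rust
/// Hypothetical, trimmed-down invoice row.
struct Invoice {
    paid_at: Option<i64>,
    voided_at: Option<i64>,
}

/// Hypothetical, trimmed-down relay row.
struct Relay {
    synced: i64,        // 0/1 stored as i64, not bool
    sync_error: String, // empty when there is no error
}

/// An invoice is open while both markers are null.
fn is_open(inv: &Invoice) -> bool {
    inv.paid_at.is_none() && inv.voided_at.is_none()
}

/// Mirrors the `synced = 0 OR TRIM(sync_error) != ''` predicate.
fn is_pending_sync(r: &Relay) -> bool {
    r.synced == 0 || !r.sync_error.trim().is_empty()
}
```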
|
||||
|
||||
## Strict-`<` historical lookups
|
||||
|
||||
`get_relay_plan_before` / `get_latest_relay_activity_before` use `created_at < before` (strict `<`, not `<=`) when reconstructing historical relay state from the activity log. A relay created exactly at a period boundary is intentionally not counted active in the prior period (its own creation/change charge covers that period); using `<=` would double-charge the creation period. `list_billable_activity` does **not** use a timestamp boundary at all — it has no `before` param and selects a tenant's unbilled activity via the `billed_at IS NULL` marker (plus an `activity_type` filter), reconciling off a precise marker rather than a timestamp watermark. Source: `query.rs:108-121,206-218,225-240`.
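
The effect of the strict comparison can be seen in a small sketch over an in-memory activity log. `Entry` and `plan_before` are hypothetical names; the real helpers run SQL against the activity table.

```rust
/// Hypothetical activity-log entry for one relay: (created_at, plan).
type Entry = (i64, &'static str);

/// Latest plan recorded strictly before `before`,
/// or None if the relay had no activity yet at that point.
fn plan_before(log: &[Entry], before: i64) -> Option<&'static str> {
    log.iter()
        .filter(|(created_at, _)| *created_at < before) // strict `<`, not `<=`
        .max_by_key(|(created_at, _)| *created_at)
        .map(|(_, plan)| *plan)
}
```

With a relay created at `t = 100`, `plan_before(&log, 100)` is `None`, so the relay is not counted as active in the period that ends at its own creation time.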
|
||||
|
||||
## Schema and migration rules
|
||||
|
||||
The whole schema lives in a single migration, `0001_init.sql`. Pre-release, schema changes are **squashed** into that file rather than appended as new files; migrations become append-only only after release. `db::init` creates the database file if it is missing, enables WAL, and runs the migrations in `./migrations`. Relative `sqlite://` `DATABASE_URL` paths are rewritten against `CARGO_MANIFEST_DIR` (the compile-time backend crate dir), not the process cwd. FK columns are named `{model}_{pk}`. Plans are hardcoded in memory and read synchronously through `list_plans`/`get_plan`; there is no plans table, so a new plan is a code edit, not a migration. Source: `db.rs:70-108`, `migrations/0001_init.sql:1-115`, `query.rs:20-54`, root `AGENTS.md:20-22`.
|
||||
|
||||
## A few "Ok with no write" cases
|
||||
|
||||
`create_invoice` and `insert_invoice_items_for_renewal` can legitimately return `Ok` with no write: a non-positive outstanding balance returns `Ok(None)` (credit carries forward) and empty renewal items returns early `Ok(())`. Don't treat a missing invoice as an error. `insert_bolt11` uses `INSERT ... RETURNING *` with `fetch_optional` and returns `Option<Bolt11>`, so handle the `Option` rather than unwrapping. `set_relay_status_tx` (and `update_relay`) always reset `synced = 0` as a side effect of any status/field change, re-queuing the relay for the infra reactor. Source: `command.rs:185-223,300-335,443-464`.
|
||||
|
||||
## Sources
|
||||
|
||||
- read assembly + `_for_tenant` suffix — `backend/src/query.rs:6-16,58-262`
|
||||
- transaction conventions — `backend/src/command.rs:14-82,146-183,466-704`, `backend/src/db.rs:60-68`
|
||||
- idempotency idioms — `backend/src/command.rs:279-335,563-704`, `backend/migrations/0001_init.sql:111-112`
|
||||
- `Snapshot` — `backend/src/models.rs:8-24`, `backend/src/command.rs:174-178`
|
||||
- timestamp/i64/status modeling — `backend/src/models.rs:54-94,120-142`, `backend/src/query.rs:81-121,206-240`, `backend/migrations/0001_init.sql:32,48-60`
|
||||
- strict-`<` historical reads (and the marker-based `list_billable_activity`) — `backend/src/query.rs:108-121,206-218,225-240`
|
||||
- schema/migration rules — `backend/src/db.rs:70-108`, `backend/migrations/0001_init.sql:1-115`, `backend/src/query.rs:20-54`, root `AGENTS.md:20-22`
|
||||
- Ok-with-no-write + side effects — `backend/src/command.rs:185-223,300-335,443-464`
|
||||
@@ -0,0 +1,45 @@
|
||||
# External integrations (deep detail)
|
||||
|
||||
This is the lookup-depth companion to the SKILL.md "external integrations" section: per-leaf behavior, the Stripe idempotency/error/currency details, the NWC per-call pattern, Robot's caches and side effects, the at-rest encryption, and the payment cascade. All of it lives in `stripe.rs`, `wallet.rs`, `bitcoin.rs`, `robot.rs`, `env.rs`, `infra.rs`, and `billing.rs`.
|
||||
|
||||
## Stripe leaf
|
||||
|
||||
A thin `reqwest` wrapper, no SDK. `get()`/`post()` build a `RequestBuilder` against the Stripe API and attach `.bearer_auth(&stripe_secret_key)` on every call. The `StripeRequest` trait provides `send_ok()` (runs `error_for_status`) and `send_json()`; all methods end in `.send_json()`. `error_for_status` parses Stripe's JSON error envelope into `message [type-or-code] (param: ...)`, falling back to the raw body.
|
||||
|
||||
The `Idempotency-Key` is `HMAC-SHA256(stripe_secret_key, parts joined by ':')` with a stable per-operation prefix: `create_customer` keys on `[create_customer, tenant_pubkey]`; the charge (`create_payment_intent`) keys on `[payment_intent, invoice_id, payment_method_id]` — `payment_method_id` is in the key on purpose, so a fall-back to a different card for the same invoice produces a distinct key instead of colliding with (and replaying) the original charge. Reuse `idempotency_key()` with a descriptive prefix for any new mutating call; `get_saved_payment_method` and `create_portal_session` send no idempotency key.
|
||||
|
||||
`create_payment_intent` posts `off_session=true`, `confirm=true`, and **requires** `status == "succeeded"`, so the billing cascade falls through via **two distinct paths**: (1) an off-session 3DS/authentication demand is returned by Stripe as an HTTP 402 error, caught earlier by `error_for_status` (do **not** assume `requires_action`/3DS "comes back 2xx" — for off-session confirmed intents Stripe surfaces it as an HTTP error, as the file's own doc comment states); (2) a 2xx response whose status is merely not `"succeeded"` is converted to `Err` by the explicit status check. Don't relax this expecting Stripe to retry off-session. `get_saved_payment_method` returns the **first** card listed (no Stripe-default notion). `create_portal_session` is called directly from the route handler, not via billing. Source: `stripe.rs:1-225` (3DS-as-HTTP-error doc at `104-106`; status check at `135-143`; `error_for_status` at `194-227`), `routes/tenants.rs:263-280`.
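
The two fall-through paths can be sketched as a single classification function. This is an illustration only: `charge_outcome` and its plain `u16`/`&str` inputs are hypothetical, whereas the real code works on a `reqwest` response and the parsed intent.

```rust
/// Hypothetical reduction of a Stripe response to (HTTP status, intent status).
fn charge_outcome(http_status: u16, intent_status: &str) -> Result<(), String> {
    // Path 1: a non-2xx response (for example the 402 for an off-session
    // authentication demand) is already an error before the body is inspected.
    if !(200..300).contains(&http_status) {
        return Err(format!("stripe returned HTTP {http_status}"));
    }
    // Path 2: a 2xx response whose intent is anything other than "succeeded".
    if intent_status != "succeeded" {
        return Err(format!("payment intent status is {intent_status}"));
    }
    Ok(())
}
```

Either `Err` lets the billing cascade move on to its next step.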
|
||||
|
||||
## Wallet (NWC) leaf
|
||||
|
||||
A parsed `NostrWalletConnectURI` with a short-lived **new-then-shutdown-per-call** pattern: every method does `NWC::new(...)`, awaits the op, then `nwc.shutdown()`. Nothing is pooled across awaits. `is_settled` treats an invoice as settled if `state == Settled` *or* `settled_at.is_some()`; `make_invoice` takes msats + description + expiry and returns a bolt11; `pay_invoice` takes a bolt11. Source: `wallet.rs:7-61`.
|
||||
|
||||
## Bitcoin (Coinbase) leaf
|
||||
|
||||
Converts fiat-minor to msats: it fetches the Coinbase spot price for `BTC-<CURRENCY>`, divides minor units by `10^exponent`, then `/ price * 1e11`, rounded to `u64`. `currency_minor_exponent` encodes Stripe's currency table (0 decimals for the zero-decimal currencies, 3 for BHD/JOD/KWD/OMR/TND, 2 for any other 3-letter alpha code) and must stay aligned with what Stripe expects. Trap: the Coinbase client is built with **no timeout** (unlike the 5s zooid and robot clients), so a hung response can stall `fiat_to_msats` indefinitely. The Stripe charge currency is hardcoded `"usd"` in billing despite both modules supporting arbitrary currencies, so a non-USD invoice would be charged as a USD-minor amount. Source: `bitcoin.rs:5-50`, `billing.rs:406-416`.
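
The conversion arithmetic can be sketched as follows. Two assumptions are made for illustration: the spot price is passed in as a parameter rather than fetched from Coinbase, and the zero-decimal arm lists only a few example currencies, not the full table.

```rust
/// Decimal exponent for a currency code.
/// The zero-decimal arm is a partial, illustrative subset.
fn currency_minor_exponent(currency: &str) -> Option<u32> {
    let code = currency.to_ascii_uppercase();
    if code.len() != 3 || !code.chars().all(|c| c.is_ascii_alphabetic()) {
        return None;
    }
    Some(match code.as_str() {
        "JPY" | "KRW" | "VND" => 0,
        "BHD" | "JOD" | "KWD" | "OMR" | "TND" => 3,
        _ => 2,
    })
}

/// minor units -> major units -> BTC -> msats (1 BTC = 1e11 msats).
/// `btc_price` is the price of one BTC in the given currency (hypothetical input).
fn fiat_to_msats(amount_minor: u64, currency: &str, btc_price: f64) -> Option<u64> {
    let exponent = currency_minor_exponent(currency)?;
    if !btc_price.is_finite() || btc_price <= 0.0 {
        return None;
    }
    let major = amount_minor as f64 / 10f64.powi(exponent as i32);
    Some((major / btc_price * 1e11).round() as u64)
}
```

For example, 1000 USD minor units (10 dollars) at a price of 50,000 USD per BTC is 0.0002 BTC, or 20,000,000 msats.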
|
||||
|
||||
## Robot leaf
|
||||
|
||||
The service's Nostr identity, built on `env::get().keys`. `Robot::new()` has a **network side effect**: it awaits `publish_identity()` (kind 0 metadata, kind 10002 outbox, kind 10050 messaging relays) before returning, so it is not a pure constructor and it propagates relay send errors. `send_dm` discovers the recipient's relays (kind 10002, then kind 10050) and then sends a NIP-17 private message via `send_private_msg`; empty relay lists are an error. `fetch_nostr_name` swallows errors via `.ok()` and returns `Option`, so a relay outage looks identical to "no name set" (callers fall back to the first 8 chars of the pubkey). Relay-list caches are positive-only with a 5-minute TTL, yet they cache even an empty `Vec`, so a recipient who just published 10002/10050 keeps failing `send_dm` for up to 5 minutes. Source: `robot.rs:11-211`.
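
The empty-list caching trap can be illustrated with a small TTL cache. This is a sketch under assumptions: `RelayListCache`, its fields, and the explicit `now` argument (seconds) are hypothetical, and the real cache in `robot.rs` may be structured differently.

```rust
use std::collections::HashMap;

/// Five-minute TTL, as described above.
const TTL_SECS: u64 = 5 * 60;

/// Hypothetical cache keyed by pubkey; each entry stores (fetched_at, relays).
#[derive(Default)]
struct RelayListCache {
    entries: HashMap<String, (u64, Vec<String>)>,
}

impl RelayListCache {
    /// Store the result of a lookup, even when the list is empty.
    fn put(&mut self, pubkey: &str, now: u64, relays: Vec<String>) {
        self.entries.insert(pubkey.to_string(), (now, relays));
    }

    /// Return the cached list while it is still fresh.
    fn get(&self, pubkey: &str, now: u64) -> Option<&[String]> {
        match self.entries.get(pubkey) {
            Some((at, relays)) if now.saturating_sub(*at) < TTL_SECS => {
                Some(relays.as_slice())
            }
            _ => None,
        }
    }
}
```

Until the entry expires, a cached empty list is served as-is, so the caller keeps seeing "no relays" even if the recipient has published a list in the meantime.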
|
||||
|
||||
## At-rest encryption
|
||||
|
||||
`env.encrypt`/`decrypt` are NIP-44 v2 **self-encryption** — the robot's own secret and public key, sender equals recipient — so they provide at-rest confidentiality for the service, not a DM the tenant can read. A tenant's `nwc_url` is encrypted at the route on write and decrypted only at point of use in billing. Outbound zooid auth is NIP-98 via `env.make_auth`, the sole zooid auth mechanism. Source: `env.rs:86-107`, `routes/tenants.rs:130-137`, `billing.rs:381`.
|
||||
|
||||
## The payment cascade
|
||||
|
||||
`attempt_payment` (the cascade orchestrator, defined in `billing.rs`) runs the cascade in order: (1) NWC auto-pay (decrypt `nwc_url` → per-call `Wallet`), (2) out-of-band lightning `is_settled` via the robot wallet, (3) Stripe card on file, (4) manual DM link. A failing NWC or Stripe attempt records its error on the tenant but never aborts the cascade, and the first success returns `Ok`. Lightning pricing flows through `bitcoin::fiat_to_msats(amount, "usd")` in `ensure_bolt11_for_invoice`, which mints a 3600s bolt11. Note `billing.rs` is *not* the only place integrations are used: route handlers call Stripe and the robot directly too (e.g. `create_tenant` calls `api.robot.fetch_nostr_name` and `api.stripe.create_customer`; `create_stripe_session` calls `api.stripe.create_portal_session`). And a handler may invoke **more than one** billing method — `reconcile_tenant` calls both `sync_stripe_customer` and `reconcile_subscription`, `reconcile_invoice` calls both `ensure_bolt11_for_invoice` and `attempt_payment` — and those public billing methods are themselves orchestrators that fan out internally, not leaf methods. Source: `billing.rs:29-33,326-377,400-502`, `routes/tenants.rs:84-94,182-190`, `routes/invoices.rs:54-62`.
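
The "record the error, keep going" shape of the cascade can be sketched like this. The `Steps` input and `Outcome` enum are hypothetical: the real `attempt_payment` performs the calls itself and records errors on the tenant row, whereas here the step results are supplied by the caller.

```rust
/// Which step ended the cascade (hypothetical reporting type).
#[derive(Debug, PartialEq)]
enum Outcome {
    Nwc,
    Lightning,
    Stripe,
    DmSent,
}

/// Hypothetical pre-computed step results, for illustration only.
struct Steps {
    nwc: Option<Result<(), String>>,    // None: tenant has no nwc_url
    lightning_settled: bool,            // out-of-band bolt11 already settled
    stripe: Option<Result<(), String>>, // None: no card on file
}

fn attempt_payment(steps: Steps, errors: &mut Vec<String>) -> Outcome {
    // (1) NWC auto-pay: a failure is recorded but does not abort.
    match steps.nwc {
        Some(Ok(())) => return Outcome::Nwc,
        Some(Err(e)) => errors.push(format!("nwc: {e}")),
        None => {}
    }
    // (2) Out-of-band lightning settlement check.
    if steps.lightning_settled {
        return Outcome::Lightning;
    }
    // (3) Stripe card on file: same record-and-continue rule.
    match steps.stripe {
        Some(Ok(())) => return Outcome::Stripe,
        Some(Err(e)) => errors.push(format!("stripe: {e}")),
        None => {}
    }
    // (4) Fall back to the manual DM link.
    Outcome::DmSent
}
```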
|
||||
|
||||
## Per-integration error-string convention
|
||||
|
||||
Error-string quality varies, so prefer actionable strings when adding calls. Stripe builds `message [code] (param)` (actionable); zooid builds `method path returned status: body` (actionable). But the leaves are not uniform: NWC's `make_invoice`/`is_settled` add `anyhow!` context (`failed to create invoice: {e}` / `failed to lookup invoice: {e}`), while `pay_invoice` passes the raw error through unchanged (`anyhow!("{e}")`); and Coinbase only wraps the price-*parse* failure (`invalid BTC spot quote for {currency}: {e}`), so a non-2xx Coinbase API response is surfaced as a bare `reqwest` `error_for_status()` error with no added context. (The NWC strings live in `wallet.rs`, cited below.) Source: `stripe.rs:191-225`, `infra.rs:289-293`, `bitcoin.rs:24-33`, `wallet.rs:39,47,59`.
|
||||
|
||||
## Sources
|
||||
|
||||
- Stripe leaf — `backend/src/stripe.rs:1-225`, `backend/src/routes/tenants.rs:263-280`
|
||||
- Wallet (NWC) leaf — `backend/src/wallet.rs:7-61`
|
||||
- bitcoin leaf — `backend/src/bitcoin.rs:5-50`, `backend/src/billing.rs:406-416`
|
||||
- Robot leaf — `backend/src/robot.rs:11-211`
|
||||
- at-rest encryption — `backend/src/env.rs:86-107`, `backend/src/routes/tenants.rs:130-137`, `backend/src/billing.rs:381`
|
||||
- payment cascade — `backend/src/billing.rs:29-33,326-377,400-502`, `backend/src/routes/tenants.rs:84-94,182-190`, `backend/src/routes/invoices.rs:54-62`
|
||||
- error-string convention — `backend/src/stripe.rs:191-225`, `backend/src/infra.rs:289-293`, `backend/src/bitcoin.rs:24-33`, `backend/src/wallet.rs:39,47,59`
|
||||
@@ -0,0 +1,66 @@
|
||||
# Module map and layering (deep detail)
|
||||
|
||||
This is the lookup-depth companion to the SKILL.md "module map" section: the full per-module responsibility map, the exact `main()` bootstrap and spawn order, the layering direction with concrete traces, and the test-target note. It exists so the SKILL.md "where things live" section can stay prose-only.
|
||||
|
||||
## The flat module map
|
||||
|
||||
Everything lives flat under `backend/src`, one job per module. `backend` is a dual library+binary crate, so this same set of modules is declared in two roots: `lib.rs` declares them as `pub mod` (the library root, the public/canonical declaration) and `main.rs` re-declares them as private `mod` for the binary entry point:
|
||||
|
||||
- **`api`** — the router, the `Api` service container, the authorization helpers (`require_*`, `get_*_or_404`, `is_admin`), and the `AuthedPubkey` NIP-98 extractor.
|
||||
- **`billing`** — the orchestrator: it composes the integration leaves against the DB to reconcile activity into invoice items, renew subscriptions, and collect payment.
|
||||
- **`bitcoin`** — the Coinbase fiat↔msats conversion leaf.
|
||||
- **`command`** — all writes, as free async fns: single-statement writes run directly on `db::pool()`, while multi-step writes run inside `db::with_tx`.
|
||||
- **`db`** — the global `SqlitePool`, the activity broadcast channel, and `with_tx`.
|
||||
- **`env`** — the config singleton.
|
||||
- **`infra`** — the relay-sync reactor plus the zooid client.
|
||||
- **`models`** — the domain and sqlite-row structs plus the relay status string constants.
|
||||
- **`query`** — all reads, plus the hardcoded plans.
|
||||
- **`robot`** — the service's Nostr identity and DM sender.
|
||||
- **`routes/*`** — the HTTP handlers, grouped by resource (`identity`, `plans`, `tenants`, `relays`, `invoices`).
|
||||
- **`stripe`** — the Stripe HTTP leaf.
|
||||
- **`wallet`** — the NWC (Nostr Wallet Connect) leaf.
|
||||
- **`web`** — the response envelope and its success/error builders.
|
||||
|
||||
Source: `lib.rs:1-14`, `main.rs:1-14`, and each module head.
|
||||
|
||||
## The `main()` bootstrap order
|
||||
|
||||
The order is strict, and each service is built from the ones before it:
|
||||
|
||||
1. `dotenvy::dotenv().ok()`
|
||||
2. tracing setup
|
||||
3. `env::init()` — loads and validates all config, panicking on any missing var
|
||||
4. `db::init().await` — normalizes the sqlite URL, opens the pool, sets WAL, runs migrations, and installs the broadcast channel
|
||||
5. `Robot::new().await` — builds the Nostr identity (and publishes it; see the integrations reference)
|
||||
6. `Stripe::new()`
|
||||
7. `Billing::new(robot.clone())`
|
||||
8. `Api::new(billing, stripe, robot)`
|
||||
|
||||
Then it builds the router (`api.router().layer(cors)`), spawns `infra::start()` and `billing.start()` as detached tokio tasks, and only then binds `127.0.0.1:{SERVER_PORT}` and calls `axum::serve`. The HTTP server and the two workers run concurrently for the life of the process.
|
||||
|
||||
Source: `main.rs:27-67`.
|
||||
|
||||
## The layering direction
|
||||
|
||||
The call direction is: route handler → (when needed) `Api` authorization helpers (`require_admin` / `require_admin_or_tenant` / `require_tenant`) → `query` (reads) / `command` (writes) / `billing` (orchestration) → `db`. There is no single strict linear layer: a handler may skip the authz helpers entirely, and it may call integration leaves directly rather than going through `billing`. The integration leaves (`stripe`/`wallet`/`bitcoin`/`robot`) are composed in **two** places — `billing.rs` holds its own `stripe`/`wallet`/`robot` for the reconciliation loop, while `Api` holds `stripe` and `robot` that route handlers invoke directly — so the leaves are *not* composed exclusively by `billing.rs`.
|
||||
|
||||
Two concrete traces:
|
||||
|
||||
- **`create_tenant`** calls no `require_*` helper (only the `AuthedPubkey` extractor); it calls `query::get_tenant`, then `api.robot.fetch_nostr_name` and `api.stripe.create_customer` directly (two integration leaves, not via `billing`), then `command::create_tenant` (`routes/tenants.rs:76-109`).
|
||||
- **`reconcile_invoice`** calls `query::get_invoice`, then `billing.ensure_bolt11_for_invoice` and `billing.attempt_payment` — here the handler delegates the multi-integration orchestration to `billing` rather than calling the leaves itself (`routes/invoices.rs:35-60`).
|
||||
|
||||
## Router assembly
|
||||
|
||||
`Api::router()` is the single place every route string is wired to a handler fn imported from `routes/*`, after which `Arc<Api>` is attached via `.with_state`. An endpoint therefore needs *both* a handler fn (in the matching `routes/*.rs`) and a `.route(...)` line (in `api.rs`) — two different files, both required (`api.rs:66-99`).
|
||||
|
||||
## The test-target note
|
||||
|
||||
`lib.rs` declares every module `pub` solely for an integration-test target that does not exist yet (there is no `backend/tests` dir), so nothing currently consumes those public declarations. If you add an integration-test crate, this is the surface it reads (`lib.rs:1-14`).
|
||||
|
||||
## Sources
|
||||
|
||||
- flat module list — `backend/src/lib.rs:1-14`, `backend/src/main.rs:1-14`, and each module head
|
||||
- `main()` bootstrap and spawn order — `backend/src/main.rs:27-67`
|
||||
- layering traces — `backend/src/routes/tenants.rs:76-109`, `backend/src/routes/invoices.rs:35-60`
|
||||
- router assembly — `backend/src/api.rs:66-99`
|
||||
- test-target note — `backend/src/lib.rs:1-14`
|
||||
@@ -0,0 +1,47 @@
|
||||
# Reactors and relay sync (deep detail)
|
||||
|
||||
This is the lookup-depth companion to the SKILL.md "background reactors" section: the relay-sync retry/backoff machinery, the self-feeding failure loop, the zooid POST-vs-PATCH request, and the billing dunning cascade timing. All of it lives in `infra.rs`, `billing.rs`, `db.rs`, and `query.rs`.
|
||||
|
||||
## The infra recv loop
|
||||
|
||||
`infra::start()` calls `db::subscribe()` once, runs `reconcile_relay_state("startup")` to recover relays left unsynced from a prior run, then loops on `rx.recv().await`, handling all three broadcast outcomes:
|
||||
|
||||
- **`Ok(activity)`** → `handle_activity`, which filters to `resource_type == "relay"` *and* an `activity_type` in `{create_relay, update_relay, activate_relay, deactivate_relay, fail_relay_sync}`; everything else is ignored. A `fail_relay_sync` routes to `schedule_relay_sync_retry`; the others load the relay via `query::get_relay` and call `sync_relay`.
|
||||
- **`Lagged(n)`** → `warn` plus a full `reconcile_relay_state("lagged")` sweep to recover the dropped messages.
|
||||
- **`Closed`** → break out of the loop, terminating the worker. Because the broadcast `Sender` lives in a `static OnceLock` for the whole process, `Closed` effectively never happens in normal operation — but if it did, `infra::start` returns and is not restarted by `main`, leaving relay provisioning dead until the process restarts.
|
||||
|
||||
`reconcile_relay_state` queries `list_relays_pending_sync` (`synced = 0 OR TRIM(sync_error) != ''`), returns early if empty, and otherwise routes blank-error relays to an immediate `sync_relay` and error-carrying ones through backoff. Source: `infra.rs:28-92`, `query.rs:81-83`.
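
The filter and routing inside `handle_activity` can be sketched as a pure function. `Action` and `route_activity` are hypothetical names; the activity-type strings are the ones listed above.

```rust
/// Activity types that trigger relay sync handling.
const SYNC_TRIGGERS: [&str; 5] = [
    "create_relay",
    "update_relay",
    "activate_relay",
    "deactivate_relay",
    "fail_relay_sync",
];

/// Hypothetical summary of what the handler does with one activity.
#[derive(Debug, PartialEq)]
enum Action {
    Ignore,
    ScheduleRetry,
    SyncNow,
}

fn route_activity(resource_type: &str, activity_type: &str) -> Action {
    // Both conditions must hold; everything else is ignored.
    if resource_type != "relay" || !SYNC_TRIGGERS.contains(&activity_type) {
        return Action::Ignore;
    }
    if activity_type == "fail_relay_sync" {
        Action::ScheduleRetry // goes through backoff
    } else {
        Action::SyncNow // load the relay and sync immediately
    }
}
```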
|
||||
|
||||
## Backoff
|
||||
|
||||
`schedule_relay_sync_retry` counts `consecutive_failures` via `take_while` over `fail_relay_sync` activities at the **head** of the resource history (ordered `created_at DESC`) — any non-failure activity at the head resets the count to 0, which is what lets a recovered relay restart backoff from the base delay. The delay is `BASE(30s) << (attempt - 1)`, capped at `MAX(15min)`; after `MAX_ATTEMPTS(6)` it returns `None`, logs "retries exhausted; awaiting manual intervention", and stops. The retry itself is a fire-and-forget `tokio::spawn` that sleeps the computed delay, re-fetches the relay, and calls `sync_relay` (a missing relay is a silent no-op). Source: `infra.rs:15-17,94-148`, `query.rs:242-249`.
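
The streak count and the delay formula can be sketched as two small functions. The constant names and the exact boundary (attempt 6 still gets a delay, attempt 7 does not) are assumptions made for illustration; check `infra.rs` for the authoritative comparison.

```rust
const BASE_SECS: u64 = 30;
const MAX_SECS: u64 = 15 * 60;
const MAX_ATTEMPTS: u32 = 6;

/// Count `fail_relay_sync` entries at the head of a newest-first history.
/// Any other activity type at the head ends the streak.
fn consecutive_failures(history_newest_first: &[&str]) -> u32 {
    history_newest_first
        .iter()
        .take_while(|t| **t == "fail_relay_sync")
        .count() as u32
}

/// BASE << (attempt - 1), capped at MAX; None once retries are exhausted.
fn retry_delay_secs(attempt: u32) -> Option<u64> {
    if attempt > MAX_ATTEMPTS {
        return None; // assumed boundary: give up after MAX_ATTEMPTS
    }
    Some((BASE_SECS << attempt.saturating_sub(1)).min(MAX_SECS))
}
```

Under these assumptions the delays for attempts 1 through 6 are 30, 60, 120, 240, 480, and 900 seconds (the sixth would be 960 but is capped).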
|
||||
|
||||
## The self-feeding loop
|
||||
|
||||
`sync_relay` never returns an error: on `Ok` it calls `command::complete_relay_sync` (sets `synced = 1`, `sync_error = ''`); on `Err` it calls `command::fail_relay_sync` (sets `synced = 0`, `sync_error = ...`), which publishes a `fail_relay_sync` activity after commit, which re-enters `handle_activity` and re-schedules backoff. The retry chain terminates when **any** of these happen: the sync succeeds (`complete_relay_sync` resets `synced = 1` and breaks the consecutive-failure streak counted by `take_while`, so no further retry is scheduled), the relay no longer exists (`get_relay` returns `None`, a silent no-op), a `get_relay` query errors (logged and stopped), or the consecutive-failure count exceeds `MAX_ATTEMPTS(6)` — after which the relay sits with `synced = 0` and a set `sync_error` until manual intervention or another activity touches it. Note `set_relay_status_tx` and `update_relay` always reset `synced = 0` as a side effect, so a "pure" status flip is never sync-neutral. Source: `infra.rs:57-60,136-146,151-166`, `command.rs:185-273,580-596`.
|
||||
|
||||
## The zooid request
|
||||
|
||||
`try_sync_relay` assembles the request body inline as a `serde_json::json!`: `host` (subdomain + `relay_domain`), `schema` (`relay.id`), an `inactive` flag, and the `info`/`policy`/`groups`/`management`/`push`/`roles` blocks, plus a conditional blossom S3 block and a livekit block — each gated on the relay's `*_enabled` i64 flag and falling back to `{enabled: false}`.
|
||||
|
||||
`is_new` is true **only** when `synced != 1` *and* there is no prior `complete_relay_sync` activity. `is_new` alone decides `POST` (with a freshly generated `Keys::generate` secret inserted into the body) vs `PATCH` (secret omitted). Because `update_relay` resets `synced = 0`, a re-sync after an update would look "new" by the `synced` flag alone — the second condition (no prior `complete_relay_sync`) is what makes it a `PATCH`, so the relay is not re-created and its secret is not clobbered. Caravel never persists the secret, so this check is load-bearing.
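
As a minimal sketch of that decision, assuming the two inputs have already been read (the `Method` enum and `zooid_method` name are hypothetical):

```rust
#[derive(Debug, PartialEq)]
enum Method {
    Post,  // create on zooid, with a freshly generated secret in the body
    Patch, // update in place, secret omitted
}

fn zooid_method(synced: i64, has_prior_complete_sync: bool) -> Method {
    // Both conditions are required: `synced` alone is reset by every update.
    let is_new = synced != 1 && !has_prior_complete_sync;
    if is_new { Method::Post } else { Method::Patch }
}
```

The second row of the truth table is the important one: `synced = 0` with a prior `complete_relay_sync` yields `Patch`, which is what protects the existing secret.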
|
||||
|
||||
All zooid calls go through `request(method, path, body)`: a 5-second `reqwest` client, base from `zooid_api_url` (trailing slash trimmed), NIP-98 `Authorization` via `env.make_auth`, and a non-2xx response is turned into an `anyhow::bail!` carrying the status and body. Source: `infra.rs:168-295`.
|
||||
|
||||
## Billing worker timing
|
||||
|
||||
`POLL_INTERVAL` is 1 hour, so dunning runs at hour granularity. The DM guards exist specifically so the hourly tick doesn't re-DM on every pass:
|
||||
|
||||
- `GRACE_PERIOD_SECS` = 7 days (dunning grace before churn)
|
||||
- `FRESH_INVOICE_DM_GRACE_SECS` = 24h (hold the manual-payment DM until an open invoice is at least this old, because a fresh invoice is surfaced in-app first)
|
||||
- `MANUAL_PAYMENT_DM_INTERVAL_SECS` = 12 days (minimum spacing between reminder DMs)
|
||||
|
||||
`attempt_payment_using_dm` checks both `invoice.created_at` and `invoice.notified_at` before sending. `reconcile_subscription` clones the tenant and mutates the local copy (billing anchor, churn, payment method), updating the DB via explicit `command` calls, so the synchronous reconcile route re-reads the tenant afterward to reflect the changes. Source: `billing.rs:15-23,46-130,436-449`.
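
The two DM guards can be sketched as a pure predicate over Unix timestamps. The constant values are the ones listed above; the function name and the inclusive/exclusive boundaries are assumptions for illustration.

```rust
const FRESH_INVOICE_DM_GRACE_SECS: i64 = 24 * 60 * 60;
const MANUAL_PAYMENT_DM_INTERVAL_SECS: i64 = 12 * 24 * 60 * 60;

/// Hypothetical guard: should this hourly tick send a manual-payment DM?
fn should_send_dm(now: i64, created_at: i64, notified_at: Option<i64>) -> bool {
    // Guard 1: hold the DM while the invoice is still fresh.
    if now - created_at < FRESH_INVOICE_DM_GRACE_SECS {
        return false;
    }
    // Guard 2: space reminders out from the last notification.
    match notified_at {
        Some(last) => now - last >= MANUAL_PAYMENT_DM_INTERVAL_SECS,
        None => true,
    }
}
```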
|
||||
|
||||
## Sources
|
||||
|
||||
- infra recv loop + reconcile — `backend/src/infra.rs:28-92`, `backend/src/query.rs:81-83`
|
||||
- backoff — `backend/src/infra.rs:15-17,94-148`, `backend/src/query.rs:242-249`
|
||||
- self-feeding loop — `backend/src/infra.rs:57-60,136-146,151-166`, `backend/src/command.rs:185-273,580-596`
|
||||
- zooid request + POST-vs-PATCH — `backend/src/infra.rs:168-295`
|
||||
- billing worker timing — `backend/src/billing.rs:15-23,46-130,436-449`
|
||||
@@ -0,0 +1,60 @@
|
||||
# Request lifecycle and the web envelope (deep detail)
|
||||
|
||||
This is the lookup-depth companion to the SKILL.md "request lifecycle" section: the exact NIP-98 decode, the success/error envelope field shapes, every builder's fixed-vs-supplied code, the full in-use domain-error-code list, and the handful of authorization quirks. All of it lives in `api.rs`, `web.rs`, and the `routes/*` handlers.
|
||||
|
||||
## The NIP-98 decode, step by step
|
||||
|
||||
`decode_nip98_pubkey` does the following, and every failure collapses to a single 401 `unauthorized` via `extract_auth_pubkey`'s `.map_err(unauthorized)`:
|
||||
|
||||
1. require an `Authorization: Nostr <base64>` header
|
||||
2. base64-decode it to a JSON Nostr event
|
||||
3. assert `event.kind == HttpAuth` (kind 27235)
|
||||
4. call `event.verify()` (the signature/id check)
|
||||
5. take the **last** `u` tag (`.last()`, not `.first()`)
|
||||
6. assert it equals `env::get().server_url`
|
||||
7. return `event.pubkey.to_hex()`
|
||||
|
||||
Because all of these collapse to one 401, you cannot distinguish a missing header from a bad signature from a host mismatch at the response level. Source: `api.rs:163-203`.
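
The steps above, and the way every failure collapses to one 401, can be sketched as follows. `AuthEvent` is a hypothetical, already-parsed stand-in (header parsing, base64, and signature verification are reduced to plain fields); the real code operates on the Nostr event type.

```rust
const HTTP_AUTH_KIND: u32 = 27235;

/// Hypothetical parsed event; `signature_valid` stands in for `event.verify()`.
struct AuthEvent {
    kind: u32,
    signature_valid: bool,
    u_tags: Vec<String>,
    pubkey_hex: String,
}

/// `None` models a missing or undecodable Authorization header.
fn decode_nip98_pubkey(
    event: Option<&AuthEvent>,
    server_url: &str,
) -> Result<String, &'static str> {
    let event = event.ok_or("missing or malformed header")?;
    if event.kind != HTTP_AUTH_KIND {
        return Err("wrong kind");
    }
    if !event.signature_valid {
        return Err("bad signature");
    }
    // The last `u` tag wins, not the first.
    let u = event.u_tags.last().ok_or("missing u tag")?;
    if u.as_str() != server_url {
        return Err("host mismatch");
    }
    Ok(event.pubkey_hex.clone())
}

/// Every distinct failure reason is discarded and becomes a 401.
fn extract_auth_pubkey(event: Option<&AuthEvent>, server_url: &str) -> Result<String, u16> {
    decode_nip98_pubkey(event, server_url).map_err(|_| 401)
}
```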
|
||||
|
||||
## The deliberate non-strictness
|
||||
|
||||
The check binds signer identity and host affinity only. It does **not** verify the HTTP method, the exact request URL/path/query, a payload hash, timestamp freshness, or maintain any replay cache. Per the README rationale, the frontend signs one kind-27235 event with `u = VITE_API_URL` and caches the header ~10 minutes; the tradeoff is a reusable ~10-minute bearer header (fewer wallet-signing prompts, no cookie sessions) at the cost of weaker request-intent binding than strict NIP-98. Do not "fix" it to per-request binding — it is a design choice. Source: `api.rs:167-203`, `README.md:128-137`.
|
||||
|
||||
## The envelope structs
|
||||
|
||||
- **Success:** `DataResponse { data: T, code: "ok" }`, serialized as `{ "data": ..., "code": "ok" }`. The field is `data`, and `code` is a `&'static str`.
|
||||
- **Error:** `ErrorResponse { error: String, code: String }`, serialized as `{ "error": ..., "code": ... }`. The field is `error`, and `code` is an owned `String`.
|
||||
|
||||
Note the top-level keys differ — `data` on success, `error` on failure — so a client must branch on success-vs-error rather than reading one fixed key. The only HTTP statuses the success builders emit are 200 (`ok`) and 201 (`created`); there is no 204/no-content builder in this file. Source: `web.rs:33-43`.
|
||||
|
||||
Success builders return `ApiResult` (= `Result<Response, ApiError>`) already wrapped in `Ok`, so they sit at the tail of a handler with no `Ok(..)`: `res(status, data)`, `ok(data)` (= `res(OK, ..)`), `created(data)` (= `res(CREATED, ..)`). `ApiError` is a boxed `Response` with the status baked in at construction — it carries no separate status field. Source: `web.rs:17-57`.
|
||||
|
||||
## Error builders and their codes
|
||||
|
||||
The named error builders fix both status and code: `unauthorized` → 401/`unauthorized`, `forbidden` → 403/`forbidden`, `not_found` → 404/`not-found`, `internal` → 500/`internal`. The two domain builders take a **caller-supplied** kebab-case code: `bad_request(code, msg)` → 400, `unprocessable(code, msg)` → 422. Passing the wrong status builder silently emits the wrong HTTP status with your intended code. Source: `web.rs:61-103`.
|
||||
|
||||
`map_unique_error` downcasts `sqlx::Error::Database` and matches the raw message *substring*: contains `pubkey` → `pubkey-exists`, contains `subdomain` → `subdomain-exists`, else `None`. Because it matches on the message text rather than the constraint name, a column rename can silently regress the 422 to a 500. `map_relay_write_error` is a **private** helper inside `routes/relays.rs` (not exported from `web.rs`) that wraps `map_unique_error`: a `subdomain-exists` hit becomes a 422, anything else a 500. Source: `web.rs:115-129`, `routes/relays.rs:309-316`.
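
The substring matching, and why a rename regresses it, can be sketched on plain message strings. This is an illustration only: the real helper first downcasts `sqlx::Error::Database`, and the `(u16, &str)` return of the wrapper here is a hypothetical simplification of the `ApiError` it builds.

```rust
/// Match on the raw database message text, not on a constraint name.
fn map_unique_error(db_message: &str) -> Option<&'static str> {
    if db_message.contains("pubkey") {
        Some("pubkey-exists")
    } else if db_message.contains("subdomain") {
        Some("subdomain-exists")
    } else {
        None
    }
}

/// Hypothetical reduction of the relay wrapper to (status, code).
fn map_relay_write_error(db_message: &str) -> (u16, &'static str) {
    match map_unique_error(db_message) {
        Some("subdomain-exists") => (422, "subdomain-exists"),
        _ => (500, "internal"),
    }
}
```

If the column were renamed (say to a hypothetical `hostname`), the message would no longer contain `subdomain` and the same violation would fall through to the 500 arm.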
|
||||
|
||||
## The full in-use domain-error-code list
|
||||
|
||||
Beyond the framework codes (`ok`/`unauthorized`/`forbidden`/`not-found`/`internal`), the domain codes actually surfaced to clients are: `subdomain-exists`, `invalid-subdomain`, `invalid-plan`, `premium-feature`, `member-limit-exceeded` (all 422), and `relay-is-active` / `relay-is-inactive` / `relay-is-delinquent` (all 400, via `bad_request`). `pubkey-exists` is *defined* in `map_unique_error` alongside `subdomain-exists` but is **not** surfaced as an error: its only call site (`routes/tenants.rs:105`) intercepts the unique-constraint violation and returns 200 OK with the existing tenant (idempotent re-fetch). Source: status helpers `bad_request` (400) / `unprocessable` (422) at `web.rs:89-95`, code strings at `web.rs:122-127` and `routes/relays.rs:204,225,230,249,253,283,287,293,311-312`, `pubkey-exists` interception at `routes/tenants.rs:103-110`.
|
||||
|
||||
## Load-vs-authorize ordering
|
||||
|
||||
For a path-by-id resource owned by a tenant, fetch the resource **first** (via `get_*_or_404`), then authorize against its `tenant_pubkey` — you need the loaded row to know whose it is. This also intentionally leaks existence: a non-owner of an *existing* relay/invoice gets a 403, and a 404 only for a truly missing id. For tenant routes keyed by the tenant's own pubkey, the `Path` *is* the `tenant_pubkey`, so authorize on it first, then fetch. Don't reorder these. Source: `routes/relays.rs:29-37`, `routes/invoices.rs:19-32`, `routes/tenants.rs:61-71`.
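
The fetch-then-authorize order for a path-by-id resource can be sketched as a function that returns only the HTTP status. The `Relay` struct, the `HashMap` store, and the admin list are hypothetical stand-ins for the query layer and `require_admin_or_tenant`.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down relay row.
struct Relay {
    tenant_pubkey: String,
}

fn get_relay_status(
    relays: &HashMap<String, Relay>,
    id: &str,
    caller: &str,
    admins: &[&str],
) -> u16 {
    // 1. Load first: the row is needed to know who owns it.
    let Some(relay) = relays.get(id) else {
        return 404; // truly missing id
    };
    // 2. Then authorize against the loaded row's owner.
    if relay.tenant_pubkey.as_str() != caller && !admins.contains(&caller) {
        return 403; // exists, but not yours
    }
    200
}
```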
|
||||
|
||||
## Authorization quirks
|
||||
|
||||
- **`create_tenant`** performs no authorization beyond authentication: it uses the `AuthedPubkey` *as* the tenant identity, so a caller can only ever create or fetch their own tenant. It is idempotent and even swallows a `pubkey-exists` race by re-reading; if the row is still missing after that race, it returns a 500 with "tenant row missing after unique-constraint race". Source: `routes/tenants.rs:76-114`.
|
||||
- **`GET /tenants/:pubkey/stripe/session`** is the only same-tenant-*only* route (it uses `require_tenant`, not `require_admin_or_tenant`). Source: `routes/tenants.rs:263-269`.
|
||||
- **`get_plan`** is synchronous: `query::get_plan` returns a `Result` (no `.await`, no `Option`), so a missing plan is mapped to `not_found` via an `Err`, not a `None`. Don't pattern-match plans for a `None` case. Source: `routes/plans.rs:13-15`.
|
||||
- **`update_relay`** enforces `member-limit-exceeded` (422) only when the plan actually changes and the new plan has a `members` limit. It fetches live member counts from zooid, which returns empty for unsynced relays, so an unsynced relay appears to have 0 members for the limit check. Source: `routes/relays.rs:190-207,263-269`.
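The `update_relay` member-limit rule from the last bullet can be modeled as a small pure function. This is a sketch under assumptions, not the handler's code: `Plan`, `check_member_limit`, and the plan ids are hypothetical, and it assumes "exceeded" means the live count is strictly greater than the limit, which the text above does not specify.

```rust
/// Hypothetical plan shape: `members: None` means the plan has no member limit.
struct Plan {
    id: &'static str,
    members: Option<u32>,
}

/// Returns `Err("member-limit-exceeded")` only when the plan actually changes
/// and the new plan carries a limit that the live count exceeds.
/// Assumption: "exceeds" is a strict `>` comparison.
fn check_member_limit(
    current_plan_id: &str,
    new_plan: &Plan,
    live_member_count: u32,
) -> Result<(), &'static str> {
    // An update that keeps the same plan skips the check entirely.
    if new_plan.id == current_plan_id {
        return Ok(());
    }
    match new_plan.members {
        Some(limit) if live_member_count > limit => Err("member-limit-exceeded"),
        _ => Ok(()),
    }
}
```

Because zooid reports nothing for an unsynced relay, the count passed in would be 0 in that case, so a downgrade of an unsynced relay is checked as if it had no members.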
|
||||
|
||||
## Sources
|
||||
|
||||
- NIP-98 decode + non-strictness — `backend/src/api.rs:163-203`, `backend/README.md:128-137`
|
||||
- envelope structs + success builders — `backend/src/web.rs:17-57`
|
||||
- error builders + `map_unique_error` — `backend/src/web.rs:61-129`, `backend/src/routes/relays.rs:309-316`
|
||||
- domain-error-code list — `backend/src/web.rs:89-95,122-127`, `backend/src/routes/relays.rs:204,224-296`, `backend/src/routes/tenants.rs:103-110`
|
||||
- load-vs-authorize ordering — `backend/src/routes/relays.rs:29-37`, `backend/src/routes/invoices.rs:19-32`, `backend/src/routes/tenants.rs:61-71`
|
||||
- authorization quirks — `backend/src/routes/tenants.rs:76-114,263-269`, `backend/src/routes/plans.rs:13-15`, `backend/src/routes/relays.rs:190-207,263-269`
|
||||