Reactors and relay sync (deep detail)
This is the lookup-depth companion to the SKILL.md "background reactors" section: the relay-sync retry/backoff machinery, the self-feeding failure loop, the zooid POST-vs-PATCH request, and the billing dunning cascade timing. All of it lives in infra.rs, billing.rs, db.rs, and query.rs.
The infra recv loop
infra::start() calls db::subscribe() once, runs reconcile_relay_state("startup") to recover relays left unsynced from a prior run, then loops on rx.recv().await, handling all three broadcast outcomes:
- Ok(activity) → handle_activity, which filters to resource_type == "relay" and an activity_type in {create_relay, update_relay, activate_relay, deactivate_relay, fail_relay_sync}; everything else is ignored. A fail_relay_sync routes to schedule_relay_sync_retry; the others load the relay via query::get_relay and call sync_relay.
- Lagged(n) → warn plus a full reconcile_relay_state("lagged") sweep to recover the dropped messages.
- Closed → break out of the loop, terminating the worker. Because the broadcast Sender lives in a static OnceLock for the whole process, Closed effectively never happens in normal operation. If it did, infra::start would return and would not be restarted by main, leaving relay provisioning dead until the process restarts.
reconcile_relay_state queries list_relays_pending_sync (synced = 0 OR TRIM(sync_error) != ''), returns early if empty, and otherwise routes blank-error relays to an immediate sync_relay and error-carrying ones through backoff. Source: infra.rs:28-92, query.rs:81-83.
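The routing rules above can be sketched as two pure functions. This is a minimal illustration, not the code in infra.rs: the Route enum and the function names route_activity and route_pending are hypothetical, and the real handlers work on activity and relay structs rather than bare strings.

```rust
/// Hypothetical summary of what the worker does with one input.
#[derive(Debug, PartialEq)]
enum Route {
    /// Not a relay activity the reactor cares about.
    Ignore,
    /// Go through schedule_relay_sync_retry (backoff).
    ScheduleRetry,
    /// Load the relay and call sync_relay right away.
    Sync,
}

/// Mirrors the handle_activity filter described above.
fn route_activity(resource_type: &str, activity_type: &str) -> Route {
    if resource_type != "relay" {
        return Route::Ignore;
    }
    match activity_type {
        "fail_relay_sync" => Route::ScheduleRetry,
        "create_relay" | "update_relay" | "activate_relay" | "deactivate_relay" => Route::Sync,
        _ => Route::Ignore,
    }
}

/// Mirrors the reconcile sweep: a pending relay with a blank sync_error is
/// synced immediately, one carrying an error goes through backoff.
fn route_pending(sync_error: &str) -> Route {
    if sync_error.trim().is_empty() {
        Route::Sync
    } else {
        Route::ScheduleRetry
    }
}
```

Under these assumptions, route_activity("relay", "complete_relay_sync") is ignored, which is what keeps a successful sync from triggering another one.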
Backoff
schedule_relay_sync_retry counts consecutive_failures via take_while over fail_relay_sync activities at the head of the resource history (ordered created_at DESC) — any non-failure activity at the head resets the count to 0, which is what lets a recovered relay restart backoff from the base delay. The delay is BASE(30s) << (attempt - 1), capped at MAX(15min); after MAX_ATTEMPTS(6) it returns None, logs "retries exhausted; awaiting manual intervention", and stops. The retry itself is a fire-and-forget tokio::spawn that sleeps the computed delay, re-fetches the relay, and calls sync_relay (a missing relay is a silent no-op). Source: infra.rs:15-17,94-148, query.rs:242-249.
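As a rough sketch of the arithmetic, assuming the constants quoted above and treating the history as a slice of activity-type strings ordered newest first. The function names consecutive_failures and retry_delay are illustrative, and returning None for attempt 0 is an assumption rather than something the source states.

```rust
use std::time::Duration;

const BASE: Duration = Duration::from_secs(30);
const MAX: Duration = Duration::from_secs(15 * 60);
const MAX_ATTEMPTS: u32 = 6;

/// Count fail_relay_sync entries at the head of a newest-first history.
/// Any other activity type at the head yields 0, which resets backoff.
fn consecutive_failures(history_desc: &[&str]) -> u32 {
    history_desc
        .iter()
        .take_while(|activity_type| **activity_type == "fail_relay_sync")
        .count() as u32
}

/// BASE << (attempt - 1), capped at MAX; None once attempts are exhausted.
fn retry_delay(attempt: u32) -> Option<Duration> {
    if attempt == 0 || attempt > MAX_ATTEMPTS {
        return None; // attempt 0 handling is an assumption
    }
    let secs = BASE.as_secs() << (attempt - 1);
    Some(Duration::from_secs(secs.min(MAX.as_secs())))
}
```

With these values the schedule is 30s, 60s, 120s, 240s, 480s, and then 900s for the sixth attempt, since 960s is clamped to the 15-minute cap.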
The self-feeding loop
sync_relay never returns an error. On Ok it calls command::complete_relay_sync (sets synced = 1, sync_error = ''); on Err it calls command::fail_relay_sync (sets synced = 0, sync_error = ...), which publishes a fail_relay_sync activity after commit. That activity re-enters handle_activity and re-schedules backoff. The retry chain terminates when any of the following happens:
- the sync succeeds (complete_relay_sync sets synced = 1 and breaks the consecutive-failure streak counted by take_while, so no further retry is scheduled);
- the relay no longer exists (get_relay returns None, a silent no-op);
- a get_relay query errors (logged and stopped);
- the consecutive-failure count exceeds MAX_ATTEMPTS(6), after which the relay sits with synced = 0 and a set sync_error until manual intervention or another activity touches it.
Note that set_relay_status_tx and update_relay always reset synced = 0 as a side effect, so a "pure" status flip is never sync-neutral. Source: infra.rs:57-60,136-146,151-166, command.rs:185-273,580-596.
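To make the termination conditions concrete, here is a toy simulation of the chain with the sleeps and the database removed. It assumes each failed sync adds one fail_relay_sync to the head of the history and that the chain stops once that count exceeds MAX_ATTEMPTS. The ChainEnd type and run_chain function are hypothetical, and the missing-relay and query-error exits are left out.

```rust
const MAX_ATTEMPTS: u32 = 6;

/// Hypothetical outcome of one retry chain.
#[derive(Debug, PartialEq)]
enum ChainEnd {
    /// A sync succeeded; complete_relay_sync breaks the failure streak.
    Synced { sync_calls: u32 },
    /// Retries exhausted; the relay stays at synced = 0 with an error set.
    Exhausted { sync_calls: u32 },
}

/// Drive the loop with a caller-supplied sync outcome per call (1-based).
fn run_chain(mut try_sync: impl FnMut(u32) -> bool) -> ChainEnd {
    let mut consecutive_failures = 0u32;
    let mut sync_calls = 0u32;
    loop {
        sync_calls += 1;
        if try_sync(sync_calls) {
            return ChainEnd::Synced { sync_calls };
        }
        // fail_relay_sync is published and re-enters handle_activity.
        consecutive_failures += 1;
        if consecutive_failures > MAX_ATTEMPTS {
            return ChainEnd::Exhausted { sync_calls };
        }
        // Otherwise a retry is scheduled and the loop runs again.
    }
}
```

Under this model, a zooid outage that outlasts the whole schedule ends in Exhausted, and nothing re-drives the relay until a new activity arrives or the next reconcile sweep runs.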
The zooid request
try_sync_relay assembles the request body inline as a serde_json::json!: host (subdomain + relay_domain), schema (relay.id), an inactive flag, and the info/policy/groups/management/push/roles blocks, plus a conditional blossom S3 block and a livekit block — each gated on the relay's *_enabled i64 flag and falling back to {enabled: false}.
is_new is true only when synced != 1 and there is no prior complete_relay_sync activity. is_new alone decides POST (with a freshly generated Keys::generate secret inserted into the body) vs PATCH (secret omitted). Because update_relay resets synced = 0, a re-sync after an update would look "new" by the synced flag alone — the second condition (no prior complete_relay_sync) is what makes it a PATCH, so the relay is not re-created and its secret is not clobbered. Caravel never persists the secret, so this check is load-bearing.
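The decision reduces to a two-input predicate. The sketch below assumes the synced flag is an i64 as described and that the activity lookup has already been collapsed to a boolean; the Method enum and the choose_method name are illustrative only.

```rust
/// Hypothetical stand-in for the HTTP method passed to request().
#[derive(Debug, PartialEq)]
enum Method {
    Post,
    Patch,
}

/// A relay is new only if it is unsynced AND has never completed a sync.
fn is_new(synced: i64, has_prior_complete_sync: bool) -> bool {
    synced != 1 && !has_prior_complete_sync
}

/// POST creates the relay (with a fresh secret); PATCH updates it in place.
fn choose_method(synced: i64, has_prior_complete_sync: bool) -> Method {
    if is_new(synced, has_prior_complete_sync) {
        Method::Post
    } else {
        Method::Patch
    }
}
```

The case that matters is choose_method(0, true): the relay was reset to synced = 0 by an update but has synced before, so it gets a PATCH and keeps its existing secret.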
All zooid calls go through request(method, path, body): a 5-second reqwest client, base from zooid_api_url (trailing slash trimmed), NIP-98 Authorization via env.make_auth, and a non-2xx response is turned into an anyhow::bail! carrying the status and body. Source: infra.rs:168-295.
Billing worker timing
POLL_INTERVAL is 1 hour, so dunning runs at hour granularity. The DM guards exist specifically so the hourly tick doesn't re-DM on every pass:
- GRACE_PERIOD_SECS = 7 days (dunning grace before churn)
- FRESH_INVOICE_DM_GRACE_SECS = 24h (hold the manual-payment DM until an open invoice is at least this old, because a fresh invoice is surfaced in-app first)
- MANUAL_PAYMENT_DM_INTERVAL_SECS = 12 days (minimum spacing between reminder DMs)
attempt_payment_using_dm checks both invoice.created_at and invoice.notified_at before sending. reconcile_subscription clones the tenant and mutates the local copy (billing anchor, churn, payment method), updating the DB via explicit command calls, so the synchronous reconcile route re-reads the tenant afterward to reflect the changes. Source: billing.rs:15-23,46-130,436-449.
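The two DM guards can be sketched as a single predicate over Unix timestamps. This is an assumption-laden illustration: the function name should_send_dm is hypothetical, notified_at is modeled as an Option<i64> (the source does not say how "never notified" is stored), and the constants simply restate the values listed above.

```rust
const FRESH_INVOICE_DM_GRACE_SECS: i64 = 24 * 60 * 60;
const MANUAL_PAYMENT_DM_INTERVAL_SECS: i64 = 12 * 24 * 60 * 60;

/// Decide whether the hourly tick may send a manual-payment DM.
/// All arguments are Unix timestamps in seconds.
fn should_send_dm(now: i64, created_at: i64, notified_at: Option<i64>) -> bool {
    // Guard 1: a fresh invoice is surfaced in-app first, so hold the DM.
    if now - created_at < FRESH_INVOICE_DM_GRACE_SECS {
        return false;
    }
    // Guard 2: enforce the minimum spacing between reminder DMs.
    match notified_at {
        None => true,
        Some(last) => now - last >= MANUAL_PAYMENT_DM_INTERVAL_SECS,
    }
}
```

Because both guards compare against stored timestamps rather than tick counts, running the worker more often than once an hour would not produce extra DMs under this model.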
Sources
- infra recv loop + reconcile: backend/src/infra.rs:28-92, backend/src/query.rs:81-83
- backoff: backend/src/infra.rs:15-17,94-148, backend/src/query.rs:242-249
- self-feeding loop: backend/src/infra.rs:57-60,136-146,151-166, backend/src/command.rs:185-273,580-596
- zooid request + POST-vs-PATCH: backend/src/infra.rs:168-295
- billing worker timing: backend/src/billing.rs:15-23,46-130,436-449