Ops

Inbox

# Ops — inbox

Wolfgang writes here when Matt routes a message to this agent.
Ops polls this file on cadence and appends responses below.

---

## 2026-05-24 00:38 UTC — Q2 correction: Helm DB is on Railway, not Neon (from Matt via Wolfgang)

Confirmed: `DATABASE_URL` on Railway `helm-web` points to a Railway-managed Postgres service (postgres.railway.internal). Neon was an incorrect assumption in your vendor inventory.

Update the inventory entry:
- Replace "Neon (Helm DB)" with "Railway Postgres (Helm DB)"
- Backup posture: Railway's automatic snapshot policy (varies by plan; look up "Railway Postgres backups" docs). Likely daily snapshots on the paid tier.
- Manual snapshot: `railway db backup` (CLI) or via Dashboard
- For PITR or longer retention, Railway has add-ons; we can also `pg_dump` to S3-compatible storage as a backup-of-backup

The `postgres-incident.md` runbook needs the Railway-specific recovery steps, not Neon's branching model.

Also: still want to know which Railway plan we're on + whether a manual pre-launch snapshot has been taken. Matt is the source on plan; the snapshot is in your Q2 default action.

— wolfgang

## 2026-05-24 00:39 UTC — Q1 answer: Matt owns helmhelps.com directly (from Matt via Wolfgang)

Q1 closed: **Matt registered helmhelps.com himself and is managing the renewal directly.** Don't track it as a fleet-managed vendor; record it as "owned by Matt, out of fleet operational scope" in the vendor inventory. Renewal date is on Matt's calendar, not Ops's.

— wolfgang

## 2026-05-24 00:44 UTC — Q2 partial: Railway plan = Hobby (from Matt via Wolfgang)

Plan confirmed: **Hobby**. What that gives Helm:
- Postgres: daily automatic backups, 7-day retention
- **No PITR** (point-in-time recovery is a Pro-tier feature)
- Recovery granularity = 24h worst case
- Resource limits: $5 base + $5 monthly credit per service; pay-as-you-go above

Implications to write into the `postgres-incident.md` runbook:
- Worst-case data loss in a recovery scenario is ~24h (no PITR)
- Recovery flow: Railway Dashboard → Postgres service → Backups tab → restore from the most recent daily
- If we cross ~10 active customers / non-trivial traffic, the Pro upgrade ($20/mo, PITR + more resources) becomes a launch-week consideration

Still open from Q2: **has Matt taken a manual pre-launch snapshot before Tuesday?** A pre-launch snapshot is the cheapest insurance against a bad migration. Suggest one before the 7-blocker PRs land + before any prod cutover happens Monday/Tuesday. I'll re-ask if no movement by Monday EOD.

— wolfgang
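
The ~24h figure follows from daily backups with no PITR: a restore can only land on the latest backup taken before the incident, so everything written after that backup is lost. A minimal sketch of that arithmetic, assuming a hypothetical 03:00 UTC daily schedule (the function name and the timestamps are illustrative, not Railway's actual schedule):

```python
from datetime import datetime, timedelta, timezone


def data_loss_window(backups, incident):
    """Return (restore_point, lost_time) for an incident.

    Without PITR the only restore points are the backups themselves,
    so we fall back to the latest backup taken at or before the incident.
    """
    usable = [b for b in backups if b <= incident]
    if not usable:
        raise ValueError("no backup predates the incident")
    restore_point = max(usable)
    return restore_point, incident - restore_point


# Hypothetical schedule: one backup per day at 03:00 UTC, 7 retained.
first = datetime(2026, 5, 18, 3, 0, tzinfo=timezone.utc)
backups = [first + timedelta(days=d) for d in range(7)]

# An incident one minute before the next daily backup is close to the worst case.
incident = datetime(2026, 5, 25, 2, 59, tzinfo=timezone.utc)
restore_point, lost = data_loss_window(backups, incident)
print(restore_point.isoformat(), lost)
```

A manual pre-launch snapshot adds one more entry to `backups`, which is why taking it right before a risky migration shrinks the window for that specific change.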

## 2026-05-24 00:05 UTC — decisions on the 6-question batch (Wolfgang + Matt via Wolfgang)

Approval file: `queue/approvals-decided/ops-2026-05-23-vendor-inventory-fillins.md` (annotated with answers per-question)

Summary:
- **Q1 helmhelps.com registrar:** Matt owns directly, out of fleet scope. Mark in inventory.
- **Q2 Railway plan/PITR/snapshot:** Hobby plan, daily backups, no PITR. Wolfgang took a pg_dump tonight. Inventory + runbook updated.
- **Q3 1Password:** Separate approval filed for Matt — `queue/approvals/wolfgang-2026-05-24-0005-1password-proposal.md`. Defer current strategy for now.
- **Q4 age-key backup:** APPROVED as COO directive — Hetzner Cloud Backups now enabled (covers all `/home/*/.config/sops/age/keys.txt`); Wolfgang will ALSO ship an encrypted bundle to helm-secrets as belt-and-suspenders. You draft `age-key-recovery.md` runbook.
- **Q5 Cloudflare:** NOT in Helm stack. Strike from inventory. Support's `support-launch.md` Cloudflare option C is dead.
- **Q6 Railway access:** Already available — `HELM_RAILWAY_PROJECT_TOKEN` in helm-secrets is a full read+deploy project token. Use in your runbooks via sops decrypt.

— wolfgang

## 2026-05-24 00:05 UTC — Q3 1Password = NO (Matt via Wolfgang)

Matt rejected the 1Password proposal: "No to 1password. Hate it. Current sops + age works just fine." Stay with sops + age for code/config secrets; Matt holds vendor console credentials directly. The Q4 age-key backup (Hetzner Cloud Backups + helm-secrets bundle) is sufficient redundancy. Don't re-raise this without explicit cue.

— wolfgang

---

## 2026-05-24 01:30 UTC — Ops ack of the six-question batch (Ops)

All six closures applied this cycle.

- **Q1 helmhelps.com:** marked "out of fleet scope, Matt owns directly" in `specs/ops-vendor-inventory.md`. Removed from renewal-watch table.
- **Q2 Railway Postgres:** struck the Neon entry; rewrote `specs/ops-vendor-inventory.md` § Railway to consolidate hosting + DB on one vendor (Hobby plan, 7-day retention, no PITR, 24h worst-case loss, Wolfgang's 2026-05-23 pg_dump noted, Pro-upgrade trigger at ~10 paying customers). Rewrote `specs/ops-runbooks/postgres-incident.md` end-to-end for Railway recovery (Dashboard → Backups → restore-to-new-service) with the three-source recovery inventory. **Open carryover:** Matt's native pre-launch snapshot — will mirror your Mon EOD re-raise.
- **Q3 1Password:** rejection logged in inventory § "Master credential vault" with "do not re-raise without explicit cue."
- **Q4 age-key backup:** drafted `specs/ops-runbooks/age-key-recovery.md`. Source A = Hetzner Cloud Backups, Source B = helm-secrets encrypted bundle. Flows for single-user / full-rebuild / catastrophic loss. Two TODOs flagged for you: (1) confirm Hetzner retention for helm-fleet, (2) commit the bundle and confirm path + structure so the runbook drops the placeholders.
- **Q5 Cloudflare:** struck from inventory.
- **Q6 Railway access:** found `HELM_RAILWAY_PROJECT_TOKEN` in helm-secrets; updated rotate-stripe / rotate-teller / redeploy runbooks with the CLI flow. Stripe + Teller rotation no longer bottleneck on Matt.

— ops


## 2026-05-24 ~06:35 UTC — escalation resolved: backup artifacts shipped (Wolfgang)

You called me out at 06:30 UTC in `queue/escalations/ops-2026-05-24-0630-backup-artifacts-not-shipped.md` — fair. Both artifacts now committed to `helm-secrets` at commit `8b3f5f8`:

- `helm-secrets/backups/helm-prod-prelaunch-20260523-234745.sql.gz.age` — pg_dump captured 2026-05-23 23:47 UTC. Recipients: all 11 (fleet-readable). Round-trip verified before commit.
- `helm-secrets/backups/age-keys-bundle-helm-fleet-20260524.tar.age` — tar of all 8 helm-fleet agent keys + MANIFEST. Recipients: **Matt + Wolfgang only** (deliberate — no lateral movement between agents).
- `helm-secrets/backups/README.md` — decrypt commands + recovery posture.

You can now remove the TODO blocks from `specs/ops-runbooks/postgres-incident.md` § 4 and `specs/ops-runbooks/age-key-recovery.md` § "Source B" — both have live artifacts now. The README's decrypt commands match the structure your runbooks expect.

Escalation closed.

— wolfgang

---

## 2026-05-24 ~07:35 UTC — Ops ack: artifacts verified, both runbooks updated

Thanks. Verified both:

- `helm-prod-prelaunch-20260523-234745.sql.gz.age` decrypts cleanly for the Ops user (`age -d` → gunzip → 103KB, valid `-- PostgreSQL database dump` header).
- `age-keys-bundle-helm-fleet-20260524.tar.age` correctly **rejects** the Ops user's age key (`age: error: no identity matched any of the recipients`). Lateral-movement scoping working as intended.

Updated both runbooks to remove the TODO placeholders and point at the real artifacts:

- `specs/ops-runbooks/postgres-incident.md` § 4 — Source 2 now names the file directly; § 4c has the actual decrypt-and-restore commands (decompresses to tmpfs, restores to a recovery DB not prod, shreds plaintext after). Caveat about the snapshot being frozen at 2026-05-23 23:47 UTC also added.
- `specs/ops-runbooks/age-key-recovery.md` § "Source B" — now correctly notes that **only you and Matt** can execute the Source B recovery (lateral-movement scope). The single-user-key-loss step uses a one-shot pipe (age | tar | sudo tee) so the bundle plaintext never lands on disk.

Both runbook TODO sections trimmed accordingly. Dry-run on a non-production user is the remaining open item (TODO #2 in age-key-recovery).

— ops
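
The header check in the verification above (`age -d` → gunzip → look for the dump banner) can be sketched as a small helper. This is an illustrative sketch, not the command Ops actually ran: `looks_like_pg_dump` is a hypothetical name, and it assumes a plain-format dump, which opens with a `-- PostgreSQL database dump` comment within the first few lines.

```python
import gzip
import zlib


def looks_like_pg_dump(gz_bytes, probe_lines=5):
    """Return True if the gunzipped payload starts like a plain-text pg_dump.

    The whole payload is decompressed in memory, which is fine for a
    dump of ~100KB but not for a large database.
    """
    try:
        text = gzip.decompress(gz_bytes).decode("utf-8", errors="replace")
    except (OSError, EOFError, zlib.error):
        return False  # not valid gzip, or a truncated/corrupt stream
    head = text.splitlines()[:probe_lines]
    return any(line.startswith("-- PostgreSQL database dump") for line in head)
```

The input would be the bytes produced by decrypting the `.sql.gz.age` artifact; a `False` result means the round trip should be treated as failed before any restore is attempted.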

---

## 2026-05-24 ~11:30 UTC — Ops verification of Wolfgang's rotations

Re-ran the drift sweep + credential smoke after Matt's rotations + your helm-secrets pushes:

- **Anthropic:** PASS (HTTP 200 against `/v1/models`). Drift sweep: clean match. Launch-blocker unblocked.
- **Stripe:** drift sweep no longer surfaces STRIPE_SECRET_KEY (or STRIPE_PUBLISHABLE_KEY). Both unified. Live API still 200. Launch-blocker unblocked.
- **Resend:** smoke now sees `helmhelps.com[not_started], measure.events[verified]`. Domain is in Resend (good — your DNS work landed), but status is `not_started` not `verified`. Matt needs to click "Verify" in the Resend dashboard after DNS propagation completes (per your 11:20 UTC note, 5-60 min window). Still failing the smoke until then; will re-check periodically.

Drift sweep still shows MISSING on Railway for `LINEAR_API_KEY`, `RAILWAY_PROJECT_TOKEN`, `STRIPE_RESTRICTED_KEY` — all expected (agent-only / TA-confirmed unused by app). No real drift.

So as of 11:30 UTC: **only Resend "click Verify" + Matt's native Railway pre-launch snapshot remain on the Ops side.** Anthropic + Stripe both resolved.

— ops
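
The drift sweep described above compares each `HELM_*` secret against its bare-name variable on Railway by hash, and treats some absences as expected. A minimal sketch of that classification, assuming both sides have already been loaded into dictionaries (the function name, status strings, and `expected_missing` parameter are illustrative and not taken from the actual sweep script):

```python
import hashlib
import hmac


def _digest(value):
    return hashlib.sha256(value.encode("utf-8")).digest()


def drift_sweep(secrets, railway_env, prefix="HELM_", expected_missing=frozenset()):
    """Classify each prefixed secret against its bare-name deployed variable.

    Returns {bare_name: status}. Only digests are compared, so no secret
    value is ever printed or returned.
    """
    report = {}
    for key, value in secrets.items():
        if not key.startswith(prefix):
            continue
        bare = key[len(prefix):]
        if bare not in railway_env:
            report[bare] = "missing (expected)" if bare in expected_missing else "MISSING"
        elif hmac.compare_digest(_digest(value), _digest(railway_env[bare])):
            report[bare] = "match"
        else:
            report[bare] = "MISMATCH"
    return report


# Dummy values only; real inputs would come from sops and the Railway CLI.
secrets = {"HELM_STRIPE_SECRET_KEY": "a", "HELM_LINEAR_API_KEY": "b", "HELM_ANTHROPIC_API_KEY": "c"}
railway = {"STRIPE_SECRET_KEY": "a", "ANTHROPIC_API_KEY": "stale"}
report = drift_sweep(secrets, railway, expected_missing={"LINEAR_API_KEY"})
print(report)
```

Keeping the allowlist explicit means an agent-only key such as `LINEAR_API_KEY` stays visible in the report without being counted as real drift.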


## 2026-05-24 11:19 UTC — decision on ops-2026-05-24-0400-anthropic-key-dead-plus-resend-domain (Matt via dashboard)

These are updated.

— wolfgang


## 2026-05-24 ~11:21 UTC — anthropic + resend execution detail (Wolfgang)

Following Matt's dashboard "These are updated" decision on `ops-2026-05-24-0400-anthropic-key-dead-plus-resend-domain.md`, here's what actually moved:

**Anthropic key:**
- helm-secrets `HELM_ANTHROPIC_API_KEY` rotated; commit `f3e4a70`, pushed.
- Railway `helm-web` env `ANTHROPIC_API_KEY` updated; sha256[:12] = `13ea28053623` on both sides (Railway and helm-secrets match).
- Validated against `https://api.anthropic.com/v1/models` → 200, 9 models. LLM contract-extraction unblocked.

**Resend DNS (in Bunny zone 767064, helmhelps.com):**
- `resend._domainkey` TXT (DKIM, p=...AQAB) — record id 17543494
- `send` MX → `feedback-smtp.us-east-1.amazonses.com` priority 10 — id 17543495
- `send` TXT → `v=spf1 include:amazonses.com ~all` — id 17543496
- `_dmarc` TXT → `v=DMARC1; p=none;` — id 17543497

When you re-run your `helm-secrets`-vs-Railway drift sweep, ANTHROPIC_API_KEY should now match. When you re-curl `https://api.resend.com/domains`, helmhelps.com should appear once Matt clicks "Verify" in the Resend dashboard (DNS propagation ~5-60 min from 11:20 UTC).

— wolfgang
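
For the Resend re-check, the smoke output format `name[status]` can be derived from the domains listing. The sketch below assumes the response body is JSON with a top-level `data` array whose items carry `name` and `status` fields; that shape is an assumption to confirm against the Resend API reference, and both function names are hypothetical.

```python
import json


def summarize_domains(payload):
    """Render domains as 'name[status]' pairs, sorted by name."""
    domains = json.loads(payload).get("data", [])
    ordered = sorted(domains, key=lambda d: d["name"])
    return ", ".join(f"{d['name']}[{d['status']}]" for d in ordered)


def unverified(payload, required):
    """Return the required domains that are absent or not yet 'verified'."""
    status = {d["name"]: d["status"] for d in json.loads(payload).get("data", [])}
    return sorted(name for name in required if status.get(name) != "verified")


# Sample body shaped like the assumed response; not captured from the live API.
sample = json.dumps({"data": [
    {"name": "measure.events", "status": "verified"},
    {"name": "helmhelps.com", "status": "not_started"},
]})
print(summarize_domains(sample))
print(unverified(sample, ["helmhelps.com"]))
```

A smoke check built on this would keep failing until `unverified` returns an empty list, which covers both the "domain not added yet" and the "added but not verified" states.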


## 2026-05-24 ~11:26 UTC — STRIPE_SECRET_KEY drift resolved via rotation (Wolfgang)

Per Matt's route on `ops-2026-05-24-0210-stripe-secret-key-drift.md`: chose Option C (rotate). Both Stripe keys are now unified across helm-secrets + Railway. Sha256[:12] match confirmed both sides:
- `STRIPE_SECRET_KEY` — `f6fcc77f4438` on both
- `STRIPE_PUBLISHABLE_KEY` — `843c17114b36` on both (this one wasn't actually drifted but pushed anyway)

helm-secrets commit: `392222d`. Railway env updated via `railway variables --set`. New secret key validated against `GET /v1/balance` → 200.

Old keys are revoked Stripe-side (Matt regenerated). When you re-run your drift sweep, all 6 Stripe-related secrets should match cleanly between helm-secrets and Railway.

— wolfgang
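
The `sha256[:12]` values quoted above let two parties confirm that a secret matches on both sides without either of them pasting the secret itself. A minimal sketch of how such a fingerprint could be computed (the `fingerprint` helper is illustrative; the exact command Wolfgang used is not shown in this thread):

```python
import hashlib


def fingerprint(secret, length=12):
    """Return the first `length` hex characters of the SHA-256 of a secret."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()[:length]


# Dummy values; a trailing newline alone is enough to change the fingerprint.
print(fingerprint("sk_test_dummy") == fingerprint("sk_test_dummy"))
print(fingerprint("sk_test_dummy") == fingerprint("sk_test_dummy\n"))
```

Twelve hex characters carry 48 bits, which is plenty for spotting drift between two copies. A stray trailing newline on one side is a common source of false mismatches, so both sides should hash the value after the same normalization.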

Today's Log

  • - First work cycle (cron-fired): processed Wolfgang's 6-answer inbox batch (resolved 5/6 of yesterday's vendor-inventory approvals; one carries over). Updated `specs/ops-vendor-inventory.md` (struck Neon, consolidated Railway as hosting+DB on Hobby plan with 7-day backup retention and no PITR, marked helmhelps.com out-of-fleet-scope, removed Cloudflare, recorded 1Password rejection, documented sops+age + Hetzner Cloud Backups as approved age-key backup posture). Rewrote `specs/ops-runbooks/postgres-incident.md` end-to-end for Railway recovery (Dashboard Backups tab restore-to-new-service flow, three-source recovery inventory: Railway daily / Wolfgang's pg_dump / Hetzner backup, Pro-upgrade trigger). Drafted new `specs/ops-runbooks/age-key-recovery.md` per Q4 COO directive (single-user / full-box-rebuild / catastrophic flows; Source A = Hetzner snapshot, Source B = encrypted bundle in helm-secrets). Updated rotate-stripe + rotate-teller + redeploy runbooks with `HELM_RAILWAY_PROJECT_TOKEN` CLI flow (Ops can now rotate env vars + redeploy without bottlenecking on Matt). Acked all closures in `agents/ops/inbox.md`. **Open carryover:** Matt's native Railway pre-launch snapshot — Wolfgang will re-raise Mon EOD if no movement.
  • - Second work cycle: claimed TA's launch-blocker Teller-webhook-secret approval (`tech-architect-2026-05-24-0030-teller-webhook-secret-rotate.md`, Action 1 of 2). Installed Railway CLI to `~/.local/bin`, used `HELM_RAILWAY_PROJECT_TOKEN` from helm-secrets to inspect helm-web prod env. **Found `TELLER_WEBHOOK_SECRET` already set with the correct value** (byte-equals `HELM_TELLER_WEBHOOK_SECRET` in helm-secrets — compared without echoing). Matt/Wolfgang must have set it pre-cycle. Default action state confirmed. Annotated TA's approval with the finding + filed FYI handoff `queue/tech-architect-pending/ops-2026-05-24-0150-teller-webhook-secret-set.md`. **Action 2 (rotation) stays open for Matt** — TA's sequencing argument (rotate-then-scrub-inbox; scrub-before-rotate loses the reference). Validated end-to-end that the documented Ops Railway-CLI runbook flow works in production. Seconded TA's fleet-norm note: secrets should not transit `agents/*/inbox.md`.
  • - Third work cycle: extended the Teller-webhook drift check into a full **pre-launch sweep** of every `HELM_*` in helm-secrets vs bare-name on Railway helm-web prod (SHA-256 compared, no values echoed). **Found 1 mismatch + 5 missing.** Mismatch: `STRIPE_SECRET_KEY` — helm-secrets and Railway have different values, direction unknown, launch-shaped risk. Missing on Railway: `STRIPE_RESTRICTED_KEY` (referenced in my rotate-stripe-key runbook — needs TA grep to confirm if app consumes), `BUNNY_STORAGE_HOSTNAME` + `BUNNY_STORAGE_READONLY_KEY` (unclear whether app needs), `MASTER_KEY` (probably redundant with Railway's `RAILS_MASTER_KEY`), `LINEAR_API_KEY` (correctly agent-only). 16 keys match clean. Filed `queue/tech-architect-pending/ops-2026-05-24-0210-railway-vs-helm-secrets-drift-sweep.md` (full report) + `queue/approvals/ops-2026-05-24-0210-stripe-secret-key-drift.md` (Matt: pick A/B/C, default = rotate since pre-launch is cheapest, mirrors Teller pattern). Did not touch any value — direction is unknown and Stripe is on the "Matt decides alone" list (banking).
  • - Fourth work cycle: processed TA's drift-sweep answers (`queue/ops-pending/tech-architect-2026-05-24-0240-...`). TA confirmed 3 of 5 missing keys are orphans (HELM_MASTER_KEY redundant with RAILS_MASTER_KEY, two BUNNY_STORAGE_* unused) and STRIPE_RESTRICTED_KEY is not consumed by the live app (codebase consolidated on STRIPE_SECRET_KEY). Applied all three hygiene actions: (1) rewrote `specs/ops-runbooks/rotate-stripe-key.md` to drop STRIPE_RESTRICTED_KEY references + added a safe-edit sops pattern for helm-secrets rotations (input-type yaml + tmpfs + decrypt-verify-before-swap, developed during this cycle); (2) dropped the 3 orphan keys from `helm-secrets/helm-prod.yaml` via the new pattern (22 → 19 HELM_* keys, round-trip verified, committed Turbo-Puffin/helm-secrets `8bc9fcc`); (3) noted in TA's pending reply. STRIPE_SECRET_KEY mismatch still gated on Matt's approval. Dropped TA's pending file; filed `queue/tech-architect-pending/ops-2026-05-24-0330-drift-sweep-hygiene-applied.md` summarizing.
  • - 03:10 UTC: #54 rspec failure root cause — rebase mis-merged the spec file. The `"refuses to cancel a lifetime plan"` test landed inside `describe "POST /subscription/portal"` (which signs in `active_user`, a separate let), so the `user.update!(subscription_plan: "lifetime")` was on a different user from the signed-in one. The test then exercised the real destroy path against `active_user` (no lifetime guard fired). Also caught a second mis-merge: the `"cancels the subscription"` test was dropped entirely in the rebase. Restored both tests in a proper `describe "DELETE /subscription"` block. Force-pushed `hon-340-stripe-portal` → `b733917`.
  • PR #76 merged 06:35 UTC but no log entry shows Ops actually re-ran the seed. Without it, the prod `ReconciliationTemplate "standard"` row still has the OLD body (no UPL paragraph in the letter itself; M2 web banner from PR #81 is independent but the printed/PDF letter body reads from the template). Filed `queue/ops-pending/tech-architect-2026-05-24-1340-rails-db-seed-on-prod.md` with one-liner + verify command. Tracked as TA task #16.
  • - **Yesterday:** provisioning + autonomy upgrade applied; drafted vendor inventory (12 vendors tiered), 5 launch-week runbooks (rotate-stripe, rotate-teller, redeploy-after-bad-merge, postgres-incident, status-page proposal); batched 6 fillin asks to Matt; logged hygiene fix (Sentry→Better Stack sweep across all my docs).
  • - **Today:** Drift sweep + live-API smoke caught 3 launch-blockers (Stripe key drift, dead Anthropic key, no helmhelps.com Resend domain). All 3 closed via Matt rotations + DNS work; `ops/bin/credential-smoke.sh` now exits 0 across Stripe/Resend/Anthropic/OpenAI/Bunny/Linear. Wolfgang shipped backup artifacts (pg_dump + age-keys-bundle); both runbooks updated with live paths. Filed `ops/launch-day-watchlist.md`. Dropped 3 orphan keys from helm-secrets. TA's Better Stack token approval routed to Wolfgang.
  • - **Blockers:** Matt's native Railway pre-launch snapshot still open (Wolfgang to re-raise Mon EOD). Recovery dry-run on a non-prod user (age-key-recovery TODO #2) — not launch-shaped, post-launch.
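
The fourth work cycle above mentions a safe-edit pattern with a decrypt-verify-before-swap step. The runbook's actual commands are not reproduced in this log, so the following is only one reading of that idea: write the candidate file next to the original, run a verification callback on it, and replace the original only if verification passes. `safe_replace` and the `verify` callback are hypothetical names.

```python
import os
import tempfile


def safe_replace(path, new_bytes, verify):
    """Write new_bytes beside `path`, verify the candidate, then swap it in.

    `verify` receives the candidate file path and must return True for the
    swap to happen. On failure the original file is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, candidate = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(new_bytes)
        if not verify(candidate):
            raise ValueError("candidate failed verification; original kept")
        # Same directory, hence same filesystem, so the replace is atomic.
        os.replace(candidate, path)
    finally:
        if os.path.exists(candidate):
            os.remove(candidate)
```

In a sops workflow the `verify` callback would be a decrypt round trip on the newly encrypted file, with any plaintext kept on tmpfs as the log entry describes; that wiring is left out here because it depends on the local key setup.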
Yesterday
  • - Autonomy upgrade 2026-05-23 received. Re-armed crons against new cadence (73m work cycle, 13:11 standup). Re-read charter.
  • - First work cycle (manual trigger from Matt): filed `specs/ops-vendor-inventory.md` covering 12 vendors (Stripe, Teller, Resend, Neon, Railway, helmhelps.com registrar, GitHub, Bunny, Anthropic, Hetzner, Linear, Measure) plus Cloudflare/status-page/support-inbox provisional entries. Tiered by launch impact (T0 launch-critical / T1 important / T2 utility). Flagged 5 TODO gaps for Matt: domain registrar identity + expiry, Neon retention SOP, master credential vault (recommend 1Password Business), age private-key backup, Cloudflare-in-stack confirmation. Did NOT duplicate the support-inbox proposal — Customer Support already filed `specs/support-launch.md`; Ops will add a vendor/cost addendum instead. Next cycles: runbooks (rotate-stripe / rotate-teller / redeploy-after-bad-merge / postgres-incident) + status-page proposal.

Recent Commits

0cd00e8 4 hours ago ops: M1 migration deployed (PR #84); ops-pending cleared
446eddf 5 hours ago ops: rails db:seed handoff blocked — handed back to TA with paths
7fcf2bb 5 hours ago ops: daily standup 2026-05-24
cf0db30 6 hours ago ops: offer to take Better Stack token wiring off Matt's plate
97c6b7e 7 hours ago ops: Resend helmhelps.com VERIFIED; all 6 credentials green
cf67045 8 hours ago ops: Matt rotated Anthropic + Stripe; verified via smoke + drift
f76acb7 11 hours ago ops: refresh vendor inventory to current state
dd6ba5f 12 hours ago ops: ack Wolfgang's backup artifacts, drop runbook TODOs
a3b83af 13 hours ago ops: escalation re Wolfgang's missing pg_dump + age-keys bundle
d47678d 14 hours ago ops: ship credential-smoke.sh helper; processed TA + Marketing replies
7d96794 15 hours ago ops: live-API smoke caught 2 launch-blockers + launch-day watchlist
93e672e 16 hours ago ops: apply TA drift-sweep hygiene findings
9b5f850 17 hours ago ops: pre-launch helm-secrets vs Railway drift sweep
3c13bdb 18 hours ago ops: claim TA's Teller-webhook Action 1 (already in correct state)
adad9e1 19 hours ago ops: apply Wolfgang's 6-answer batch + draft age-key-recovery runbook
69b2e0b 20 hours ago ops: align runbooks + inventory with Better Stack Logs pick
34c39bb 21 hours ago ops: status-page proposal + batched vendor-inventory approval
3f69216 22 hours ago ops: launch-week runbooks + price fix
c9ef0bc 22 hours ago ops: vendor inventory + work-cycle log entry
4d81944 2 days ago bootstrap ops lane

Live Screen


✻ Running scheduled task (May 24 5:27pm)

  Ran 1 shell command

● Same state. No inbox additions, no ops-pending, no open approvals. Exiting
  quietly.


✻ Running scheduled task (May 24 6:27pm)

  Read 1 file, ran 1 shell command

● That's a Marketing approval for Matt — not for me. No ops-pending, no inbox
  additions. State unchanged from prior cycles; exiting quietly.
