honeyDueAPI/deploy/DEPLOYING.md

# Deploying Right Now

Practical walkthrough for a prod deploy against the current Swarm stack.
Assumes infrastructure and cloud services already exist — if not, work
through [`shit_deploy_cant_do.md`](./shit_deploy_cant_do.md) first.

See [`README.md`](./README.md) for the reference docs that back each step.

---

## 0. Pre-flight — check local state

```bash
cd honeyDueAPI-go

git status                # clean working tree?
git log -1 --oneline      # deploying this SHA

ls deploy/cluster.env deploy/registry.env deploy/prod.env
ls deploy/secrets/*.txt deploy/secrets/*.p8
```

## 1. Reconcile your envs with current defaults

These two values **must** be right — the script does not enforce them:

```bash
# deploy/cluster.env
WORKER_REPLICAS=1          # >1 → duplicate cron jobs (Asynq scheduler is a singleton)
PUSH_LATEST_TAG=false      # keeps prod images SHA-pinned
SECRET_KEEP_VERSIONS=3     # optional; 3 is the default
```

Decide storage backend in `deploy/prod.env`:

- **Multi-replica safe (recommended):** set all four of `B2_ENDPOINT`,
  `B2_KEY_ID`, `B2_APP_KEY`, `B2_BUCKET_NAME`. Uploads go to B2.
- **Single-node ok:** leave all four empty. Script will warn. In this
  mode you must also set `API_REPLICAS=1` — otherwise uploads are
  invisible from 2/3 of requests.

## 2. Dry run

```bash
DRY_RUN=1 ./.deploy_prod
```

Confirm in the output:
- `Storage backend: S3 (...)` OR the `LOCAL VOLUME` warning matches intent
- `Replicas: api=3, worker=1, admin=1` (or `api=1` if local storage)
- Image SHA matches `git rev-parse --short HEAD`
- `Manager:` host is correct
- `Secret retention: 3 versions`

Fix envs and re-run until the plan looks right. Nothing touches the cluster yet.

## 3. Real deploy

```bash
./.deploy_prod
```

Do **not** pass `SKIP_BUILD=1` after code changes — the worker's health
server and `MigrateWithLock` both require a fresh build.

End-to-end: ~3–8 minutes. The script prints each phase.

## 4. Post-deploy verification

```bash
# Stack health (replicas X/X = desired)
ssh <manager> docker stack services honeydue

# API smoke
curl -fsS https://api.<domain>/api/health/ && echo OK

# Logs via Dozzle (loopback-bound, needs SSH tunnel)
ssh -p <port> -L 9999:127.0.0.1:9999 <user>@<manager>
# Then browse http://localhost:9999
```

What the logs should show on a healthy boot:
- `api`: exactly one replica logs `Migration advisory lock acquired`,
  the others log `Migration advisory lock acquired` after waiting, then
  `released`.
- `worker`: `Health server listening addr=:6060`, `Starting worker server...`,
  four `Registered ... job` lines.
- No `Failed to connect to Redis` / `Failed to connect to database`.

## 5. If it goes wrong

Auto-rollback triggers when `DEPLOY_HEALTHCHECK_URL` fails — every service
is rolled back to its previous spec, script exits non-zero.

Triage:

```bash
ssh <manager> docker service logs --tail 200 honeydue_api
ssh <manager> docker service ps honeydue_api --no-trunc
```

Manual rollback (if auto didn't catch it):

```bash
ssh <manager> bash -c '
  for svc in $(docker stack services honeydue --format "{{.Name}}"); do
    docker service rollback "$svc"
  done'
```

Redeploy a known-good SHA:

```bash
DEPLOY_TAG=<older-sha> SKIP_BUILD=1 ./.deploy_prod
# Only valid if that image was previously pushed to the registry.
```

## 6. Pre-deploy honesty check

Before pulling the trigger:

- [ ] Tested Neon PITR restore (not just "backups exist")?
- [ ] `WORKER_REPLICAS=1` — otherwise duplicate push notifications next cron tick
- [ ] Cloudflare-only firewall rule on 80/443 — otherwise origin IP is on the public internet
- [ ] If storage is LOCAL, `API_REPLICAS=1` too
- [ ] Last deploy's secrets still valid (rotation hasn't expired any creds)