Web Docker Deployment

This is the operational guide for the Docker-based apps/web runtime and the parallel apps/tanstack-web migration runtime.

Files That Define The Stack

apps/web/Dockerfile
apps/web/docker/blue-green-watcher.Dockerfile
apps/web/docker/cron-runner.Dockerfile
apps/web/cron.config.json
apps/tanstack-web/Dockerfile
apps/backend/Dockerfile
apps/meet-realtime/Dockerfile
apps/supermemory/Dockerfile
docker-compose.web.yml
docker-compose.web.prod.yml (Compose include entry that merges docker-compose/compose.web.prod.*.yml fragments plus shared secrets / volumes)
scripts/sync-web-crons.js
scripts/watch-web-crons.js
scripts/docker-web.js
scripts/check-docker-web.js

Supported Commands

Command	Purpose
`bun dev:web:docker`	Run the web dev workflow inside Docker
`bun devx:web:docker`	Explicitly start local Supabase, then the Docker dev workflow
`bun devrs:web:docker`	Explicitly start and reset local Supabase, then the Docker dev workflow
`bun dev:web:docker:down`	Stop the Docker dev workflow
`bun serve:web:docker`	Build and run the production web image in-place
`bun serve:web:docker:bg`	Blue/green production deploy with health-checked cutover
`bun serve:web:docker:bg:watch`	Recreate the watcher container, then tail its live logs while it polls the tracked branch and auto-runs blue/green after a successful fast-forward pull
`bun serve:web:docker:down`	Stop the production Docker stack
`bun serve:web:docker:bg:down`	Stop the blue/green stack and clear local runtime state
`bun test:e2e` / `bun test:e2e:web:docker`	Start local Supabase, reset it, run the production blue/green Docker web stack, then run Playwright
`bun benchmark:web-setups`	Compare reachable Next.js, TanStack Start, and Rust backend routes and write reports under `tmp/benchmarks/web-migration/`
`bun migration:tanstack:gates`	Validate route parity, Docker E2E evidence, and benchmark evidence before TanStack/Rust cutover
`bun check:docker`	Validate Dockerfile and compose parity rules
`bun check:cloudflare`	Validate TanStack Start and Rust backend Wrangler deploy configs without contacting Cloudflare

Flags And Implicit Mappings

Flag	Meaning
`--without-redis`	Disable the bundled Redis profile and skip Docker-injected Redis env
`--with-cloudflared`	Enable the bundled Cloudflare Tunnel container profile
`--with-supabase`	Start local Supabase before the Docker web flow
`--reset-supabase`	Start and reset local Supabase before the Docker web flow
`--env-file tmp/e2e/web.env`	Use an explicit Docker web env file for build secrets and runtime env files
`--mode prod`	Use the production compose file instead of the dev stack
`--strategy blue-green`	Use blue/green production deployment instead of in-place replacement
`--profile redis`	Explicitly enable the Redis profile when calling the helper directly
`--profile cloudflared`	Explicitly enable the Cloudflare Tunnel profile when calling the helper directly
`--build-memory 4g`	Run builds through a capped Buildx builder with a memory ceiling
`--build-cpus 4`	Run builds through a capped Buildx builder with an approximate CPU limit
`--build-max-parallelism 2`	Limit concurrent BuildKit solve steps for lower build pressure
`--build-builder-name tuturuuu`	Override the throttled Buildx builder name
`--resume-if-running`	If another watcher PID already holds the lock, mirror its live dashboard instead of failing
`--replace-existing`	If another watcher PID already holds the lock, stop it and take over
`--if-locked <fail|resume|replace>`	Explicit lock-conflict policy for the watcher
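
The three watcher lock flags overlap, so it can help to see them collapsed into a single policy value. The sketch below is illustrative only: `resolve_lock_policy` is a hypothetical function, not code from scripts/docker-web.js. It assumes that the default policy is `fail` and that the last flag on the command line wins; neither assumption is confirmed by the helper.

```shell
# Hypothetical sketch: collapse the watcher lock flags into one policy value.
# Assumptions (not confirmed by the real helper): default is "fail" and the
# last flag given wins.
resolve_lock_policy() (
  policy=fail
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --resume-if-running) policy=resume ;;
      --replace-existing) policy=replace ;;
      --if-locked)
        shift            # consume the flag; its value is now $1, if present
        policy=${1:-}
        ;;
    esac
    if [ "$#" -gt 0 ]; then shift; fi
  done
  case "$policy" in
    fail | resume | replace) printf '%s\n' "$policy" ;;
    *)
      printf 'invalid --if-locked value: %s\n' "$policy" >&2
      exit 1
      ;;
  esac
)

resolve_lock_policy --if-locked resume   # prints: resume
```

Under these assumptions, `--resume-if-running` and `--if-locked resume` resolve to the same policy, and an unknown value is rejected before the watcher would touch the lock.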

Command	Implicit flags
`bun dev:web:docker`	none
`bun devx:web:docker`	`--with-supabase`
`bun devrs:web:docker`	`--reset-supabase`
`bun serve:web:docker`	`--mode prod`
`bun serve:web:docker:bg`	`--mode prod --strategy blue-green`
`bun dev:web:docker -- --without-redis`	`--without-redis`

Runtime Requirements

.env.local should be the primary Docker env file. The helper still falls back to apps/web/.env.local for older hosts that have not moved their env yet.
When --env-file is provided, the Docker helper uses that file for the Dockerfile secret and for the Compose runtime env_file entries. This keeps special-purpose runs such as E2E from accidentally inheriting a developer’s cloud Supabase .env.local.
Production Compose fragments live under docker-compose/, so their relative host paths must be written from that directory. Use .. to reach the repo root for build contexts, env files, and bind mounts; otherwise Docker Compose resolves paths like apps/... as docker-compose/apps/... and the watcher image fails before the deployment loop starts.
Docker BuildKit must be available. The helper sets COMPOSE_DOCKER_CLI_BUILD=1, DOCKER_BUILDKIT=1, and BUILDX_NO_DEFAULT_ATTESTATIONS=1 so local blue/green image exports do not stall while resolving default provenance metadata.
Web, Hive, and TanStack Docker builder stages mount /workspace/.turbo so local Turbo hits survive across Docker builds. They also read optional BuildKit secrets named turbo_token, turbo_team, turbo_api, and turbo_remote_cache_signature_key only for build RUN steps. When those secrets are absent, builds fall back to the mounted local cache.
GitHub-hosted Docker verification and E2E use service-specific type=gha BuildKit scopes so the same backend or frontend layers can be reused across workflows. Every shard may restore. Only shard 1 on the default branch may export a true miss, with a bounded timeout and ignore-error=true, preventing parallel shards from competing to save the same scope. Use mode=max for the expensive web, TanStack, and backend scopes; use mode=min or restore-only caching for small leaf services.
Manual docker buildx and Compose invocations do not receive the GitHub cache runtime variables automatically. Every CI job that uses type=gha outside docker/build-push-action must first run the pinned crazy-max/ghaction-github-runtime step so ACTIONS_RUNTIME_TOKEN and ACTIONS_RESULTS_URL are available to BuildKit. Keep the action pinned to a full commit SHA and covered by the CI cache-policy tests.
Forward GitHub remote-cache identity to Docker only with BuildKit --secret mounts. Never pass Turbo tokens as build arguments, Compose environment baked into an image, labels, or files copied into a layer. Pull requests and Dependabot builds receive no remote token and continue with local BuildKit and Turbo fallback behavior.
The production and TanStack-dual Compose files accept restore entries through DOCKER_WEB_CACHE_<SERVICE>_FROM and optional exporter entries through the matching _TO variable. Supported service tokens are WEB, TANSTACK, BACKEND, HIVE, SUPERMEMORY, MARKITDOWN, STORAGE_UNZIP, CHAT_REALTIME, HIVE_REALTIME, and MEET_REALTIME. CI configures shared type=gha scopes; local operators normally leave these variables unset.
DOCKER_WEB_TURBO_TEAM_SECRET_FILE and DOCKER_WEB_TURBO_TOKEN_SECRET_FILE point Compose at short-lived files used as BuildKit secrets. With no configured credentials they resolve to the committed zero-byte docker-compose/empty-secret, so local and cross-platform Compose parsing remains inert. Never put a token in that placeholder.
Rust backend cache cleanup is local maintenance, not a separate cache service. bun rust-cache report prints current apps/backend/target usage, bun rust-cache prune --apply removes stale or oversized target entries, and bun rust-cache auto runs at most once every 24 hours by default using tmp/rust-cache/state.json. The default policy is local-only, skips CI, keeps target entries newer than 14 days when possible, and starts pruning when the repo-owned target cache exceeds 20 GiB.
The dependency stages in apps/web/Dockerfile, apps/hive/Dockerfile, apps/hive-realtime/Dockerfile, apps/meet-realtime/Dockerfile, apps/chat-realtime/Dockerfile, and apps/supermemory/Dockerfile must copy every apps/*/package.json and packages/*/package.json manifest before running any frozen Bun install. Filtered Bun installs in these production images are retry-wrapped and clear the Bun install cache before retry; keep those snippets in sync with scripts/check-docker-web.js. Adding a new workspace app or package without updating those lists makes Docker-only installs try to rewrite bun.lock. bun check:docker validates this manifest parity. apps/backend is an independent Rust crate, and apps/meet-realtime is intentionally not a workspace package; their Dockerfiles do not require bun.lock changes when service source changes.
The Docker web flow does not start local Supabase unless you explicitly choose bun devx:web:docker or bun devrs:web:docker.
Production Docker serving commands (bun serve:web:docker, bun serve:web:docker:bg, and bun serve:web:docker:bg:watch) prefer root .env.local when it exists, even if the shell inherited stale DOCKER_WEB_ENV_FILE or DOCKER_WEB_COMPOSE_*ENV_FILE values. Passing --env-file <path> is the explicit way to use another deployment env file.
ttr box setup intentionally writes local Supabase values into app-local env files such as apps/web/.env.local. Those files are valid for devbox/local work but must not be the effective production watcher env. Keep root .env.local or the explicit deployment env file pointed at the cloud Supabase project.
Production Docker serving refuses local Supabase origins (localhost, 127.0.0.1, ::1, host.docker.internal, and local Supabase ports) unless DOCKER_WEB_ALLOW_LOCAL_SUPABASE=1 is set for a local production-image rehearsal.
log-drain-postgres stores deployment telemetry only. Production startup explicitly starts it from scripts/docker-web.js and retries once after removing only the service container, but web and watcher services do not declare a Compose depends_on relationship to it. An unhealthy log-drain database no longer blocks web promotion by default: the helper continues with PLATFORM_LOG_DRAIN_ENABLED=false, prints recent service state/log diagnostics, and leaves the platform-log-drain-postgres volume intact. Set DOCKER_WEB_LOG_DRAIN_REQUIRED=1 only when telemetry storage must be a hard promotion gate. If diagnostics mention incompatible database files or data directory corruption, back up or migrate the Compose volume first; do not run docker compose down --volumes or remove the volume without explicit operator approval.
Non-production Docker helpers still rewrite an explicitly local server-side Supabase URL to host.docker.internal while leaving NEXT_PUBLIC_SUPABASE_URL alone for browsers.
Dockerized web services set __NEXT_PRIVATE_ORIGIN from DOCKER_WEB_NEXT_PRIVATE_ORIGIN, defaulting to http://127.0.0.1:7803. This keeps Next.js Server Action forwarding on the in-container web listener even when nginx preserves an external Host. If logs show failed to forward action response or UND_ERR_HEADERS_TIMEOUT, verify the running web container has __NEXT_PRIVATE_ORIGIN=http://127.0.0.1:7803 or an intentional internal override. Do not use serverActions.allowedOrigins as the primary fix for this symptom; that setting controls Server Action origin/host validation, not the forwarded-action fetch URL.
If logs still show Error checking if workspace is personal with [locale], verify the running image includes commit b30d7e2b07 or newer plus the shared @tuturuuu/utils UUID guard.
Dockerized web commands auto-enable the local Redis companion stack and inject UPSTASH_REDIS_REST_URL plus a generated UPSTASH_REDIS_REST_TOKEN into the web container.
Dockerized production commands generate BACKEND_INTERNAL_TOKEN when one is not provided and inject BACKEND_INTERNAL_URL=http://backend:7820 for the Rust backend service. Dev Compose uses the same internal URL with a local fallback token. The same Rust HTTP core is prepared for later Cloudflare Workers deployment through apps/backend/wrangler.jsonc; validate the Worker deploy contract with bun check:cloudflare.
apps/backend/Dockerfile copies apps/tanstack-web/migration/route-manifest.json before cargo build because the Rust migration endpoints include that checked manifest at compile time. Regenerate the manifest before Docker validation when legacy route inventory changes.
The TanStack migration services use apps/tanstack-web/Dockerfile, direct host port 7824, and the Portless browser origin https://tanstack.tuturuuu.localhost:1355. Production Compose defines tanstack-web, tanstack-web-blue, and tanstack-web-green so benchmark and cutover checks can run beside the legacy Next.js lanes.
Docker web E2E aliases the TanStack Portless route to the web-proxy host port 7803 in DOCKER_WEB_FRONTEND=tanstack mode, because the blue/green TanStack lanes are exposed through nginx rather than the standalone tanstack-web host port.
Production TanStack services wait on the Rust backend service_healthy check before starting, because the Start runtime uses BACKEND_INTERNAL_URL=http://backend:7820 for server-owned API calls.
The TanStack image and production Compose healthchecks run apps/tanstack-web/docker/healthcheck.mjs, which requires the local TanStack HTTP runner to answer with a status below 500 and the configured BACKEND_INTERNAL_URL to pass /healthz. The TanStack Node runner also answers /__platform/drain-status directly so the shared nginx web-proxy healthcheck can use the same internal readiness path in DOCKER_WEB_FRONTEND=tanstack mode. Host-side Docker E2E readiness probes the TanStack root route instead, because nginx denies external access to the internal drain-status path.
Use DOCKER_WEB_FRONTEND=next|tanstack to select the default web frontend in migration-aware scripts. The default is still next; setting tanstack keeps nginx on the public web-proxy:7803 listener but routes the active upstream to tanstack-web-blue:7824 or tanstack-web-green:7824. Set the variable in the host/root deployment env before creating web-blue-green-watcher, web-docker-control, or web-cron-runner, because those containers inherit it for status probes, cached recovery, and service recreation.
Cloudflare preview deploys use apps/tanstack-web/wrangler.jsonc plus the app-local deploy:cloudflare, deploy:cloudflare:dry-run, preview:cloudflare, and cf-typegen scripts, so incremental TanStack route ports can be validated on Workers before the full cutover gate passes.
Cloudflare preview deployment is separate from the Docker blue/green production stack. Use it to prove Worker compatibility for apps/tanstack-web and apps/backend; do not route the production hostname to preview Workers until the TanStack/Rust manifest, compare-mode Docker E2E, benchmark report, and cutover gates pass.
The production stack runs the first-party AI memory sidecar as an internal support service at http://supermemory:8787. The service name and SUPERMEMORY_* env names stay compatible with existing web runtime wiring, but apps/supermemory/Dockerfile builds Tuturuuu-owned pgvector memory code.
Dockerized production commands auto-configure the memory sidecar unless explicitly disabled. scripts/docker-web/env.js generates and persists the internal SUPERMEMORY_API_KEY, SUPERMEMORY_POSTGRES_PASSWORD, and SUPERMEMORY_DATABASE_URL, and defaults SUPERMEMORY_ENABLED=true, SUPERMEMORY_FAIL_OPEN=true, and SUPERMEMORY_TIMEOUT_MS=1500.
Operators can override generated values with DOCKER_SUPERMEMORY_API_KEY, DOCKER_SUPERMEMORY_POSTGRES_PASSWORD, DOCKER_SUPERMEMORY_DATABASE_URL, or DOCKER_SUPERMEMORY_ENABLED; standard SUPERMEMORY_* env still works.
Blue/green promotion health-gates supermemory with the rest of the support services. Changing apps/supermemory/, the production Compose fragments, or the Docker bake file refreshes the support service set. Explicit SUPERMEMORY_ENABLED=false or DOCKER_SUPERMEMORY_ENABLED=false removes that support service from blue/green builds, starts, and health gates for local-only runs.
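
Two of the rules above, env-file precedence and the refusal of local Supabase origins in production serving, can be sketched in a few lines of shell. This is a minimal illustration under stated assumptions, not the helper's actual code: `resolve_env_file`, `url_host`, and `assert_cloud_supabase` are hypothetical names, `example.supabase.co` is a placeholder host, and the "local Supabase ports" part of the check is left out because the text does not list the ports.

```shell
# Hypothetical sketch of two rules from this section; the real logic lives in
# the Docker helper scripts and may differ.

# Rule 1: an explicit --env-file wins, then root .env.local, then the older
# apps/web/.env.local fallback.
resolve_env_file() (
  repo_root=$1
  explicit=${2:-}
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"
  elif [ -f "$repo_root/.env.local" ]; then
    printf '%s\n' "$repo_root/.env.local"
  elif [ -f "$repo_root/apps/web/.env.local" ]; then
    printf '%s\n' "$repo_root/apps/web/.env.local"
  else
    printf 'no Docker web env file found under %s\n' "$repo_root" >&2
    exit 1
  fi
)

# Extract the host from a URL such as http://[::1]:8001/path.
url_host() (
  rest=${1#*://}     # drop the scheme
  rest=${rest%%/*}   # drop the path
  rest=${rest##*@}   # drop any userinfo
  case "$rest" in
    \[*) rest=${rest#\[}; printf '%s\n' "${rest%%\]*}" ;;  # bracketed IPv6
    *) printf '%s\n' "${rest%%:*}" ;;                      # strip :port
  esac
)

# Rule 2: refuse local Supabase hosts unless the rehearsal override is set.
# The port-based part of the check is intentionally omitted in this sketch.
assert_cloud_supabase() (
  host=$(url_host "$1")
  case "$host" in
    localhost | 127.0.0.1 | ::1 | host.docker.internal)
      if [ "${DOCKER_WEB_ALLOW_LOCAL_SUPABASE:-}" = "1" ]; then exit 0; fi
      printf 'refusing local Supabase origin: %s\n' "$1" >&2
      exit 1
      ;;
  esac
)

# Placeholder cloud origin; prints "ok" because the host is not local.
assert_cloud_supabase "https://example.supabase.co" && echo "ok"
```

Keeping the host extraction separate from the policy check makes it easier to reason about bracketed IPv6 forms such as `[::1]`, which a naive split on `:` would mishandle.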

Cloudflare Preview Path

Docker remains the production rollout and rollback mechanism while the TanStack/Rust migration is incomplete. Cloudflare Workers are available now for incremental preview validation:

Run the config preflight without contacting Cloudflare:

bun check:cloudflare

Keep local Worker secret values in apps/backend/.dev.vars and apps/tanstack-web/.dev.vars; those files are ignored by .gitignore. wrangler.jsonc may name required secrets, but it must not contain secret values or account-specific private origins.

Deploy the Rust backend Worker first:

rustup target add wasm32-unknown-unknown
cargo install worker-build --locked
bun wrangler secret put BACKEND_INTERNAL_TOKEN --config apps/backend/wrangler.jsonc
bun wrangler secret put TUTURUUU_APP_COORDINATION_SECRET --config apps/backend/wrangler.jsonc
bun wrangler secret put SUPABASE_URL --config apps/backend/wrangler.jsonc
bun wrangler secret put SUPABASE_SERVICE_ROLE_KEY --config apps/backend/wrangler.jsonc
bun wrangler secret put CRON_SECRET --config apps/backend/wrangler.jsonc
bun wrangler secret put DISCORD_APP_DEPLOYMENT_URL --config apps/backend/wrangler.jsonc
bun wrangler secret put AURORA_EXTERNAL_URL --config apps/backend/wrangler.jsonc
bun wrangler secret put AURORA_EXTERNAL_WSID --config apps/backend/wrangler.jsonc
bun wrangler deploy --config apps/backend/wrangler.jsonc

SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY are required for the Rust-owned contact/profile APIs. The backend uses them only server-side to read users and user_private_details and to insert support_inquiries through Supabase REST. CRON_SECRET and DISCORD_APP_DEPLOYMENT_URL are required for Rust-owned Discord cron proxy preview readiness, and AURORA_EXTERNAL_URL and AURORA_EXTERNAL_WSID are required for the Rust-owned Aurora health and ingest probes; configure them even when the first smoke target is only /healthz.

Use wrangler secret put for the first preview bootstrap only. It creates and deploys a new active Worker version whenever the secret changes. For rotations or canary traffic, use:

bun wrangler versions secret put BACKEND_INTERNAL_TOKEN --config apps/backend/wrangler.jsonc
bun wrangler versions secret put TUTURUUU_APP_COORDINATION_SECRET --config apps/backend/wrangler.jsonc
bun wrangler versions secret put SUPABASE_URL --config apps/backend/wrangler.jsonc
bun wrangler versions secret put SUPABASE_SERVICE_ROLE_KEY --config apps/backend/wrangler.jsonc
bun wrangler versions secret put CRON_SECRET --config apps/backend/wrangler.jsonc
bun wrangler versions secret put DISCORD_APP_DEPLOYMENT_URL --config apps/backend/wrangler.jsonc
bun wrangler versions secret put AURORA_EXTERNAL_URL --config apps/backend/wrangler.jsonc
bun wrangler versions secret put AURORA_EXTERNAL_WSID --config apps/backend/wrangler.jsonc
bun wrangler versions deploy --config apps/backend/wrangler.jsonc
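
The bootstrap and rotation flows above differ only in the Wrangler subcommand, so the two command lists can be generated from one secret list. The sketch below prints the commands rather than running them, so they can be reviewed first. `print_secret_commands` and the `bootstrap`/`rotate` mode names are hypothetical; only the Wrangler subcommands and secret names come from the commands listed above.

```shell
# Hypothetical sketch: print (not run) the Wrangler commands for one Worker.
# Mode "bootstrap" uses `secret put`; mode "rotate" uses `versions secret put`.
BACKEND_SECRETS="BACKEND_INTERNAL_TOKEN TUTURUUU_APP_COORDINATION_SECRET SUPABASE_URL SUPABASE_SERVICE_ROLE_KEY CRON_SECRET DISCORD_APP_DEPLOYMENT_URL AURORA_EXTERNAL_URL AURORA_EXTERNAL_WSID"

print_secret_commands() (
  mode=$1
  config=$2
  shift 2
  case "$mode" in
    bootstrap) put="secret put"; deploy="deploy" ;;
    rotate) put="versions secret put"; deploy="versions deploy" ;;
    *) printf 'unknown mode: %s\n' "$mode" >&2; exit 1 ;;
  esac
  # One put command per secret name, followed by a single deploy.
  for name in "$@"; do
    printf 'bun wrangler %s %s --config %s\n' "$put" "$name" "$config"
  done
  printf 'bun wrangler %s --config %s\n' "$deploy" "$config"
)

# Word splitting of $BACKEND_SECRETS is intentional here.
print_secret_commands rotate apps/backend/wrangler.jsonc $BACKEND_SECRETS
```

Printing instead of executing keeps the sketch safe to run anywhere; each secret put still prompts for its value when the printed command is run by hand.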

Bind the TanStack Worker to the backend Worker by service binding and keep secret values out of source. apps/tanstack-web/wrangler.jsonc declares the BACKEND service binding to tuturuuu-backend; BACKEND_INTERNAL_URL is only an optional HTTP fallback for local or emergency non-binding runs:

bun wrangler secret put BACKEND_PUBLIC_ORIGIN --config apps/tanstack-web/wrangler.jsonc
bun wrangler secret put BACKEND_INTERNAL_TOKEN --config apps/tanstack-web/wrangler.jsonc

Use the versions flow for TanStack Worker secret rotations too:

bun wrangler versions secret put BACKEND_PUBLIC_ORIGIN --config apps/tanstack-web/wrangler.jsonc
bun wrangler versions secret put BACKEND_INTERNAL_TOKEN --config apps/tanstack-web/wrangler.jsonc
bun wrangler versions deploy --config apps/tanstack-web/wrangler.jsonc

Deploy the TanStack Start Worker:

bun --cwd apps/tanstack-web run cf-typegen
bun --cwd apps/tanstack-web run deploy:cloudflare

Smoke both returned Worker origins:

BACKEND_INTERNAL_TOKEN="${BACKEND_INTERNAL_TOKEN:?set BACKEND_INTERNAL_TOKEN}" \
BACKEND_WORKER_ORIGIN=https://<backend-worker-origin> \
TANSTACK_WEB_WORKER_ORIGIN=https://<tanstack-worker-origin> \
bun smoke:cloudflare

Keep BACKEND_INTERNAL_TOKEN in the shell, ignored local env, or Wrangler secret storage only. The smoke report must include the positive authenticated migration-status probe and the missing/invalid-token rejection probes before the Cloudflare cutover gate can pass.

Security expectations for this path:

Keep BACKEND_INTERNAL_TOKEN, backend origins, and future service credentials in Wrangler/Cloudflare bindings or ignored local env only. wrangler.jsonc may list variable names and non-secret preview defaults, never secret values.
Browser code must not receive backend bearer tokens. Protected workspace, private schema, cron, job, and admin routes stay server-owned through Rust endpoints and packages/internal-api / TanStack server functions.
Re-check CORS, cookie domain, secure-cookie, SameSite, and session-origin behavior before adding a custom hostname; workers.dev, local Portless, and production hosts are different browser origins.
Keep Cloudflare preview in BACKEND_ENV=preview; development-only migration bypasses are local-only.

Rollback from this preview path is DNS/routing-based: remove the Cloudflare route or custom domain that points at the preview Worker, or redeploy the last known-good Worker version, while the Docker blue/green apps/web production stack keeps serving the canonical hostname. Do not delete Worker secrets unless the secret value is compromised. Use Wrangler to inspect and roll back preview Worker versions:

bun wrangler deployments list --config apps/backend/wrangler.jsonc
bun wrangler deployments status --config apps/backend/wrangler.jsonc
bun wrangler rollback <VERSION_ID> --config apps/backend/wrangler.jsonc

bun wrangler deployments list --config apps/tanstack-web/wrangler.jsonc
bun wrangler deployments status --config apps/tanstack-web/wrangler.jsonc
bun wrangler rollback <VERSION_ID> --config apps/tanstack-web/wrangler.jsonc

Worker rollback does not revert external resources, bindings, routes, custom domains, or secret values. Re-run bun smoke:cloudflare against the remaining preview origins before resuming canary traffic.

Dockerized E2E

bun test:e2e from the repo root and bun test:e2e in apps/web run through scripts/run-web-e2e-docker.js instead of starting next dev. The runner:

writes tmp/e2e/web.env with local-only Supabase, local app-origin variables, app-session JWT values, and a local-only E2E auth bypass for Turnstile/dev-session,
starts and resets the Dockerized local Supabase stack,
boots apps/web through the production blue/green Docker flow,
starts Portless on unprivileged HTTPS port 1355 and registers the https://tuturuuu.localhost:1355 route only after the direct Docker proxy is healthy,
waits for https://tuturuuu.localhost:1355/login, then runs Playwright against that shared-cookie origin, and
tears down Docker web plus local Supabase unless E2E_KEEP_DOCKER_STACK=1.
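
The readiness wait in the steps above can be pictured as a bounded polling loop. The following is a minimal sketch, not the logic in scripts/run-web-e2e-docker.js: `wait_for_url`, its attempt and delay defaults, and the `PROBE_CMD` override are all assumptions introduced here, and the curl default presumes the local HTTPS certificate is already trusted.

```shell
# Hypothetical sketch of a bounded readiness wait. PROBE_CMD is an assumption
# added so the loop can be exercised without a network; by default it uses curl.
wait_for_url() {
  wait_url=$1
  wait_attempts=${2:-60}
  wait_delay=${3:-2}
  wait_try=1
  while [ "$wait_try" -le "$wait_attempts" ]; do
    # curl: -f fails on HTTP error statuses, -sS stays quiet but shows errors.
    if ${PROBE_CMD:-curl -fsS -o /dev/null} "$wait_url"; then
      return 0
    fi
    sleep "$wait_delay"
    wait_try=$((wait_try + 1))
  done
  printf 'timed out waiting for %s\n' "$wait_url" >&2
  return 1
}
```

A call such as `wait_for_url https://tuturuuu.localhost:1355/login 60 2` would poll for roughly two minutes before giving up. Bounding the loop matters in CI, where an unhealthy proxy should fail the shard instead of hanging it.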

Pass --frontend next, --frontend tanstack, or --frontend compare to select the legacy Next.js host, the TanStack host, or both sequentially:

bun test:e2e:web:docker -- --frontend compare

The TanStack mode reuses the existing apps/web/e2e suite and points Playwright at https://tanstack.tuturuuu.localhost:1355. In Docker web E2E, that Portless host points at the web-proxy host port because the runner validates the production blue/green TanStack lane selected by DOCKER_WEB_FRONTEND=tanstack. Standalone TanStack checks such as docker-compose.tanstack-dual.yml still use the direct TanStack Docker port 7824. Compare mode writes tmp/e2e/web-migration/compare-report.json after both frontend runs complete. Set E2E_COMPARE_REPORT_PATH to redirect that evidence file under another ignored tmp/ location. The report records the normalized Next and TanStack frontend origins used by the run plus per-frontend Playwright test counts from JSON reporter output. The cutover gate rejects missing, credentialed, invalid, or same-origin compare evidence, and it also rejects reports that do not prove nonzero Playwright execution for both frontends before they can satisfy terminal migration gates. Before cutover, pair the compare-mode E2E evidence with a full benchmark report and Cloudflare smoke report:

BACKEND_INTERNAL_TOKEN="${BACKEND_INTERNAL_TOKEN:?set BACKEND_INTERNAL_TOKEN}" \
bun benchmark:web-setups -- \
  --setup compare \
  --profile full \
  --require-all \
  --next-origin https://<legacy-next-origin> \
  --tanstack-origin https://<tanstack-worker-origin> \
  --backend-origin https://<backend-worker-origin>

The benchmark report is cutover evidence, so the origins are part of the gate. The benchmark command rejects same-origin Next/TanStack compare runs, and the cutover gate rejects reports with missing origins or route/sample URLs that do not match the recorded setup origin. This catches accidental proxy reuse before the report can satisfy migration gates.

bun migration:tanstack:gates -- \
  --e2e-report tmp/e2e/web-migration/compare-report.json \
  --benchmark-report tmp/benchmarks/web-migration/<timestamp>/report.json \
  --cloudflare-smoke-report tmp/benchmarks/web-migration/<timestamp>/cloudflare-smoke.json \
  --output tmp/benchmarks/web-migration/<timestamp>/cutover-gates.json
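
Because the gate command only reads report files, a small preflight can confirm that the evidence exists before it runs. This is a sketch under stated assumptions: `latest_benchmark_dir` and `check_evidence` are hypothetical helpers, and the sketch assumes the timestamp directory names sort lexicographically so that the last glob match is the newest run.

```shell
# Hypothetical preflight: confirm evidence files exist before invoking the
# gate command. Assumes timestamp directory names sort lexicographically.
latest_benchmark_dir() (
  base=${1:-tmp/benchmarks/web-migration}
  latest=
  for dir in "$base"/*/; do
    [ -d "$dir" ] || continue
    latest=${dir%/}   # glob results are sorted, so the last match wins
  done
  if [ -z "$latest" ]; then
    printf 'no benchmark runs under %s\n' "$base" >&2
    exit 1
  fi
  printf '%s\n' "$latest"
)

# Exit non-zero when any listed file is missing or empty.
check_evidence() (
  missing=0
  for file in "$@"; do
    if [ ! -s "$file" ]; then
      printf 'missing or empty evidence file: %s\n' "$file" >&2
      missing=1
    fi
  done
  exit "$missing"
)
```

One way to use it is `run_dir=$(latest_benchmark_dir) && check_evidence tmp/e2e/web-migration/compare-report.json "$run_dir/report.json" "$run_dir/cloudflare-smoke.json"`, then pass the same paths to the gate command. The gate itself remains the authority on whether the reports are valid; this check only catches absent files early.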

The gate command does not start Docker. It only validates explicit report files so generated evidence remains under ignored tmp/ paths. The output JSON is the review/handoff artifact that links the Docker E2E, benchmark, and Cloudflare smoke evidence for sign-off without committing generated reports.

Normal teardown passes --volumes --rmi local to Docker Compose and then removes custom image tags for the current ttr-e2e-* project, so per-run containers, Compose volumes, and baked blue/green images do not accumulate. E2E also sets DOCKER_WEB_BUILDKIT_PRUNE_AFTER_BUILD=1 and DOCKER_WEB_BUILDKIT_PRUNE_MODE=all by default because the per-run BuildKit cache/state is disposable; set E2E_DOCKER_BUILDKIT_PRUNE_AFTER_BUILD=0 only when debugging a local E2E build and you intentionally want to keep BuildKit cache.

Local E2E also starts Supabase with edge-runtime excluded by default through DOCKER_WEB_SUPABASE_START_EXCLUDE=edge-runtime. The platform E2E suite does not serve local Edge Functions, and excluding that service keeps local runs from failing when the Supabase Edge Runtime tries to resolve external JSR packages. Set E2E_SUPABASE_START_EXCLUDE= when you intentionally need a full local Supabase stack for debugging.

Local E2E also pins DOCKER_SUPERMEMORY_ENABLED=false and SUPERMEMORY_ENABLED=false; the memory integration is not under test there.

E2E build caps default to auto for memory, CPU, and BuildKit max parallelism. The runner reads Docker’s current MemTotal before booting the stack, forwards that value as DOCKER_WEB_DOCKER_MEMORY_LIMIT, and resolves the BuildKit memory cap just under the active Docker Desktop allocation. On allocations below 10 GB, the E2E runner keeps the inner Next build at one CPU, static generation concurrency one, and a 4 GB Node heap so BuildKit keeps enough container headroom. The Next build engine remains Turbopack. Do not switch local E2E runs to the Webpack build path; the production and local Docker build paths are expected to exercise the same Turbopack compiler.

For blue/green web deploys, the helper defaults to the Docker/BuildKit web image build path so production builds keep their container isolation boundary. Set DOCKER_WEB_NATIVE_BUILD=1 only for an explicit operator-approved native host build. That opt-in runs bun run build:web:docker on the host with DOCKER_WEB_STANDALONE=1, packages apps/web/.next/standalone plus static assets into the same Node runtime image shape, and continues with Docker Compose startup and Playwright. Native builds derive their host-side build memory budget from the machine’s total memory, while leaving Next/Turbopack CPU and static generation worker counts unset so the toolchain can auto-configure from the real host. Set DOCKER_WEB_NATIVE_BUILD_MEMORY=16g or another explicit value only when the host needs a different Node heap bucket.

Native runner packaging uses plain docker build by default so a remote BuildKit transport failure does not block the host-built artifact path. It also strips builder-routing env such as BUILDX_BUILDER for that packaging subprocess; set DOCKER_WEB_NATIVE_RUNNER_BUILDX=1 only when that packaging step must use the configured buildx builder. Native mode skips support-service image builds by default and reuses the existing support images; set DOCKER_WEB_NATIVE_SUPPORT_BUILD=1 to build support images locally with docker compose build, or DOCKER_WEB_NATIVE_SUPPORT_BUILDX=1 when those support builds should use the configured buildx builder.

When a production blue/green web image was just built locally with bun serve:web:docker:bg, Dockerized E2E can reuse that image instead of building apps/web again:

E2E_DOCKER_REUSE_WEB_IMAGE=1 bun test:e2e:web:docker -- e2e/multi-account.noauth.spec.ts --project=chromium-no-auth

The runner reads tmp/docker-web/prod/active-color and retags the matching tuturuuu-web-<color> image into the isolated ttr-e2e-* Compose project for both web-blue and web-green, then skips the web build stage. When the same local blue/green build also left the support images behind, the runner retags hive-<color>, hive-realtime, backend, meet-realtime, markitdown, storage-unzip-proxy, supermemory, web-docker-control, and web-cron-runner into the E2E project and skips those support builds too. If any support image is missing, E2E falls back to the normal support-service build path while still reusing the web image. Use this only when the source image was built from the current checkout. Set E2E_DOCKER_REUSE_WEB_IMAGE_SOURCE=<image> to reuse a specific web image tag, or E2E_DOCKER_REUSE_WEB_IMAGE_COLOR=blue|green|auto when the active-color file is not the lane you want.

GitHub Actions builds the same image set once per E2E workflow run and shares it through the private ghcr.io/tutur3u/platform-e2e package. The producer starts after the CI switchboard check, and the Playwright and migration matrices wait for that job to finish before GitHub allocates their runners. The producer publishes immutable tags under <run-id>-<run-attempt>-<commit-sha>-<service> and pushes the -ready tag only after every planned blue/green and TanStack image exists and the package has been verified private.

Consumers wait up to E2E_IMAGE_BUNDLE_WAIT_SECONDS (ten seconds by default), pull the complete frontend plan, retag it for their isolated Compose project, and skip web/support image builds. Playwright shards request only Next plus support images, the TanStack dual-stack mode requests only TanStack plus support images, and compare mode requests both frontends. Consumer jobs use always() at the job boundary, so a failed best-effort producer still starts the existing local cache-backed build path. Missing or partial bundles emit a notice and fall back without leaving matrix runners idle for minutes.

The workflow exposes only three bundle controls:

E2E_IMAGE_BUNDLE_REPOSITORY is the private two-segment GHCR repository.
E2E_IMAGE_BUNDLE_TAG_PREFIX identifies one workflow run and attempt.
E2E_IMAGE_BUNDLE_WAIT_SECONDS bounds how long a consumer may wait before falling back.
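
The tag layout these controls describe can be illustrated with a short sketch. Everything here is an assumption layered on the naming scheme in the text: `bundle_prefix` and `bundle_image_ref` are hypothetical helpers, the fallback values are placeholders, `web-blue` is used only as an example service name, and the exact form of the ready tag (the run prefix followed by `-ready`) is a guess at how the suffix is attached.

```shell
# Hypothetical sketch of the bundle tag layout. Appending "-ready" directly
# to the run prefix is an assumption, not confirmed by the workflow.
bundle_prefix() {
  # $1 = run id, $2 = run attempt, $3 = commit SHA
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

bundle_image_ref() {
  # $1 = repository, $2 = run prefix, $3 = service name or "ready"
  printf '%s:%s-%s\n' "$1" "$2" "$3"
}

# GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, and GITHUB_SHA are set by GitHub Actions;
# the fallbacks below are placeholders for local experimentation.
prefix=$(bundle_prefix "${GITHUB_RUN_ID:-123}" "${GITHUB_RUN_ATTEMPT:-1}" "${GITHUB_SHA:-abc1234}")
bundle_image_ref ghcr.io/tutur3u/platform-e2e "$prefix" web-blue
bundle_image_ref ghcr.io/tutur3u/platform-e2e "$prefix" ready
```

Including the run attempt in the prefix is what lets a re-run publish and clean up its own tags without colliding with the first attempt.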

The producer uses deterministic E2E-only PLATFORM_BUILD_* metadata so commit timestamps and messages do not invalidate otherwise reusable build work; production release metadata remains unchanged. Published cache images carry the official org.opencontainers.image.source config label, and run indexes retain the corresponding OCI annotation, so GHCR links the package to this repository and grants its workflow token package administration. The privacy check retries briefly while GitHub’s package metadata catches up. An always() cleanup job deletes the exact run prefix after all shards and migration modes finish, and the next producer removes abandoned bundle versions older than 24 hours to cover canceled workflows. Before publishing run-scoped tags, the producer creates a permanent sentinel image version when one does not already exist. Later runs reuse it instead of producing untagged sentinel versions. GHCR rejects deleting a package’s final tagged version individually, so the sentinel lets exact-prefix cleanup remove every run tag without deleting the package or touching another concurrent workflow. Bootstrap cleanup preserves one final version when no sentinel exists yet; the next successful publish creates the sentinel so a later stale sweep can remove it. Bundle publication is best-effort, but cleanup is a required storage-safety gate. Investigate package visibility, packages token permissions, or GHCR deletion errors when cleanup fails; do not make the package public or replace these images with long-retention Actions artifacts. The producer also maintains one mutable cache-<service> tag per planned image. These tags anchor the latest runnable layers in private GHCR after run-scoped tags are deleted, so the next commit does not have to upload unchanged final image layers again. Superseded cache versions become untagged and remain covered by the 24-hour stale sweep. 
Buildx writes cache images directly to GHCR instead of exporting them through the runner’s local Docker image store. Each successful build records its immutable digest; registry-side manifest promotion creates a distinct annotated run index from that digest, even if another workflow updates the mutable cache tag concurrently. Exact-prefix cleanup can therefore delete run indexes without deleting persistent cache tags. The ready index remains last, so consumers never observe a partial bundle. These registry-layer anchors complement Turbo and BuildKit caches without adding Actions cache entries or artifact storage.

Trusted main producers also expose the Turbo team and token secret files to the native web artifact child process. This lets both Docker/BuildKit stages and the host-side Next build reuse Turborepo remote cache entries. The values remain file-scoped until that child starts, are not passed as Docker build arguments, and are never persisted in an image layer. Pull requests and other untrusted runs continue without remote-cache credentials.

Each GitHub-hosted E2E runner starts from an empty Supabase volume. In CI, E2E_DOCKER_SUPABASE_RESET=0 therefore asks the runner to start Supabase once and trust that initial bootstrap, which already applies migrations and seed data. This avoids immediately repeating the same reset while preserving a separate database stack for every shard and migration mode. Local E2E keeps the safer reset-by-default behavior because a developer may already have mutable Supabase volumes.

On GitHub-hosted runners, the E2E workflow frees disk before restoring or loading cached Supabase Docker images. Keep that cleanup ahead of the cache load: running docker system prune -af --volumes after cached images are loaded would remove the images the shard is about to use, while skipping the cleanup can leave too little space for the web Docker image dependency layer. The default push workflow keeps the existing cache transport.
To benchmark the cache archive against direct registry pulls without changing healthy main runs, dispatch E2E Tests manually with Supabase image transport benchmark mode set to registry. Compare the Restore cached Docker images, Load cached Docker images, and Run Playwright shard step durations with an otherwise equivalent cache dispatch. Do not replace the default until three successful paired runs show a consistent critical-path improvement; migration replay itself is only a small part of Supabase startup time.

When an E2E shard fails, the runner prints diagnostics before teardown while the containers still exist. The job log includes the primary error, blue/green stage state, the Playwright .last-run.json file when available, Docker containers for the shard Compose project, production Compose status, recent logs for web, Hive, proxy, and support services, the Portless route list, a probe against the configured E2E BASE_URL, and bun sb:status. The workflow also has a failure-only diagnostic step after Run Playwright shard as a backstop, so the job output should show the failing service or stage even when Playwright report artifacts are incomplete. Before upload, the workflow rewrites the diagnostics directory to redact secret-shaped key/value pairs, bearer tokens, sensitive query parameters, JWTs, and secret-like runner environment values. The workflow uploads diagnostics, Playwright reports, and apps/web/test-results for every non-cancelled shard so traces and screenshots stay available when the job output is too short.

The Playwright global setup refuses non-local web origins and refuses Supabase origins outside localhost, 127.0.0.1, or host.docker.internal on port 8001. CI shards E2E with --shard=x/4; each shard gets its own Compose project name, but all shards still use ephemeral local Supabase rather than any cloud Supabase project.
The generated E2E env sets DOCKER_WEB_ALLOW_LOCAL_SUPABASE=1 so the production-image rehearsal can use that local Supabase origin without weakening production serving defaults. Because the Docker web app runs with NODE_ENV=production, the generated env file and Playwright process env also pin WEB_APP_URL, NEXT_PUBLIC_WEB_APP_URL, and NEXT_PUBLIC_APP_URL to the local shared-cookie origin; otherwise central-auth redirects can escape to the real tuturuuu.com origin during setup. The auth bypass is guarded by the local E2E web origin, the incoming request Host / forwarded host / Origin headers, and both the public and server-side Supabase origins before server-side auth code honors it, so it must not be used as a general production configuration.

The blue/green proxy and apps/web runtime both allow 64 KB request headers. That headroom lets the browser reach /~recover-browser-state or the normal login flow when duplicated Supabase cookies make the default header limit too small. If a request is still too large for the proxy to forward, nginx handles 431/494 directly with Clear-Site-Data and redirects to /login?browserStateReset=1; this recovery must stay in the proxy because Next.js middleware cannot run after nginx rejects the header.

The blue/green nginx proxy must forward the original Host header with its port intact via $http_host. Local E2E auth setup posts to http://localhost:7803/api/auth/dev-session, and the production-mode app accepts the setup route only when the public request origin stays local. The guard also tolerates production standalone/proxy normalization where request.url or Host becomes an internal Docker web upstream, but only when the forwarded public host is still the local E2E origin.
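
The origin guard that the Playwright global setup applies can be illustrated with a small sketch. This is not the actual setup code: the function names are hypothetical, the set of hostnames treated as a local web origin is an assumption, and the sketch assumes that port 8001 is required for every accepted Supabase host.

```javascript
// Hostnames the guide lists as acceptable Supabase hosts for E2E.
const LOCAL_SUPABASE_HOSTS = new Set(['localhost', '127.0.0.1', 'host.docker.internal']);
// Assumption: the port requirement applies to all three hosts.
const LOCAL_SUPABASE_PORT = '8001';
// Assumption: which hostnames count as a "local" web origin.
const LOCAL_WEB_HOSTS = new Set(['localhost', '127.0.0.1']);

function isLocalWebOrigin(value) {
  try {
    return LOCAL_WEB_HOSTS.has(new URL(value).hostname);
  } catch {
    return false; // unparsable values are never trusted
  }
}

function isLocalSupabaseOrigin(value) {
  try {
    const url = new URL(value);
    return LOCAL_SUPABASE_HOSTS.has(url.hostname) && url.port === LOCAL_SUPABASE_PORT;
  } catch {
    return false;
  }
}

// Hypothetical entry point: throw before any test can touch a remote origin.
function assertLocalE2EOrigins({ baseUrl, supabaseUrl }) {
  if (!isLocalWebOrigin(baseUrl)) {
    throw new Error(`Refusing non-local web origin: ${baseUrl}`);
  }
  if (!isLocalSupabaseOrigin(supabaseUrl)) {
    throw new Error(`Refusing non-local Supabase origin: ${supabaseUrl}`);
  }
}
```

Failing closed on unparsable URLs keeps a malformed env value from silently bypassing the check.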

Coolify

Coolify can provide enough default deployment metadata for Tuturuuu’s Dockerfile setup to derive the app origin even when you do not manually define the usual app URL variables.

During Dockerfile builds, scripts/build-web-docker.js now derives missing WEB_APP_URL, NEXT_PUBLIC_WEB_APP_URL, and NEXT_PUBLIC_APP_URL values from Coolify’s COOLIFY_URL or COOLIFY_FQDN defaults before running bun run build:web.
During production container startup, apps/web/docker/prod-entrypoint.js applies the same Coolify fallback so server-side runtime code sees the same derived values.
The runtime URL resolvers used by the web proxy, internal API client, and drive export/auto-extract flows also fall back to COOLIFY_URL and COOLIFY_FQDN.

Recommended setup in Coolify:

Still set explicit Tuturuuu env like NEXT_PUBLIC_SUPABASE_URL, NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY, SUPABASE_SECRET_KEY, and any email or storage secrets yourself.
You can omit WEB_APP_URL, NEXT_PUBLIC_WEB_APP_URL, and NEXT_PUBLIC_APP_URL if Coolify already injects COOLIFY_URL or COOLIFY_FQDN for the deployment.
If you need one specific canonical domain while Coolify exposes multiple domains, set the Tuturuuu app URL variables explicitly instead of relying on the automatic fallback.
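
The fallback described above could look roughly like the following sketch. It is an illustration rather than the code in scripts/build-web-docker.js or prod-entrypoint.js: the function names are hypothetical, and three behaviors are assumptions (COOLIFY_URL is preferred over COOLIFY_FQDN, a comma-separated value uses its first entry, and a bare domain defaults to https).

```javascript
const APP_URL_KEYS = ['WEB_APP_URL', 'NEXT_PUBLIC_WEB_APP_URL', 'NEXT_PUBLIC_APP_URL'];

// Hypothetical helper: reduce Coolify's defaults to a single origin.
function firstCoolifyOrigin(env) {
  const raw = env.COOLIFY_URL || env.COOLIFY_FQDN || '';
  // Assumption: multiple domains arrive comma-separated; take the first.
  const first = raw.split(',').map((entry) => entry.trim()).find(Boolean);
  if (!first) return null;
  // Assumption: a value without a scheme is a bare domain served over https.
  const withScheme = /^https?:\/\//i.test(first) ? first : `https://${first}`;
  try {
    return new URL(withScheme).origin;
  } catch {
    return null;
  }
}

// Fill only the app URL variables that are missing or blank.
function applyCoolifyFallback(env) {
  const result = { ...env };
  const origin = firstCoolifyOrigin(env);
  if (!origin) return result;
  for (const key of APP_URL_KEYS) {
    if (!result[key] || !result[key].trim()) {
      result[key] = origin;
    }
  }
  return result;
}
```

Because explicitly set values are never overwritten, pinning one canonical domain through the Tuturuuu app URL variables keeps working when Coolify exposes several domains.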

Development Mode

Development mode exists to preserve the normal root script contract while moving the web runtime into containers.

Container-managed node_modules are isolated from the host.
Package-local node_modules and dist directories are also isolated so host installs do not shadow container artifacts.
The root Docker context excludes generated app artifacts such as .next, .turbo, coverage output, and Flutter build directories. Keep these excludes intact so production builds do not stream multi-gigabyte local artifacts into BuildKit.
A host bun install is not required just to boot the Dockerized web stack.

Production Mode

The production compose file uses the runner target from apps/web/Dockerfile.

In-Place

bun serve:web:docker

Use this when a short restart is acceptable.

Blue/Green

bun serve:web:docker:bg

Blue/green deploy does this:

Reads the last active color from tmp/docker-web/prod/active-color.
Ignores that state if the corresponding container no longer exists.
Builds the target web image through Docker Buildx Bake using Compose-derived targets, then stops/removes only the old target web lane and starts the fresh replacement.
Starts the target web lane after its healthcheck passes and records web-promote as a staged target in tmp/docker-web/prod/target-state.json. The deploy does not reload web-proxy or write tmp/docker-web/prod/active-color yet.
Builds and runs Hive separately. hive-db-migrate, the target hive-blue/hive-green service, hive-realtime, and the Hive proxy check must pass before web can be publicly promoted. A migration or Hive health failure marks the Hive stage failed, leaves active-color on the previous web lane, and keeps the staged target web lane out of public routing.
Refreshes support services (backend, meet-realtime, markitdown, storage-unzip-proxy, web-docker-control, and web-cron-runner) after web/Hive target work. A support build or health failure also blocks web-proxy reload and leaves the previous active web lane serving. Their build step is scoped: ordinary web commits build only web-blue or web-green, while Hive and helper images rebuild only when their source, Dockerfile, compose wiring, or shared dependency inputs changed. Image-only services such as redis, serverless-redis-http, web-proxy, and cloudflared are never passed to Bake.
Injects Docker-internal helper URLs into apps/web: BACKEND_INTERNAL_URL=http://backend:7820, MARKITDOWN_ENDPOINT_URL=http://markitdown:8000/markitdown, DISCORD_APP_DEPLOYMENT_URL=http://markitdown:8000, DRIVE_AUTO_EXTRACT_PROXY_URL=http://storage-unzip-proxy:8788/extract, and INTERNAL_WEB_API_ORIGIN=http://web-proxy:7803.
Keeps the stable web-proxy container running in place during ordinary promotions instead of re-running compose up against the public :7803 listener. If the running proxy is missing required host ports or its container image no longer matches the resolved Compose image, the deploy defers the forced proxy recreate until after the target web, Hive, and support gates have passed.
Validates the generated nginx config with nginx -t, then reloads or recreates the proxy only after every staging gate has passed.
Immediately verifies the proxy can serve the internal /__platform/drain-status endpoint through the newly routed color before writing active-color and marking the staged web target healthy. This avoids false deployment failures from public API middleware or rate limits.
Polls an internal drain-status endpoint on the old color and waits until it has no in-flight HTTP work left before demoting it to standby. This keeps long-running server actions, route handlers, and other open requests from being cut off mid-flight.
Falls back to the short fixed drain window only when the old image predates the drain-status endpoint and cannot report its active requests yet.
Keeps the demoted color online as a warm nginx backup target instead of removing it immediately, so stale keepalive workers and Cloudflare Tunnel connections can still fail over cleanly during the post-promotion window.
If the demoted standby color is still on the previous revision after 15 minutes, the watcher automatically rebuilds that stale standby in place so both colors converge on the latest checked-out code without flipping the active port or promoting traffic again.
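
The first two steps of that sequence (read the recorded color, ignore it when its container is gone, then choose the opposite lane) can be sketched as below. The function names are hypothetical, and defaulting to blue when no usable state exists is an assumption; the real logic lives in the deploy helper.

```javascript
const COLORS = ['blue', 'green'];

// recordedColor: contents of tmp/docker-web/prod/active-color (may be empty).
// runningServices: Compose service names that currently have a container.
function resolveActiveColor(recordedColor, runningServices) {
  const color = String(recordedColor || '').trim();
  if (!COLORS.includes(color)) return null;
  // Recorded state is ignored when the matching web lane no longer exists.
  return runningServices.includes(`web-${color}`) ? color : null;
}

// The target lane is always the one that is not serving traffic.
// Assumption: with no usable state, the first deploy targets blue.
function pickTargetColor(activeColor) {
  return activeColor === 'blue' ? 'green' : 'blue';
}
```

For example, a recorded color of green with only web-blue running resolves to no active color, so the next deploy targets blue instead of trusting stale state.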

During blue/green deploys, the watcher supplies the version badge metadata via PLATFORM_BUILD_* variables for both Docker image builds and runtime containers. It infers PLATFORM_BUILD_COMMIT_HASH, PLATFORM_BUILD_COMMIT_SHORT_HASH, PLATFORM_BUILD_COMMIT_MESSAGE, PLATFORM_BUILD_REF_NAME, PLATFORM_BUILD_ENVIRONMENT, PLATFORM_BUILD_BUILT_AT, PLATFORM_BUILD_DEPLOYMENT_URL, and PLATFORM_BUILD_DEPLOYMENT_STAMP from the current checkout plus the deployment context. PLATFORM_BUILD_BUILT_AT is the checked-out commit’s source timestamp, so rebuilding the same commit and environment produces identical metadata; it is not the image-build or rollout time. Deployment stamps and the watcher’s startedAt, finishedAt, and activatedAt records remain the source of actual deployment timing. The build helper strips deployment stamps and deployment URLs from Compose, Bake, and native build environments, then supplies them only to the runtime container; different rollouts of the same source therefore reuse the same Turbo and image layers.

The account-gated badge reads those runtime values before falling back to generated Vercel/GitHub metadata, so on-prem watcher deployments show the served commit instead of local / Unknown. If those PLATFORM_BUILD_* values are missing or blank in a self-hosted runtime, apps/web falls back to the mounted blue/green snapshot before using generated/local defaults. The resolver reads only lightweight snapshot files: prod/target-state.json, prod/active-color, prod/deployment-stamp, watch/blue-green-auto-deploy.status.json, and watch/blue-green-auto-deploy.history.json under PLATFORM_BLUE_GREEN_MONITORING_DIR, with local tmp/docker-web candidates for development. Selection prefers targets.web for the active color, then an active deployment row, then the latest successful row for the active color, then the latest successful row overall.
commitSubject becomes the badge commit message, committedAt (or an explicit source timestamp) becomes builtAt, and deployment timestamps are used only to order deployment candidates. Legacy rows without a source timestamp never relabel rollout time as source time. The runtime deployment stamp file supplies the displayed deployment stamp. The resolver does not invent deployment URL, ref, or environment from color or commit data alone.

The helper writes support-image input hashes to tmp/docker-web/prod/build-input-hashes.json and keeps recent decisions in tmp/docker-web/prod/build-input-hashes.history.json. Infrastructure monitoring reads that history so deployment rows can show which helper images were rebuilt and which ones were served from the cached build inputs.
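
The snapshot selection order described above (targets.web for the active color, then an active deployment row, then the latest successful row for the active color, then the latest successful row overall) can be sketched as follows. The row shape is hypothetical: the color, active, status, and finishedAt field names are assumptions for illustration and may not match the real snapshot files.

```javascript
// Hypothetical resolver sketch; returns the chosen row plus the rule that matched.
function selectBadgeSource({ activeColor, targetState, history = [] }) {
  // 1. Prefer the staged/served web target when it belongs to the active color.
  const webTarget = targetState?.targets?.web;
  if (webTarget && webTarget.color === activeColor) {
    return { rule: 'targets.web', row: webTarget };
  }

  // Deployment timestamps are used only to order candidates, newest first.
  const newestFirst = [...history].sort(
    (a, b) => Date.parse(b.finishedAt) - Date.parse(a.finishedAt)
  );

  // 2. An active deployment row.
  const active = newestFirst.find((row) => row.active === true);
  if (active) return { rule: 'active-row', row: active };

  // 3. Latest successful row for the active color.
  const successful = newestFirst.filter((row) => row.status === 'successful');
  const forColor = successful.find((row) => row.color === activeColor);
  if (forColor) return { rule: 'latest-success-for-color', row: forColor };

  // 4. Latest successful row overall, or nothing at all.
  return successful[0] ? { rule: 'latest-success', row: successful[0] } : null;
}
```

Returning null when nothing matches leaves the caller free to fall through to generated or local defaults instead of inventing metadata.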

Meet Realtime

apps/meet-realtime is the internal control-plane service for Meet calls, webinars, and low-latency broadcast coordination. It is a Bun WebSocket service started by production Compose as meet-realtime on container port 7816. web-proxy exposes /realtime for tumeet.me and meet.tuturuuu.com and forwards WebSocket upgrades to that service. Production meeting logic stays on Tuturuuu infrastructure:

apps/web owns protected meeting APIs, verifies workspace access, and mints short-lived join tokens signed with MEET_REALTIME_TOKEN_SECRET.
apps/web and apps/meet-realtime must share the same MEET_REALTIME_TOKEN_SECRET. Production Compose exposes MEET_REALTIME_URL and NEXT_PUBLIC_MEET_REALTIME_URL to Web, defaulting both to wss://meet.tuturuuu.com/realtime for browser join-token payloads.
Browsers connect to wss://meet.tuturuuu.com/realtime?token=....
apps/meet-realtime validates the token, manages ephemeral room presence, chat, stage state, and reconnect resync, then calls Cloudflare Realtime SFU APIs with server-only CLOUDFLARE_REALTIME_APP_ID and CLOUDFLARE_REALTIME_APP_SECRET.
Browser media flows to Cloudflare Realtime SFU. The control WebSocket can reconnect during watcher-managed service refreshes without creating a new meeting record.
Broadcast streaming stays API-owned by apps/web: the meeting host calls /api/v1/workspaces/:wsId/meetings/:meetingId/stream, apps/web creates or resumes a Cloudflare Stream live input with server-only CLOUDFLARE_ACCOUNT_ID plus CLOUDFLARE_STREAM_API_TOKEN (or CLOUDFLARE_API_TOKEN), stores the live input UID and WHIP/WHEP URLs in private.meet_stream_live_inputs, and returns the WHIP publish URL only to the host response. Workspace viewers receive only the WHEP playback URL.
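
To make the shared-secret contract concrete, here is a minimal sketch of minting and validating a short-lived signed token. This guide does not describe the real token format, so everything here is an assumption for illustration: the HMAC-SHA256 scheme, the `payload.signature` layout, the claim names, and both function names are hypothetical.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

function sign(body, secret) {
  return createHmac('sha256', secret).update(body).digest('base64url');
}

// Minting side (the role apps/web plays): embed claims plus an expiry.
function mintJoinToken(claims, secret, ttlSeconds, nowMs = Date.now()) {
  const payload = { ...claims, exp: Math.floor(nowMs / 1000) + ttlSeconds };
  const body = Buffer.from(JSON.stringify(payload)).toString('base64url');
  return `${body}.${sign(body, secret)}`;
}

// Validating side (the role apps/meet-realtime plays): returns claims or null.
function verifyJoinToken(token, secret, nowMs = Date.now()) {
  const [body, signature] = String(token).split('.');
  if (!body || !signature) return null;
  const expected = Buffer.from(sign(body, secret));
  const actual = Buffer.from(signature);
  // Compare in constant time; lengths must match before timingSafeEqual.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return null;
  }
  const payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  return payload.exp > Math.floor(nowMs / 1000) ? payload : null;
}
```

A token minted with one secret fails validation under any other secret, which is why both services must be configured with the same MEET_REALTIME_TOKEN_SECRET.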

Cost controls are part of the signed token contract: camera defaults off, video is capped at 720p/24fps, room limits are explicit, and webinar viewers do not receive publish scope. Cloudflare Stream live inputs are created with recording mode off and hidden viewer counts by default; set CLOUDFLARE_STREAM_ALLOWED_ORIGINS to a comma-separated allowlist when Stream playback should be origin-restricted. Do not add Cloudflare Workers or Durable Objects for production Meet room logic; use the internal service and blue/green watcher instead.

scripts/docker-web/env.js persists generated helper tokens under tmp/docker-web/markitdown-token, tmp/docker-web/storage-unzip-token, tmp/docker-web/supermemory-api-key, and tmp/docker-web/supermemory-postgres-password. Override them with DOCKER_MARKITDOWN_ENDPOINT_SECRET, DOCKER_DRIVE_UNZIP_PROXY_SHARED_TOKEN, or the DOCKER_SUPERMEMORY_* env when an operator needs fixed values.

Workspace ZIP auto-extract is enabled by the workspace-level DRIVE_AUTO_EXTRACT_ZIP secret. Workspaces with EXTERNAL_PROJECT_ENABLED=true also opt in automatically so CMS/WebGL workspaces can reuse the unzipper without duplicating storage automation setup. The Docker-internal URL and token are fallbacks for workspaces that have not supplied custom proxy secrets. If a workspace supplies a custom DRIVE_AUTO_EXTRACT_PROXY_URL, it must also supply its own DRIVE_AUTO_EXTRACT_PROXY_TOKEN; the process-wide fallback token must not be sent to a workspace-controlled proxy URL.

CMS WebGL package uploads also use the storage-unzip-proxy, but they are a first-class CMS upload path rather than generic Drive automation. They require a configured unzip proxy URL and token, but they do not require the DRIVE_AUTO_EXTRACT_ZIP workspace opt-in secret. The CMS finalize route unpacks the ZIP into workspace Drive, detects the playable index.html, and stores the same-origin artifact map on the CMS webgl-package asset.
Browser uploads go directly to the signed storage URL returned by the self-hosted web app’s WebGL upload-url route, so large ZIPs do not pass through the Vercel-hosted CMS app or the web app proxy before reaching Supabase Storage or R2. The CMS client reports per-file upload progress during the signed upload, then calls the WebGL finalize route so the backend handles extraction and artifact-map persistence.

The unzip proxy fans out backend callbacks for extracted folders and asks the callback route for per-file upload URLs. Before uploading extracted bytes, the proxy verifies the callback response names a trusted provider and that the signed upload URL belongs to hosted Supabase, Cloudflare R2, or an exact operator-configured upload origin. It forwards only content type and generated bearer-token headers to the upload URL. The storage auto-extract and CMS WebGL extract callback routes still pass through the central API proxy guard before they validate the shared unzip token, so malformed, rate-limited, or oversized callback requests are rejected at the same cheap boundary as other API mutations. Direct file callbacks are legacy/small-file only and enforce the same 512 KiB body budget locally; large extracted files must use the file-upload-url callback flow.

The proxy currently buffers the downloaded archive and each extracted file in memory, so the default caps stay conservative: 100 MiB ZIP downloads, 50 MiB per extracted file, and 250 MiB total extracted output. Operators can tune those caps with DRIVE_UNZIP_PROXY_MAX_ARCHIVE_BYTES, DRIVE_UNZIP_PROXY_MAX_ENTRY_BYTES, DRIVE_UNZIP_PROXY_MAX_TOTAL_EXTRACTED_BYTES, and DRIVE_UNZIP_PROXY_MAX_ARCHIVE_ENTRIES; workspace Drive quota must still be large enough for the uploaded archive and extracted files. Set DRIVE_UNZIP_PROXY_ALLOWED_UPLOAD_ORIGINS for self-hosted Supabase or custom R2/S3-compatible origins, and reserve DRIVE_UNZIP_PROXY_ALLOW_LOCAL_UPLOAD_ORIGINS=true for local Supabase testing.
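
The size caps above can be illustrated with a short sketch that resolves the documented defaults and overrides, then checks an archive against them. The function names and the returned reason strings are hypothetical, and the entry-count cap is omitted because this guide does not state its default.

```javascript
const MiB = 1024 * 1024;

// Defaults stated in this guide.
const DEFAULT_CAPS = {
  maxArchiveBytes: 100 * MiB,
  maxEntryBytes: 50 * MiB,
  maxTotalExtractedBytes: 250 * MiB,
};

// Use the override only when it parses as a positive integer.
function readCap(env, name, fallback) {
  const parsed = Number.parseInt(env[name] ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function resolveUnzipCaps(env = process.env) {
  return {
    maxArchiveBytes: readCap(env, 'DRIVE_UNZIP_PROXY_MAX_ARCHIVE_BYTES', DEFAULT_CAPS.maxArchiveBytes),
    maxEntryBytes: readCap(env, 'DRIVE_UNZIP_PROXY_MAX_ENTRY_BYTES', DEFAULT_CAPS.maxEntryBytes),
    maxTotalExtractedBytes: readCap(
      env,
      'DRIVE_UNZIP_PROXY_MAX_TOTAL_EXTRACTED_BYTES',
      DEFAULT_CAPS.maxTotalExtractedBytes
    ),
  };
}

// Returns a hypothetical rejection reason, or null when the archive fits.
function checkArchive(caps, archiveBytes, entrySizes) {
  if (archiveBytes > caps.maxArchiveBytes) return 'archive-too-large';
  let total = 0;
  for (const size of entrySizes) {
    if (size > caps.maxEntryBytes) return 'entry-too-large';
    total += size;
    if (total > caps.maxTotalExtractedBytes) return 'total-too-large';
  }
  return null;
}
```

Checking the running total inside the loop stops extraction as soon as the budget is exceeded, which matters when every extracted file is buffered in memory.
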
The MarkItDown endpoint is the conversion path for uploaded workspace files. Do not route YouTube summaries through MarkItDown or Google Search. Google Gemini chat requests attach one public or unlisted YouTube URL directly as a native video/mp4 file input, so the model can summarize the video through the provider-supported video path. Playlist/query parameters are stripped before the URL is attached so each request references only one video. Any legacy direct URL conversion path that still reaches MarkItDown must reserve and commit the fixed MarkItDown credit charge before the sidecar request is sent.

Interrupted Docker Compose recreates can leave temporary container names such as <hex>_platform-markitdown-1. The Docker helper treats those as recoverable only when the suffix matches one of the services in the current compose up request, removes that stale temp container, and retries the same narrow up operation. Compose can also briefly report dependency failed to start with No such container: <id> when a dependency was recreated between Docker’s dependency resolution and health wait. The helper treats that as a stale dependency reference and retries the same narrow compose up without deleting unrelated containers. Tune that retry budget with DOCKER_WEB_COMPOSE_UP_STALE_DEPENDENCY_RETRY_MAX_ATTEMPTS.

The production web-proxy service is pinned to the official mainline Alpine image nginx:1.31.0-alpine, and scripts/check-docker-web.js verifies that pin in the merged production Compose config. The long-lived nginx proxy also raises its request-header buffer limits so larger session/auth cookies do not fail at the proxy layer with 400 Request Header Or Cookie Too Large before the active web container sees the request.
It now also raises its upstream response-header buffers (proxy_buffer_size, proxy_buffers, and proxy_busy_buffers_size) so larger Supabase auth responses with multiple Set-Cookie headers do not fail with upstream sent too big header while reading response header from upstream. The proxy uses Docker DNS re-resolution plus a shorter keepalive timeout so promotions are less likely to produce transient 502 Host Error responses for existing Cloudflare Tunnel connections, while the previous color remains alive as a warm standby. The proxy keeps both blue and green in the nginx upstream group during steady state, with the active color as the primary upstream and the standby color as a backup. The runtime DNS resolver is defined at the nginx include/http scope, not just inside server, so Docker service-name resolution continues to work for the blue/green upstream block at reload time.

Both the production web image healthcheck and the web-proxy compose healthcheck now use the internal /__platform/drain-status endpoint too, so raw bun serve:web:docker:bg waits on the same non-rate-limited readiness path as the blue/green promotion gate. The proxy exposes that path as an exact loopback-only nginx location and forwards a private internal probe header to the active web lane, because the web request tracker intentionally answers the drain-status endpoint only for local or explicitly trusted Docker-network requests.

Every blue/green deployment also stamps the runtime with PLATFORM_DEPLOYMENT_STAMP and PLATFORM_BLUE_GREEN_COLOR. Those values are surfaced through both nginx response headers and the web process itself, and the web layout appends the deployment stamp to the service-worker URL with updateViaCache: 'none' so new deployments push browsers toward the latest worker instead of lingering on stale cached state.

The local runtime state lives in:

tmp/docker-web/prod/active-color
tmp/docker-web/prod/deployment-stamp
tmp/docker-web/prod/nginx.conf
tmp/docker-web/prod/target-state.json

These files are intentionally local-only and safe to regenerate. Infrastructure Monitoring → Deployments reads target-state.json, the watcher deployment history, and the latest deployment stage handoff together, so operators can see staged target work such as a prepared web color while Hive or support gates still block public promotion. active-color and the generated proxy config remain on the previous serving web lane until the final proxy-reload stage passes. Watcher-managed deployments persist the web-build, web-promote, hive-migrate, hive-promote, support-refresh, and proxy-reload stage results into deployment history. Modern rows that were recorded without a stage array are inferred from final deployment status and build-cache metadata; truly pre-tracking rows still show stage chips as not applicable. Active watcher deployments that only have pending build/deploy status are surfaced with a synthetic current stage so operators can see the build is in progress before full stage history is written.

When TUTURUUU_CI_CHECKS_ENABLED=1, the watcher also publishes one sanitized GitHub Check Run per watched commit. The default check name is Tuturuuu CI; override it with TUTURUUU_CI_CHECK_NAME. The managed production path is the Infrastructure → GitHub Bot page in the root workspace: create and install a Tuturuuu-owned GitHub App with repository Checks: write permission, then save the App ID, installation ID, repository owner/name, and private key there. The private key stays server-side and encrypted in the private-schema vault.

After the configuration validates, use Enable watcher auto-pickup on the same page. apps/web issues a dedicated watcher-only client token, writes a small credential request into the blue/green control directory, and the watcher moves that credential into its local runtime directory on the next Check Run publish.
The watcher then discovers the apps/web installation-token endpoint without requiring you to copy GitHub-related env into the watcher process. The queued runtime credential is the Check Run opt-in signal for the watcher. Manual generated-token setup remains available for local or emergency use:

TUTURUUU_CI_CHECKS_ENABLED=1
TUTURUUU_CI_GITHUB_TOKEN_URL=https://<apps-web-origin>/api/v1/infrastructure/github-bot/installation-token
TUTURUUU_CI_GITHUB_TOKEN_CLIENT_TOKEN=<watcher-client-token>

The watcher exchanges that client token for repository-scoped GitHub App installation tokens and refreshes them before expiry. Revoke and reissue the watcher client from Infrastructure → GitHub Bot when rotating access. The browser UI never displays generated GitHub installation tokens, and the auto-pickup action does not display the watcher client token.

Manual and static-token paths still require TUTURUUU_CI_CHECKS_ENABLED=1; when enabled, the publisher uses TUTURUUU_CI_GITHUB_TOKEN first, then an explicitly configured generated token endpoint, then the watcher auto-pickup runtime credential, then GITHUB_TOKEN. Static tokens must be able to create and update Check Runs for the repository. The watcher stores the latest check-run id per commit in tmp/docker-web/watch/blue-green-github-checks.json so restarts update the same GitHub row instead of creating duplicates. TUTURUUU_CI_CHECK_DETAILS_URL is optional and omitted unless explicitly configured.

The GitHub-facing Check Run is intentionally not a log export. It includes only allowlisted rollout metadata: commit SHA/short SHA, branch/upstream, deployment kind, watcher status, current stage, aggregate stage counts, and safe timestamps/durations. It must not include raw watcher logs, raw error messages, environment values, local host paths, hostnames, emails, user ids, tokens, or secret-shaped key/value text.

The watcher also uses GitHub workflow-run validation to avoid repeatedly building a commit whose CI has already failed. When Check Run publishing is enabled, GITHUB_REPOSITORY is present, or DOCKER_WEB_WATCHER_GITHUB_VALIDATION=1 is set, the watcher reads the latest Actions workflow runs for the candidate commit’s exact head_sha. If the latest run for any workflow completed with failure, cancelled, timed_out, startup_failure, or action_required, automatic deploy, recovery handoff, reconciliation, and standby-refresh builds are suppressed with watcher status validation-blocked.
That state does not add a failed deployment row or consume the retry budget; fix CI and let the watcher see a new successful/latest run, or set DOCKER_WEB_WATCHER_GITHUB_VALIDATION_DISABLED=1 for an operator-approved manual override.

After each Hive migration pass, the deploy helper runs docker compose rm --stop -f hive-db-migrate so the completed one-shot migration service is stopped if necessary and removed. This keeps hive-db-migrate from lingering after depends_on starts it while Hive services come up.
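
The workflow-run validation described earlier in this section can be sketched as a pure function over the runs returned by the GitHub Actions API. The function name is hypothetical, and picking the "latest" run per workflow by created_at is an assumption; head_sha, workflow_id, status, conclusion, and created_at are standard workflow-run fields.

```javascript
// Conclusions that suppress automatic builds, as listed in this guide.
const BLOCKING_CONCLUSIONS = new Set([
  'failure',
  'cancelled',
  'timed_out',
  'startup_failure',
  'action_required',
]);

// Returns the latest run of each workflow that blocks the candidate commit.
function findBlockingWorkflowRuns(runs, headSha) {
  const latestByWorkflow = new Map();
  for (const run of runs) {
    if (run.head_sha !== headSha) continue; // exact commit match only
    const current = latestByWorkflow.get(run.workflow_id);
    // Assumption: "latest" means the most recent created_at per workflow.
    if (!current || Date.parse(run.created_at) > Date.parse(current.created_at)) {
      latestByWorkflow.set(run.workflow_id, run);
    }
  }
  return [...latestByWorkflow.values()].filter(
    (run) => run.status === 'completed' && BLOCKING_CONCLUSIONS.has(run.conclusion)
  );
}
```

A non-empty result corresponds to the validation-blocked watcher status. Because only the latest run per workflow is considered, a successful re-run clears the block without any manual override.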

Native Cron Runner

Self-hosted production cron jobs use apps/web/cron.config.json as the shared source of truth. apps/web/vercel.json.crons should stay generated from that file with:

node scripts/sync-web-crons.js --check
node scripts/sync-web-crons.js

Use --check in CI and local verification when cron definitions change. The sync script preserves Vercel behavior by copying each enabled job’s path and schedule from the shared config into apps/web/vercel.json.

In Docker production, the web-cron-runner service runs scripts/watch-web-crons.js against INTERNAL_WEB_API_ORIGIN, defaulting to http://web-proxy:7803. Requests include Authorization: Bearer ${CRON_SECRET || VERCEL_CRON_SECRET} so the same route auth gate can protect Vercel and native Docker executions. The cron-runner image bundles the cron-parser dependency used by that script, so runtime execution does not depend on node_modules existing in the mounted host checkout.

When neither CRON_SECRET nor VERCEL_CRON_SECRET is set on the host, the Docker environment generator creates a persisted internal secret at tmp/docker-web/cron-token and injects it into both the web containers and the web-cron-runner service as CRON_SECRET. This keeps native Docker cron auth self-contained while preserving explicit host-provided secrets when present.

Watcher startup also keeps the Docker control sidecar and cron runner present. bun serve:web:docker:bg, bun serve:web:docker:bg:watch, and watcher recovery recreate or resume web-blue-green-watcher first, refresh web-docker-control, then ensure web-cron-runner exists with a no-recreate Compose start. A healthy existing cron runner is left running; a missing control sidecar or runner fails watcher bootstrap instead of leaving the stack without native cron recovery or execution.

web-cron-runner is the executor. The blue/green watcher is one recovery consumer, not the owner of cron execution. Each watcher poll reconciles cron-runner health before normal deploy work by checking the web-cron-runner container state, Docker healthcheck result, and tmp/docker-web/cron/status.json.updatedAt.
If the container is missing, unhealthy, or the heartbeat is stale, the watcher writes the existing tmp/docker-web/watch/control/cron-runner-recovery.request.json request and processes it immediately with a force-recreate of web-cron-runner. It then waits for a fresh heartbeat before reporting cron-runner-recovered; if the Compose restart fails, the request remains on disk with lastError and the watcher backs off before retrying.

web-docker-control also owns an independent cron-runner watchdog. The watchdog defaults on, reads the same cron runner heartbeat, inspects the web-cron-runner container, ensures web-blue-green-watcher, and force-recreates only web-cron-runner when the heartbeat is stale/missing or the container is missing/unhealthy. This keeps cron recovery available even when the blue/green watcher heartbeat is stale. The watchdog does not recover the whole serving stack; route-origin failures such as an unreachable INTERNAL_WEB_API_ORIGIN / web-proxy are reported in monitoring diagnostics for an operator to fix separately.

Cron runner heartbeats refresh both runner liveness and schedule metadata. When the runner starts a cycle or keeps a long execution alive, it recomputes each enabled job’s next future run from apps/web/cron.config.json, current runtime control overrides, and UTC time. The monitoring API also derives future nextRunAt values from config/control if an older persisted status file still contains stale schedule fields, while preserving the separate stale/live runner health signal from status.json.updatedAt.

web-cron-runner also protects itself. The entrypoint restarts the child scripts/watch-web-crons.js process when the status heartbeat is missing, invalid, or stale after startup grace, and the Docker healthcheck calls the same entrypoint heartbeat check. This prevents a hung cron child from looking healthy just because the process still exists.

Useful cron-runner recovery knobs:

DOCKER_WEB_WATCHER_CRON_RUNNER_STALE_AFTER_MS controls watcher heartbeat staleness detection. Default: 120000.
DOCKER_WEB_WATCHER_CRON_RUNNER_RECOVERY_WAIT_MS controls how long the watcher waits for a fresh heartbeat after restart. Default: 90000.
DOCKER_WEB_WATCHER_CRON_RUNNER_RECOVERY_POLL_MS controls the heartbeat wait poll interval. Default: 2000.
PLATFORM_CRON_RUNNER_STATUS_STALE_AFTER_MS controls the entrypoint child watchdog. Default: 120000.
PLATFORM_DOCKER_CONTROL_CRON_WATCHDOG_DISABLED disables the web-docker-control cron-runner watchdog when set to true. Default: false.
PLATFORM_DOCKER_CONTROL_CRON_WATCHDOG_INTERVAL_MS controls the web-docker-control watchdog polling interval. Default: 30000.
PLATFORM_DOCKER_CONTROL_CRON_RUNNER_STALE_AFTER_MS controls web-docker-control heartbeat staleness detection. Default: 120000.
PLATFORM_DOCKER_CONTROL_CRON_RECOVERY_COOLDOWN_MS controls web-docker-control recovery cooldown after a restart attempt. Default: 60000.
PLATFORM_CRON_DOCKER_TELEMETRY_TIMEOUT_MS bounds Docker ps / logs probes used for cron telemetry. Default: 10000.
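
The recovery decision described above (container missing, container unhealthy, or heartbeat stale) can be sketched as a small pure function. This is a minimal illustration, not the watcher's actual code: the function names, the `container` shape, and the assumption that `updatedAt` is an ISO 8601 string are all hypothetical. Only the `updatedAt` field and the 120000 ms default come from this guide.

```javascript
// Hypothetical helpers. Only status.json's `updatedAt` field and the
// 120000 ms default staleness window are taken from the guide.
const DEFAULT_STALE_AFTER_MS = 120000;

// True when the heartbeat timestamp is missing, unparsable, or too old.
// Assumes `updatedAt` is an ISO 8601 string.
function isHeartbeatStale(status, nowMs, staleAfterMs = DEFAULT_STALE_AFTER_MS) {
  const updatedAtMs = Date.parse(status?.updatedAt ?? "");
  if (Number.isNaN(updatedAtMs)) return true;
  return nowMs - updatedAtMs > staleAfterMs;
}

// `container` is an assumed shape: { exists: boolean, health: string }.
// Returns a reason string when recovery is needed, or null when the
// existing runner should be left alone.
function cronRunnerRecoveryReason(container, status, nowMs, staleAfterMs) {
  if (!container || !container.exists) return "container-missing";
  if (container.health === "unhealthy") return "container-unhealthy";
  if (isHeartbeatStale(status, nowMs, staleAfterMs)) return "heartbeat-stale";
  return null;
}
```

Keeping the check free of Docker and filesystem calls makes it easy to reuse the same rule from both the watcher and the web-docker-control watchdog, which the guide says read the same heartbeat.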

web-docker-control is an internal-only sidecar with the Docker CLI, Docker socket, and host worktree mount. apps/web reaches it through PLATFORM_DOCKER_CONTROL_URL and PLATFORM_DOCKER_CONTROL_TOKEN for hardcoded watcher/cron-runner recovery actions only; it does not expose a general Docker or shell interface. Its status file at tmp/docker-web/docker-control/status.json includes both the latest manual or watchdog recovery attempt and the latest watchdog check, which the cron monitoring dashboard surfaces alongside watcher and runner heartbeat warnings. When the runner heartbeat is stale, future cron times in the dashboard are scheduled estimates derived from config/control, not proof that the runner will execute them.

Runtime cron telemetry is intentionally file-based and local to the host:

tmp/docker-web/cron/status.json for runner health, the current cycle, and the latest manual run lifecycle records. Manual runs move through queued, processing, and a final success / failed / timeout / skipped state. While any run is processing, the runner refreshes this heartbeat and, for manual runs, captured route console logs so the monitoring UI can show near-realtime status and log updates. Heartbeat refreshes also recompute future per-job and aggregate nextRunAt values so a recovered runner does not keep serving stale schedule badges.
tmp/docker-web/cron/state.json for restart-safe last-run markers.
tmp/docker-web/cron/executions/*.jsonl for per-run route response, duration, status, and captured web-container console logs.
tmp/docker-web/watch/control/cron-control.json for the global enabled switch.
tmp/docker-web/watch/control/cron-run-requests/*.json for queued manual runs created by the monitoring UI.
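
The manual run lifecycle recorded in status.json can be pictured as a small state machine. The state names below come from this guide; the transition table is an assumption (in particular, that every run passes through processing before reaching a final state), and the helper names are hypothetical.

```javascript
// State names are from the guide; the allowed transitions are an assumption.
const FINAL_STATES = ["success", "failed", "timeout", "skipped"];

const TRANSITIONS = new Map([
  ["queued", ["processing"]],
  ["processing", FINAL_STATES],
]);

// True when a manual run may move from `from` to `to`.
function canTransition(from, to) {
  return (TRANSITIONS.get(from) ?? []).includes(to);
}

// True when the run has finished and will not change state again.
function isFinalState(state) {
  return FINAL_STATES.includes(state);
}
```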

Disabling cron execution blocks scheduled jobs and leaves manual run requests queued until cron execution is enabled again. The runner also supports --once, which is used by script tests to verify due-run detection, queued manual runs, restart-safe state, and log persistence without starting the long-running loop.

Calendar cron routes should call workspace calendar APIs through INTERNAL_WEB_API_ORIGIN when it is present. In Docker production this keeps provider sync and smart scheduling traffic on the internal web-proxy origin instead of accidentally depending on a public app URL from inside the container.

calendar-provider-sync intentionally calls the same /api/v1/workspaces/:wsId/calendar/sync route that the calendar page uses, but with cron auth and source: "cron" so dashboard runs stay manual and scheduled runs stay auditable. The workspace sync route is responsible for provider fan-out: Google connections are selected from active Google auth-token rows, and Microsoft connections are selected from active Microsoft auth-token rows. Do not reimplement provider-specific calendar fetching inside the cron wrapper. The job should run every 15 minutes from apps/web/cron.config.json; calendar provider sync should not be scheduled through Trigger.dev in production.
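
As a rough illustration of the internal-origin rule, the sketch below builds the calendar sync request a cron wrapper might send. The route path, the bearer secret fallback, and source: "cron" come from this guide. The function name, the POST method, the JSON body placement of source, and the fallbackOrigin parameter are assumptions made for the example.

```javascript
// Hypothetical request builder. The HTTP method and body layout are
// assumptions; only the path, auth header, and `source` value are documented.
function buildCalendarSyncRequest(env, wsId, fallbackOrigin) {
  // Prefer the internal origin so traffic stays on web-proxy inside Docker.
  const origin = env.INTERNAL_WEB_API_ORIGIN || fallbackOrigin;
  const secret = env.CRON_SECRET || env.VERCEL_CRON_SECRET;
  if (!secret) {
    throw new Error("CRON_SECRET or VERCEL_CRON_SECRET must be set");
  }
  const path = `/api/v1/workspaces/${encodeURIComponent(wsId)}/calendar/sync`;
  return {
    url: new URL(path, origin).toString(),
    init: {
      method: "POST", // assumption
      headers: {
        Authorization: `Bearer ${secret}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ source: "cron" }), // assumption: sent in the body
    },
  };
}
```

The returned `url` and `init` could be passed straight to `fetch(url, init)`; the provider fan-out stays in the workspace sync route, as the guide requires.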

Auto-Deploy Watcher

bun serve:web:docker:bg:watch locks the current branch/upstream at startup, polls every second, fast-forwards when GitHub has a newer commit, and runs the blue/green deploy flow automatically. When the watcher container starts on a host with no active blue/green runtime, it treats that idle state as a missing active deployment and bootstraps the current commit. This first-run recovery creates web-proxy, the active and standby lanes, and cloudflared when the tunnel profile is enabled.

Run this command from a host-level process manager, not only from inside Docker. The command starts and tails the web-blue-green-watcher container, but the host process is the part that can recover after the Docker engine itself dies. When Docker is unavailable or Docker CLI probes stop returning, the host supervisor polls Docker with bounded probes; after DOCKER_WEB_WATCHER_DOCKER_RESTART_AFTER_MS milliseconds of continuous failure (default 30000), it attempts to restart Docker, waits for Docker probes to pass, runs any configured host-level post-restart commands, then recreates the watcher container. The recreated watcher reuses the existing cached blue/green recovery path to bring web-proxy and the active/standby web lanes back to health.

During normal steady-state polling, after active build-lock and revert checks, the watcher also reconciles the production Compose services that should already be serving. It inspects the expected services with stopped containers included: web-proxy, the active web or TanStack lane, Redis profile services, enabled health-gated sidecars, active-color Hive/realtime services, and cloudflared when its profile is enabled. Missing, exited, dead, or unhealthy services are recovered with bounded no-build Compose operations: stopped or missing services use docker compose ... up --detach --no-build --remove-orphans, unhealthy services are force-recreated, and starting services are left alone until the next poll. Image rebuilds remain owned by the normal blue/green deploy flow and the cached/full active-runtime recovery fallback.

Before recreating the watcher after a suspected devbox/local Supabase mix-up, inspect only origin classifications, not secrets. Root .env.local should classify as cloud for NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SERVER_URL; apps/web/.env.local may classify as local after ttr box setup, but the watcher must not select it while root .env.local exists. If local Supabase was started accidentally on a production host and no other local workflow is using it, stop it with bun sb:stop, then recreate the watcher with bun serve:web:docker:bg:watch.

Docker restart command defaults:

Linux: systemctl restart docker
macOS: open -ga Docker
Windows: powershell.exe -NoProfile -Command Start-Process "Docker Desktop"

Override the command with a JSON array when the host needs a different service manager or a narrow sudo rule:

DOCKER_WEB_WATCHER_DOCKER_RESTART_COMMAND='["sudo","systemctl","restart","docker"]' \
  bun serve:web:docker:bg:watch -- --if-locked replace

Useful host-supervisor knobs:

DOCKER_WEB_WATCHER_DOCKER_RESTART_AFTER_MS: delay before the first Docker restart attempt while docker info is failing; set 0 to disable attempts.
DOCKER_WEB_WATCHER_DOCKER_RESTART_COOLDOWN_MS: minimum time between restart attempts; default 300000.
DOCKER_WEB_WATCHER_DOCKER_RESTART_COMMAND: command used to restart or open Docker. Prefer JSON array syntax for commands with quoted arguments.
DOCKER_WEB_WATCHER_DOCKER_RESTART_DISABLED=1: hard-disable daemon restart attempts while still waiting for Docker to recover externally.
DOCKER_WEB_WATCHER_DOCKER_RECOVERY_TIMEOUT_MS: optional maximum wait time for Docker recovery; unset or 0 means wait indefinitely.
DOCKER_WEB_WATCHER_DOCKER_PROBE_TIMEOUT_MS: timeout for quick Docker CLI probes such as docker info, docker compose version, and watcher container state checks; default 10000.
DOCKER_WEB_WATCHER_LOG_STREAM_RECONNECT_MS: maximum time to let the host wrapper follow watcher logs before reconnecting and checking watcher health; default 60000.
DOCKER_WEB_WATCHER_DOCKER_POST_RESTART_COMMAND_TIMEOUT_MS: timeout for each additional host-level recovery command; default 600000.
DOCKER_WEB_WATCHER_DOCKER_POST_RESTART_COMMANDS: JSON array of host-level commands to run after Docker is reachable again and before Tuturuuu recreates its watcher container. Each entry is an object with command, args, and an optional cwd.
DOCKER_WEB_WATCHER_MAX_REQUEST_LOG_BYTES: maximum durable proxy request-log ledger size before the watcher rotates and prunes older JSONL chunks before appending new entries; default 268435456 bytes.
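
The restart-command resolution can be sketched as follows. The platform defaults and env var names are the ones listed above; the function name, the way the Windows default is split into argv entries, and the decision to reject anything other than a JSON array of strings are assumptions for this example (the guide only says JSON array syntax is preferred, so plain-string handling is deliberately left out).

```javascript
// Platform defaults as documented; keys are Node's process.platform values.
// The argv split of the Windows command is an assumption.
const DEFAULT_RESTART_COMMANDS = {
  linux: ["systemctl", "restart", "docker"],
  darwin: ["open", "-ga", "Docker"],
  win32: ["powershell.exe", "-NoProfile", "-Command", 'Start-Process "Docker Desktop"'],
};

// Hypothetical resolver: returns an argv array, or null when restarts are
// disabled or no default exists for the platform.
function resolveDockerRestartCommand(env, platform) {
  if (env.DOCKER_WEB_WATCHER_DOCKER_RESTART_DISABLED === "1") return null;

  const raw = env.DOCKER_WEB_WATCHER_DOCKER_RESTART_COMMAND;
  if (raw) {
    let parsed = null;
    try {
      parsed = JSON.parse(raw);
    } catch {
      parsed = null;
    }
    const valid =
      Array.isArray(parsed) &&
      parsed.length > 0 &&
      parsed.every((part) => typeof part === "string");
    if (!valid) {
      throw new Error("DOCKER_WEB_WATCHER_DOCKER_RESTART_COMMAND must be a JSON array of strings");
    }
    return parsed;
  }

  return DEFAULT_RESTART_COMMANDS[platform] ?? null;
}
```

Returning an argv array rather than a shell string means the command can be spawned without a shell, which fits the narrow sudo rule the guide recommends.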

Timing, disable, and email alert values can be updated from Infrastructure Monitoring in the web dashboard. The dashboard writes tmp/docker-web/watch/control/blue-green-docker-recovery-settings.json, and the host supervisor reads that file before each Docker recovery wait. Dashboard settings override the environment defaults without restarting the supervisor.

Host-level executable commands are different: configure DOCKER_WEB_WATCHER_DOCKER_RESTART_COMMAND and DOCKER_WEB_WATCHER_DOCKER_POST_RESTART_COMMANDS only in the host supervisor environment. The supervisor intentionally ignores command fields from blue-green-docker-recovery-settings.json so dashboard viewers cannot persist host commands for a later recovery event.

That settings file also owns Docker crash email alerts:

emailAlertsEnabled: enables SES-backed Docker recovery alert emails from the web cron worker and watcher-side first-failure build/deploy incident emails.
emailAlertRecipients: explicit recipient list. If this is empty, the cron and watcher fall back to PLATFORM_DOCKER_RECOVERY_ALERT_EMAILS, then the last operator email that saved the settings.
emailAlertCooldownMs: minimum time between alert emails; default 1800000.
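
The recipient fallback order and cooldown can be illustrated with two small helpers. The setting names, the env var name, and the 1800000 ms default are from this guide; the function names, the comma-separated format of PLATFORM_DOCKER_RECOVERY_ALERT_EMAILS, and the lastOperatorEmail parameter are assumptions.

```javascript
// Hypothetical helpers. The env var's comma-separated format is an assumption.
function resolveAlertRecipients(settings, env, lastOperatorEmail) {
  // 1. Explicit recipients saved with the settings win.
  const explicit = (settings.emailAlertRecipients ?? []).filter(Boolean);
  if (explicit.length > 0) return explicit;

  // 2. Otherwise fall back to the environment list.
  const fromEnv = (env.PLATFORM_DOCKER_RECOVERY_ALERT_EMAILS ?? "")
    .split(",")
    .map((entry) => entry.trim())
    .filter(Boolean);
  if (fromEnv.length > 0) return fromEnv;

  // 3. Finally, the last operator who saved the settings.
  return lastOperatorEmail ? [lastOperatorEmail] : [];
}

// True when alerts are enabled and the cooldown since the last email has passed.
function canSendAlert(settings, lastSentAtMs, nowMs) {
  if (!settings.emailAlertsEnabled) return false;
  const cooldownMs = settings.emailAlertCooldownMs ?? 1800000;
  return lastSentAtMs == null || nowMs - lastSentAtMs >= cooldownMs;
}
```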

The host supervisor persists Docker crash/recovery events to the watcher log archive as soon as it detects failed or timed-out Docker probes. If Docker had to be restarted before services recovered, the watcher sends an immediate force-restart recovery email to the configured recovery recipients and records that incident as notified. If the watcher-side email is disabled or fails, the infra app cron job /api/cron/infrastructure/docker-recovery-alerts can still send a fallback SES email after Docker and the infra app are reachable again. Both paths deduplicate by Docker recovery incident id using tmp/docker-web/watch/control/blue-green-docker-recovery-alert-state.json.

The blue/green watcher sends its own build/deploy incident email when an apps/web deployment attempt first fails for a commit. This runs from the watcher process rather than the current web image, uses the same recipient resolution as Docker recovery alerts, and only sends for the first failed history row per commitHash. Later retries for that same commit still append full failure history but do not spam operators. The incident email includes the full and short commit hash, commit subject, branch/upstream, deployment kind, host, timing, exit code or signal when available, the recorded failureReason, and debugging pointers for watcher history/log files plus commands like git show --stat --oneline <hash> and watcher container logs. Notification send failures are logged and never block the watcher loop.

Watcher incident email code runs from the repo root inside the watcher container, so any workspace package imported by scripts/watch-blue-green/* must be root-resolvable through package.json and covered by a root-runtime import test.

Example post-restart commands for colocated projects:

[
  {
    "command": "docker",
    "args": ["compose", "-f", "/srv/zeus/docker-compose.yml", "up", "-d"],
    "cwd": "/srv/zeus"
  },
  {
    "command": "docker",
    "args": ["compose", "-f", "/srv/upskii/docker-compose.yml", "up", "-d"],
    "cwd": "/srv/upskii"
  }
]

For Linux production hosts, install the command as a root-owned systemd service or run it as an operator account with permission to execute only the configured Docker restart command and the explicit post-restart commands needed by colocated projects. Use Restart=always so the host supervisor itself comes back after reboots or process crashes.

Example unit:

[Unit]
Description=Tuturuuu blue/green watcher
After=docker.service network-online.target
Wants=docker.service network-online.target

[Service]
Type=simple
WorkingDirectory=/srv/tuturuuu
Environment=NODE_ENV=production
ExecStart=/usr/local/bin/bun serve:web:docker:bg:watch -- --if-locked resume
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Install it with the production checkout path and Bun path for the host, then run systemctl enable --now tuturuuu-blue-green-watcher.service. Keep deployment secrets in the checkout’s root .env.local or an explicit env file, not in the unit file.

Additional behavior:

If the watcher script itself changed in the pulled revision, the current watcher process restarts first and the replacement process performs the deploy.
If blue/green is already live and the standby color remains on an older revision for 15 minutes, the watcher rebuilds only the standby color in place. The active color remains primary for new traffic the whole time.
If the watcher sees a degraded blue/green runtime with a proxy or runtime marker present but no active web color serving traffic, it immediately retags the latest retained successful image into the active web color and starts it with --no-build. It prefers a retained image for the current main commit, then falls back to the newest retained successful image so the runtime can recover first and reconcile to main afterward. It then retags the same cached image into the opposite color and starts that as the warm standby, creating two ready copies without waiting for a fresh build.
Blue/green active and standby discovery uses Docker health, not just container presence. If the persisted active color is unhealthy but the opposite color is healthy, the watcher rewrites the active marker and proxy to the healthy color before building or refreshing another lane.
Cached recoveries write a fresh nginx proxy config before the proxy is started, so recovery never boots nginx with a stale upstream that points at a missing or unhealthy color.
That standby catch-up path also stops and removes the stale standby container before rebuilding it, so health checks target the fresh replacement container rather than an outdated standby instance.
Standby catch-up rebuilds reuse the current deployment stamp so the warm backup matches the latest deployment state instead of serving an older build if nginx needs to fail over.
The watcher dashboard surfaces the top 3 most relevant deployments from the recent history, prioritizing in-progress rollouts first, then the live promoted color, then the warm standby. Direct manual bun serve:web:docker:bg runs are written into that same history too.
Cached recoveries write both the active recovery and the standby refresh into the same retained ledger, preserving the current two warm copies plus the prior successful deployment as the fastest rollback reference. If no retained image exists, the watcher falls back to the normal recovery build path.
The infrastructure monitoring rollback controls show the latest retained cached recovery images separately from the general deployment history, so an operator can quickly select a known cache-backed commit before pinning it for rollback or smoke testing.
Successful active and standby builds tag the service image as {compose-project}-web-cache:{commit} and prune older retained cache tags beyond the three newest successful deployments. Pruning is idempotent: already-removed cache tags are ignored instead of warning in the live watcher log.
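
The cache-tag retention rule in the last item (tag as {compose-project}-web-cache:{commit}, keep the three newest successful deployments) can be sketched like this. The tag format and the keep count are from this guide; the function names and the `deployments` record shape ({ commit, finishedAtMs }, successful builds only) are assumptions.

```javascript
// Tag format as documented: {compose-project}-web-cache:{commit}.
function cacheTagFor(composeProject, commit) {
  return `${composeProject}-web-cache:${commit}`;
}

// Hypothetical selector. `deployments` is an assumed list of successful
// builds shaped like { commit, finishedAtMs }. Returns the tags to remove.
function selectCacheTagsToPrune(composeProject, deployments, keep = 3) {
  const newestFirst = [...deployments].sort((a, b) => b.finishedAtMs - a.finishedAtMs);
  const keptCommits = new Set();
  const pruned = new Set();

  for (const deployment of newestFirst) {
    if (keptCommits.has(deployment.commit)) continue; // already retained
    if (keptCommits.size < keep) {
      keptCommits.add(deployment.commit);
      continue;
    }
    pruned.add(cacheTagFor(composeProject, deployment.commit));
  }
  return [...pruned];
}
```

Because the selector only returns tag names, a caller could ignore "tag not found" results when removing them, which matches the idempotent pruning behavior described above.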

Deployment build lock

Blue/green deploys coordinate on a JSON lock file under tmp/docker-web/watch/blue-green-deployment-build.lock (owner PID, command, deployment kind, and a re-entrant token for nested helper calls).

On Linux, the helper compares /proc/<pid>/cmdline to the recorded lock so a reused PID after a crash cannot masquerade as an in-flight deploy. The recorded command is the package script name (bun serve:web:docker:bg), but production deploys usually run as node scripts/docker-web.js ...; the matcher treats those as the same holder so a live node deploy is not cleared as a stale PID reuse. On macOS and Windows, the same age-based stale window (DOCKER_WEB_DEPLOYMENT_LOCK_STALE_AFTER_MS, default eight hours) still clears abandoned locks because /proc validation is unavailable. When no web-proxy / web-blue / web-green / tanstack-web-blue / tanstack-web-green containers exist, the auto-deploy watcher also runs the same stale-lock sweep before cached recovery.
DOCKER_WEB_DEPLOYMENT_LOCK_STALE_AFTER_MS: optional override for the default eight-hour window used when /proc is unreadable but kill(pid, 0) still reports a process (for example permission quirks). Set to 0 to disable age-based assists.
DOCKER_WEB_CANCEL_ACTIVE_BUILD=1 or --cancel-active-build on a manual bun serve:web:docker:bg run stops the watcher/buildkit services, clears the lock, and records a canceled history row before starting fresh.
The auto-deploy watcher treats an active deployment lock as a wait state, not a failed deploy attempt. Recovered pending handoffs, reconcile builds, standby refreshes, platform promotions, and imported Infrastructure project builds all defer behind the same lock so only one deployment build runs across the stack.
The watcher also treats a build lock older than 30 minutes as a timed-out build. For another live deployment PID, it sends SIGTERM to the recorded owner. If the lock is owned by the watcher process itself and the watcher has already returned to the polling loop, the lock is treated as leaked cached-recovery state and cleared without signaling the watcher. In both cases, the watcher records a failed deployment history row with the timeout reason and waits until the next polling cycle before retrying. Override the window with DOCKER_WEB_WATCHER_BUILD_TIMEOUT_MS; set it to 0 only when an operator explicitly wants to disable watcher-side build termination.
The apps/web Dockerfile deps stage retries bun install --frozen-lockfile up to three times with a Bun cache scrub between attempts. If the build still exits with bun install --frozen-lockfile exit code 1 after a git pull, regenerate bun.lock in a development checkout, commit the reviewed lockfile update, and deploy that commit. Do not let the production host rewrite bun.lock as part of the auto-deploy path. If tarball extraction still fails (for example @biomejs/cli-linux-x64), the blue/green helper prunes BuildKit exec cache mounts, recreates the Compose-owned buildkit service, and retries once with docker compose build --no-cache so a cached failed deps stage is not reused. If BuildKit has already lost transport and the exec-cache prune command exits with EOF or code = Unavailable, that prune is treated as best-effort and the service is still recreated before retrying. The same one-time fresh retry is used for CACHED ERROR ... COPY --from=deps and for the build watchdog timeout.
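
The watcher-side build timeout rule above can be summarized in a small classifier. The 30 minute default, the meaning of 0, and the two outcomes (signal another owner, or clear a lock leaked by the watcher itself) come from this guide. The function name, the returned labels, and the lock fields `pid` and `startedAtMs` are assumptions; the real lock file records more (command, deployment kind, re-entrant token).

```javascript
// Default watcher-side build timeout: 30 minutes
// (overridable through DOCKER_WEB_WATCHER_BUILD_TIMEOUT_MS).
const DEFAULT_BUILD_TIMEOUT_MS = 30 * 60 * 1000;

// Hypothetical classifier. `lock` is an assumed shape: { pid, startedAtMs }.
function classifyBuildLock(lock, { nowMs, watcherPid, timeoutMs = DEFAULT_BUILD_TIMEOUT_MS }) {
  if (!lock) return "no-lock";
  // 0 disables watcher-side termination, so an active lock is only a wait state.
  if (timeoutMs === 0) return "wait";
  if (nowMs - lock.startedAtMs <= timeoutMs) return "wait";
  // Timed out: never signal the watcher itself, just clear its leaked lock.
  return lock.pid === watcherPid ? "clear-leaked-lock" : "terminate-owner";
}
```

In both timed-out cases the guide says the watcher records a failed history row and retries on the next polling cycle, so a caller would do that bookkeeping after acting on the returned label.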

Monitoring Surfaces

The infrastructure monitoring UI in apps/infrastructure is intentionally split into smaller pages instead of one oversized dashboard:

/{wsId}/monitoring for the operator overview, runtime snapshot, cron health summary, and jump points into deeper surfaces.
/{wsId}/monitoring/cron for cron job schedules, global enable/disable control, manual run requests, recent execution status, route responses, and captured web-container console logs.
/{wsId}/monitoring/rollouts for rollout controls, deployment charts, event streams, and ledger history.
/{wsId}/monitoring/requests for paginated proxy request history backed by the durable JSONL request store under tmp/docker-web/watch/blue-green-request-logs/.
/{wsId}/monitoring/watcher-logs for paginated watcher log browsing backed by tmp/docker-web/watch/blue-green-auto-deploy.logs.json.

Operationally, keep the overview route lightweight and treat request/log archives as dedicated drill-down pages. The summary snapshot is for quick operator context; durable history should be paged from the persisted ledgers.

Build Resource Caps

When build and serve run on the same machine, use the Docker web helper’s Buildx throttling options instead of letting BuildKit consume the full host. Example:

bun serve:web:docker:bg -- --build-memory auto --build-cpus 4 --build-max-parallelism 1

Current root-script defaults:

bun serve:web:docker defaults to --build-memory auto --build-cpus 4 --build-max-parallelism 1
bun serve:web:docker:bg defaults to --build-memory auto --build-cpus 4 --build-max-parallelism 1

The helper resolves auto from Docker’s reported memory limit before starting, restarting, or recreating the Compose-owned BuildKit service. On a Docker Desktop allocation of 28 GiB, that means the BuildKit memory cap resolves to about 21 GiB: the helper keeps a real reserve for the Docker VM and other containers instead of assigning nearly the full Docker allocation to BuildKit. Direct Compose use still has a concrete mem_limit fallback of 12g and cpus 4 when DOCKER_WEB_BUILD_MEMORY and DOCKER_WEB_BUILD_CPUS are unset, because Compose cannot resolve the helper’s auto value by itself. Raise or lower the caps with env vars or helper flags when your machine is tighter or has spare capacity.

For blue/green runs that use the root-script defaults, the helper also keeps a machine-local adaptive profile at tmp/docker-web/buildkit/resource-profile.json. If BuildKit fails with a transport or resource-pressure signature such as code = Unavailable, closing transport, error reading from server: EOF, received prior goaway, ResourceExhausted, cannot allocate memory, context deadline exceeded, or [internal] waiting for connection, the deploy keeps retrying lower profiles in the same command until the build succeeds or the budget-aware retry ladder is exhausted. Each retry persists the selected profile for later runs, skips fixed profiles that exceed the effective Docker memory budget during normal selection, and can use a larger hard-limit rescue profile after conservative profiles are exhausted if Docker’s reported memory limit still has headroom. For explicit memory-exhaustion signatures such as cannot allocate memory or exit code 137, the helper can prefer that hard-limit rescue before smaller profiles that are unlikely to help. When a later default run sees stale persisted fallback state, it promotes that state back to the largest Docker-hard-limit rescue profile, preferring lower CPU at equal memory, before starting BuildKit. The helper recreates the Compose-owned BuildKit service even if the cleanup prune command itself fails with the same transport signature, then recreates the remote Buildx builder if docker buildx inspect tuturuuu reports Status: inactive.

The profile ladder is:

default: auto, 4 CPUs, max parallelism 1
stable: 16g, 2 CPUs, max parallelism 1
low: 10g, 2 CPUs, max parallelism 1
serial: 10g, 1 CPU, max parallelism 1
minimal: 8g, 1 CPU, max parallelism 1
floor: 6g, 1 CPU, max parallelism 1

The serial profile intentionally keeps the low memory cap while reducing Next/Turbo concurrency. Prefer that retry before shrinking the BuildKit memory cap when low gets through compilation or page-data work but exits with code 137.

If the floor profile still fails with the same BuildKit infrastructure signature, the helper resets the machine-local profile back to the budget-derived default profile. If that profile also fails and Docker’s reported memory limit can safely fit a larger fixed profile, the helper can retry low (10g) before surfacing the failure. This prevents a machine from getting stuck starting every future deploy at floor while still allowing memory-starved Next builds to recover on Docker VMs with enough real capacity.

Delete tmp/docker-web/buildkit/resource-profile.json only as an emergency manual reset back to the default profile. Explicit build cap flags or DOCKER_WEB_BUILD_MEMORY, DOCKER_WEB_BUILD_CPUS, or DOCKER_WEB_BUILD_MAX_PARALLELISM opt out of the adaptive profile for that run.

You can still override those defaults per run by appending your own flags after --, for example:

bun serve:web:docker:bg -- --build-memory 16g --build-cpus 4 --build-max-parallelism 2

Equivalent environment variables:

DOCKER_WEB_BUILD_MEMORY=16g
DOCKER_WEB_BUILD_CPUS=4
DOCKER_WEB_BUILD_MAX_PARALLELISM=2
DOCKER_WEB_BUILD_BUILDER_NAME=tuturuuu
DOCKER_WEB_BUILDKIT_PORT=7914
DOCKER_WEB_BUILDKIT_ENDPOINT=tcp://127.0.0.1:7914
DOCKER_WEB_BUILDKIT_PRUNE_AFTER_BUILD=0 for blue/green watcher handoffs
DOCKER_WEB_BUILDKIT_PRUNE_MODE=bounded|all|off (bounded is the default)
DOCKER_WEB_BUILDKIT_PRUNE_UNTIL=168h
DOCKER_WEB_BUILDKIT_PRUNE_KEEP_STORAGE=50gb
DOCKER_WEB_BUILDKIT_STOP_AFTER_BUILD=0 to keep the buildkit container warm after a build
DOCKER_WEB_DOCKER_MEMORY_LIMIT=<bytes from docker info>
DOCKER_WEB_STATIC_PAGE_GENERATION_TIMEOUT=180
DOCKER_WEB_STATIC_GENERATION_MAX_CONCURRENCY=auto
DOCKER_WEB_NEXT_BUILD_CPUS=auto
DOCKER_WEB_NEXT_APP_ONLY=1
DOCKER_WEB_NODE_MAX_OLD_SPACE_SIZE=auto
DOCKER_WEB_NEXT_BUILD_ENGINE=turbopack
DOCKER_WEB_REACT_COMPILER=1
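
To show how the prune-related variables in this list could combine, here is a hedged sketch that turns them into docker CLI arguments. The bounded form (--filter until=168h --keep-storage 50gb) is the command this guide documents. Everything else is an assumption made for illustration: the function name, treating only DOCKER_WEB_BUILDKIT_PRUNE_AFTER_BUILD=1 as enabled, mapping all mode to --all, and adding --builder and --force.

```javascript
// Hypothetical argument builder. Returns docker CLI args, or null when
// pruning should be skipped.
function buildPruneArgs(env) {
  // Assumption: pruning only runs when explicitly enabled with "1".
  if (env.DOCKER_WEB_BUILDKIT_PRUNE_AFTER_BUILD !== "1") return null;

  const mode = env.DOCKER_WEB_BUILDKIT_PRUNE_MODE || "bounded";
  if (mode === "off") return null;

  const builder = env.DOCKER_WEB_BUILD_BUILDER_NAME || "tuturuuu";
  // --force skips the interactive confirmation (added for this sketch).
  const base = ["buildx", "prune", "--builder", builder, "--force"];

  if (mode === "all") return [...base, "--all"]; // assumption: all maps to --all
  if (mode !== "bounded") {
    throw new Error(`Unsupported DOCKER_WEB_BUILDKIT_PRUNE_MODE: ${mode}`);
  }

  const until = env.DOCKER_WEB_BUILDKIT_PRUNE_UNTIL || "168h";
  const keepStorage = env.DOCKER_WEB_BUILDKIT_PRUNE_KEEP_STORAGE || "50gb";
  return [...base, "--filter", `until=${until}`, "--keep-storage", keepStorage];
}
```

The result is an argv array intended for something like `spawn("docker", args)`; returning null keeps the "skip pruning" cases explicit for the caller.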

How it works:

The helper starts the Compose-owned buildkit service and then creates or reuses the remote Buildx builder named by DOCKER_WEB_BUILD_BUILDER_NAME. The container is named ${COMPOSE_PROJECT_NAME:-tuturuuu}-buildkit-1, so it stays visually grouped under the tuturuuu Docker Desktop stack.
The BuildKit caps accept auto. Auto memory uses Docker’s reported memory limit minus a small host overhead buffer, rounded down to MiB precision; auto CPU uses 1 CPU below 10 GB, 2 CPUs below 16 GB, and 4 CPUs on larger Docker allocations; auto max parallelism uses 1 below 16 GB and 2 above that. The E2E runner and production web serve scripts use auto memory by default so local Playwright verification and watcher builds adapt to the current Docker Desktop setting without requiring one-off env overrides.
DOCKER_WEB_BUILD_MEMORY caps the Compose-owned BuildKit service’s memory budget.
DOCKER_WEB_BUILD_CPUS sets the BuildKit service CPU budget.
DOCKER_WEB_BUILD_MAX_PARALLELISM writes a BuildKit config that limits concurrent solve steps, which is often the most effective way to reduce CPU spikes on smaller machines.
Host-side helper runs point Buildx at DOCKER_WEB_BUILDKIT_ENDPOINT (default tcp://127.0.0.1:${DOCKER_WEB_BUILDKIT_PORT:-7914}). The watcher container uses tcp://buildkit:1234 on the Compose network.
Blue/green watcher handoffs preserve the Compose-owned BuildKit cache volume by default (DOCKER_WEB_BUILDKIT_PRUNE_AFTER_BUILD=0), but stop and remove the buildkit container after the build/deploy phase (DOCKER_WEB_BUILDKIT_STOP_AFTER_BUILD=1). This frees idle CPU and memory while keeping layer state for the next deployment. Set DOCKER_WEB_BUILDKIT_STOP_AFTER_BUILD=0 only when an operator intentionally wants BuildKit to stay warm after a handoff.
When BuildKit pruning is enabled, the default mode is bounded: the helper runs docker buildx prune --filter until=168h --keep-storage 50gb for the active builder. Use DOCKER_WEB_BUILDKIT_PRUNE_MODE=all only for disposable build state, or DOCKER_WEB_BUILDKIT_PRUNE_MODE=off to skip pruning without changing older DOCKER_WEB_BUILDKIT_PRUNE_AFTER_BUILD callers.
Dockerized E2E is the exception: its per-run BuildKit state is disposable and scripts/run-web-e2e-docker.js sets DOCKER_WEB_BUILDKIT_PRUNE_AFTER_BUILD=1 and DOCKER_WEB_BUILDKIT_PRUNE_MODE=all unless explicitly overridden.
The same max-parallelism value is also forwarded as COMPOSE_PARALLEL_LIMIT when that variable is not already set. When the limit is 1, the blue/green workflow builds each Bake target group separately so image export and web compilation do not overlap on memory-constrained hosts.
When Docker reports less than 10 GB of total memory for a blue/green run, the helper also restarts the Compose-owned buildkit service immediately before the build batch. This clears long-lived BuildKit RSS before the replacement web image builds while the active lane is still running. Set DOCKER_WEB_BUILDKIT_RESTART_BEFORE_BUILD=0 to skip that low-memory restart, or 1 to force it on a larger host.
Docker web builds use bun run build:web:docker, which keeps the normal web build dependency graph, sets NODE_OPTIONS=--max-old-space-size to at least 4 GB on Docker allocations below 10 GB, then scales to 8 GB, 12 GB, or 16 GB based on the lower of Docker’s reported memory limit and the selected BuildKit memory cap, falling back to DOCKER_WEB_BUILD_MEMORY for environments where Docker memory cannot be detected. The helper reads Docker’s MemTotal and forwards it as DOCKER_WEB_DOCKER_MEMORY_LIMIT; auto buckets reserve 1 GB of effective Docker build memory for BuildKit, the active runtime lane, and sidecar overhead before selecting the Node heap bucket on larger allocations.
Docker production builds use Turbopack under the real Node 24 runtime with App Router-only compilation and React Compiler enabled. This avoids Bun runtime crashes while loading native Next SWC modules and keeps local E2E aligned with production Docker builds. The Docker builder stage is based on node:24-bookworm-slim and copies the Bun binary in only for workspace script orchestration, so the actual next build process runs under real Node instead of Bun’s node shim.
The @tuturuuu/web build:docker script delegates to scripts/run-web-docker-next-build.js, which spawns DOCKER_WEB_NODE_BINARY (pinned by the Dockerfile to /usr/local/bin/node) for the Next CLI and honors DOCKER_WEB_NODE_MAX_OLD_SPACE_SIZE=auto by default. Set a numeric heap value only when you need to override the bucket selection; values below 4096 MB are rejected. Set DOCKER_WEB_NEXT_APP_ONLY=0 only when you need to compare a full-app build. Keep DOCKER_WEB_NEXT_BUILD_ENGINE on its default Turbopack value for production watcher hosts and local E2E runs.
The wrapper also passes Turbo’s --concurrency flag inside the Dockerfile build. It uses DOCKER_WEB_TURBO_CONCURRENCY when explicitly set, otherwise it follows the current DOCKER_WEB_BUILD_MAX_PARALLELISM profile value and defaults to 1. BuildKit max parallelism limits Docker build graph execution; this inner Turbo cap limits concurrent workspace package builds such as @tuturuuu/types, @tuturuuu/devbox, and @tuturuuu/masonry.
Docker standalone Next builds default static page generation to a 180 second timeout. Legacy all-in-Docker builds auto-scale the inner Next build CPU count plus static generation concurrency from Docker memory. Docker allocations below 10 GB use 1 Next build CPU and static generation concurrency 1; 10-16 GB allocations use 2 for both; 16 GB and larger allocations use 4 for both. The Compose-owned BuildKit service still defaults to a 4 CPU budget, while the inner Next workers stay lower on smaller hosts to avoid OOM kills when the same machine is also running the active blue/green lane and sidecars. Native host builds do not set DOCKER_WEB_STATIC_GENERATION_MAX_CONCURRENCY or DOCKER_WEB_NEXT_BUILD_CPUS unless the operator explicitly provides them. Override those with DOCKER_WEB_STATIC_PAGE_GENERATION_TIMEOUT, DOCKER_WEB_STATIC_GENERATION_MAX_CONCURRENCY, and DOCKER_WEB_NEXT_BUILD_CPUS only when a host needs a fixed worker count.
Hive Docker images use a filtered workspace install and a Next standalone runner. Before apps/hive runs next build, the image must build @tuturuuu/types, @tuturuuu/internal-api, and @tuturuuu/supabase because those packages expose production dist/* subpath exports that Turbopack resolves during the standalone build. Hive realtime installs its filtered production workspace with Bun’s hoisted linker so the direct bun apps/hive-realtime/src/index.ts runtime can resolve top-level production packages such as postgres and @tuturuuu/realtime. Keep .dockerignore explicit about recursive generated directories such as **/.next/**, tmp/**, and apps/mobile/build/**; otherwise previous local builds can be copied into the next Docker context and inflate small sidecar images by several gigabytes.
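
The auto thresholds described in this list (BuildKit CPU and max parallelism, plus the inner Next build workers) can be written down as simple bucket functions. The thresholds are the ones stated above; the function names, the single dockerMemoryGb input, and treating exactly 16 GB as the upper bucket for max parallelism are assumptions. Auto memory and Node heap buckets are not modeled.

```javascript
// Hypothetical resolver for the BuildKit "auto" CPU and parallelism caps.
// Thresholds follow the guide: 1 CPU below 10 GB, 2 below 16 GB, else 4;
// parallelism 1 below 16 GB, else 2 (exactly 16 GB is an assumption).
function resolveAutoBuildCaps(dockerMemoryGb) {
  const cpus = dockerMemoryGb < 10 ? 1 : dockerMemoryGb < 16 ? 2 : 4;
  const maxParallelism = dockerMemoryGb < 16 ? 1 : 2;
  return { cpus, maxParallelism };
}

// Hypothetical resolver for the inner Next build workers:
// below 10 GB use 1, 10-16 GB use 2, 16 GB and larger use 4, for both values.
function resolveNextBuildWorkers(dockerMemoryGb) {
  const workers = dockerMemoryGb < 10 ? 1 : dockerMemoryGb < 16 ? 2 : 4;
  return { nextBuildCpus: workers, staticGenerationConcurrency: workers };
}
```

Keeping the two resolvers separate reflects the distinction the guide draws between the BuildKit service budget and the inner Next worker counts.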

Operational notes:

These caps affect image builds, not the runtime apps/web container after it has started.
If no build caps are configured, the helper continues using Docker’s default builder behavior.
Do not switch capped builds back to the Buildx docker-container driver. It creates Docker-managed containers named like buildx_buildkit_* outside the Compose project, which makes Docker Desktop grouping and health reporting confusing.
During capped-build setup, the helper removes known legacy Buildx builders such as platform-web-capped-builder before creating or reusing the Compose-owned remote tuturuuu builder. If docker buildx ls still shows that legacy builder, run the capped web deploy helper once so it can clean up the stale Buildx record.
A lower parallelism setting usually trades build speed for host stability.
If BuildKit fails with ResourceExhausted, cannot allocate memory, or exit code 137 while host memory still appears available, Docker’s VM/cgroup budget is the relevant build budget. Prefer the default --build-memory auto path: it now uses a conservative budget and still lets the adaptive profile lower pressure automatically. Raise DOCKER_WEB_BUILD_MEMORY/--build-memory only when docker info shows enough Docker memory and host swap is not saturated.
Docker Desktop resource graphs can show memory near the configured maximum because the Docker VM and file cache are preallocated or cached. Treat docker info --format '{{json .MemTotal}}' as the build budget source of truth, then use process/container RSS and BuildKit errors to decide whether the build is truly exhausting memory.
If docker compose reports services[buildkit].mem_limit invalid size: 'auto', the command path is bypassing the Docker web helper’s Compose env resolver or the host is running an older commit. The helper-only auto value must be converted to a concrete MiB value before any docker compose up, restart, stop, rm, or health-check call reads the BuildKit service.
If host memory or swap is saturated, lower --build-max-parallelism first and stop unrelated containers before raising the builder memory cap.
If the build loops on exit code 137 or SIGSEGV (Address boundary error) inside @tuturuuu/*:build package tasks, treat it as inner Turbo concurrency pressure in addition to BuildKit pressure. Keep the default Docker web helper profile so DOCKER_WEB_BUILD_MAX_PARALLELISM=1 reaches the Dockerfile build, or set DOCKER_WEB_TURBO_CONCURRENCY=1 explicitly for a one-off recovery.
If the default blue/green deploy keeps failing with BuildKit transport EOFs, context deadline exceeded, [internal] waiting for connection, or graceful-stop messages, leave the root-script defaults in place and let the adaptive profile step down through the lower profiles automatically. Use docker buildx ls or docker buildx inspect tuturuuu to confirm whether the remote builder is inactive. Use explicit caps only when you want to bypass the remembered local profile for a one-off run.
If Bun fails during an image install with a tarball extraction error such as Fail extracting tarball for "@biomejs/cli-linux-x64", the blue/green helper treats it as BuildKit exec-cache corruption once per deployment attempt. It prunes BuildKit exec cache mounts, recreates the Compose-owned buildkit service, and retries the build once with --no-cache. If the prune command fails because BuildKit has already dropped transport, recovery still proceeds to service recreation. A second failure is recorded as a real deployment failure with the original command context preserved in logs.
If BuildKit reports CACHED ERROR after a failed deps stage, or if the compose build exceeds DOCKER_WEB_BUILD_TIMEOUT_MS (default 45 minutes), the helper uses the same one-time cache recovery and fresh --no-cache retry.
If a deployment fails during the build, the watcher captures the actionable failure lines into the retained deployment history as failureReason. The Infrastructure → Monitoring → Deployments page and rollout ledger display that reason inline so operators do not need to reconstruct failures from a terminal scrollback.
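
The failure handling in these notes can be read as a classify-then-recover decision. The sketch below is hypothetical: the function names, category labels, and action labels are invented for illustration, and the patterns are only the strings quoted in the notes, not the helper's real matching rules.

```javascript
// Hypothetical classifier built from the error strings quoted in the notes.
function classifyBuildFailure({ output = "", exitCode = null, timedOut = false } = {}) {
  if (timedOut) return "cache-recovery"; // build exceeded DOCKER_WEB_BUILD_TIMEOUT_MS
  if (/Fail extracting tarball/.test(output) || /CACHED ERROR/.test(output)) {
    return "cache-recovery";
  }
  if (exitCode === 137 || /ResourceExhausted|cannot allocate memory/.test(output)) {
    return "memory-pressure";
  }
  if (/context deadline exceeded|waiting for connection|\bEOF\b/.test(output)) {
    return "builder-transport";
  }
  return "unknown";
}

// Cache recovery runs at most once per deployment attempt; a second failure
// is recorded as a real deployment failure.
function planRecovery(kind, { cacheRecoveryUsed = false } = {}) {
  if (kind === "cache-recovery") {
    return cacheRecoveryUsed
      ? { action: "record-failure" }
      : { action: "prune-recreate-retry", buildArgs: ["--no-cache"] };
  }
  if (kind === "memory-pressure" || kind === "builder-transport") {
    return { action: "step-down-profile" }; // let the adaptive profile lower pressure
  }
  return { action: "record-failure" };
}
```

The cacheRecoveryUsed flag is what keeps the --no-cache retry from looping: once it has been spent, the same failure category falls through to a recorded failure.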

Redis Profile

Redis is enabled by default in both dev and production-style Docker web stacks. The Redis and serverless-redis-http host ports bind to 127.0.0.1 only; do not expose them through Cloudflare Tunnel, public firewall rules, or all-interface Docker port mappings. The helper persists the generated token in:

tmp/docker-web/redis-token

and injects these values into apps/web automatically:

UPSTASH_REDIS_REST_URL=http://serverless-redis-http:80
UPSTASH_REDIS_REST_TOKEN=<generated local token>

The production Redis compose fragment requires UPSTASH_REDIS_REST_TOKEN during Compose interpolation. Use the Docker web helper, which injects the generated token automatically, or export a strong token before running direct docker compose --profile redis ... commands. Service env_file entries do not satisfy Compose interpolation for the Redis HTTP bridge token.

Docker Redis mode intentionally ignores generic UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN values from the host shell. This prevents old Upstash REST URLs from leaking into self-hosted Docker containers after the Upstash instance is shut down. If a Docker host must override the bundled Redis sidecar, use the Docker-specific DOCKER_UPSTASH_REDIS_REST_URL and DOCKER_UPSTASH_REDIS_REST_TOKEN variables.

Watcher-managed Infrastructure projects do not receive the integrated platform Docker Redis token. New managed projects start with redis_enabled=false, and a project with Redis enabled receives UPSTASH_REDIS_REST_URL, UPSTASH_REDIS_REST_TOKEN, and SRH_TOKEN only when project-scoped credentials are set with MANAGED_PROJECT_<PROJECT_ID>_UPSTASH_REDIS_REST_URL and MANAGED_PROJECT_<PROJECT_ID>_UPSTASH_REDIS_REST_TOKEN, where <PROJECT_ID> is the normalized project id uppercased with dashes converted to underscores. Generic host UPSTASH_REDIS_REST_* and Docker-specific DOCKER_UPSTASH_* values are stripped from managed project compose runs so untrusted project code cannot access platform Redis credentials.

If you intentionally want to exercise Redis-unavailable fail-open behavior, opt out:

bun dev:web:docker -- --without-redis

That opt-out disables both the bundled Redis companion services and the Docker-injected UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN variables. apps/web treats Redis-backed route rate limits, abuse counters, and IP blocks as fail-open availability guardrails in this mode: requests continue through normal auth, authorization, payload-size, Turnstile, suspension, and validation checks, but rate-limit buckets and IP-block enforcement are skipped. Defense-in-depth one-time state such as CLI refresh-token replay protection also continues without Redis after JWT validation and user lookup. Confirmed Redis-backed replay attempts are still rejected when Redis is available.

Vercel-hosted satellite apps such as CMS, Calendar, Finance, Learn, Teach, and Tasks cannot reach Docker-private Redis hosts such as serverless-redis-http. Do not point their Vercel UPSTASH_REDIS_REST_URL at the Docker sidecar or expose Redis through Cloudflare Tunnel. Satellite proxy guards should run without Redis when Upstash is retired; protected product APIs continue to flow through apps/web, where Docker Redis is available.
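
As an illustration of the managed-project credential naming described in this section, the sketch below derives the project-scoped variable names and resolves a project's Redis env. The function names and the project object shape (id, redis_enabled) are hypothetical. It assumes the id has already been normalized, and it assumes SRH_TOKEN mirrors the project token, which this guide does not state.

```javascript
// Sketch only: helper names and the project object shape are assumptions.
function managedProjectRedisEnvNames(normalizedProjectId) {
  // Uppercase the normalized id and convert dashes to underscores.
  const key = normalizedProjectId.toUpperCase().replaceAll("-", "_");
  const prefix = `MANAGED_PROJECT_${key}_UPSTASH_REDIS_REST_`;
  return { url: `${prefix}URL`, token: `${prefix}TOKEN` };
}

function managedProjectRedisEnv(project, hostEnv) {
  if (!project.redis_enabled) return {};
  const names = managedProjectRedisEnvNames(project.id);
  const url = hostEnv[names.url];
  const token = hostEnv[names.token];
  // Generic UPSTASH_REDIS_REST_* and DOCKER_UPSTASH_* values are never read here,
  // so platform credentials cannot leak into a managed project.
  if (!url || !token) return {};
  return {
    UPSTASH_REDIS_REST_URL: url,
    UPSTASH_REDIS_REST_TOKEN: token,
    SRH_TOKEN: token, // assumption: the bridge token equals the project token
  };
}
```

For a project id such as my-shop-1, the lookup key for the URL becomes MANAGED_PROJECT_MY_SHOP_1_UPSTASH_REDIS_REST_URL.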

Cloudflare Tunnel Profile

The Docker compose files include an optional cloudflared service. Enable it when the same host should publish the Dockerized web proxy through Cloudflare Tunnel:

bun serve:web:docker:bg -- --with-cloudflared

Required env:

CF_TUNNEL_TOKEN, CLOUDFLARED_TOKEN, or DOCKER_CLOUDFLARED_TOKEN

When root .env.local contains a non-empty CF_TUNNEL_TOKEN, Docker web helpers automatically enable the cloudflared profile and pass the value to Compose as CLOUDFLARED_TOKEN. If --env-file is passed, the helper applies the same auto-detection to that explicit file. Set DOCKER_WEB_WITH_CLOUDFLARED=0 to keep a configured tunnel token available without starting the cloudflared container; an explicit --with-cloudflared or --profile cloudflared still enables the profile.

For a remotely managed Cloudflare Tunnel, configure the public hostname route in Cloudflare to point at the Docker service or the local proxy loopback:

Production blue/green: https://tuturuuu.com -> http://localhost:7803 or http://web-proxy:7803
Dev stack: https://dev.tuturuuu.com or a temporary hostname -> http://localhost:7803 or http://web:7803

The tunnel container shares the web/proxy network namespace, so existing Cloudflare routes that use localhost:7803 resolve to the Docker web service instead of the tunnel container itself. Keep cms.tuturuuu.com and other satellite app hostnames on Vercel unless those apps are explicitly moved into this Docker stack.

Production compose binds host-published web, Hive, Meet, and Redis ports to 127.0.0.1 only. Do not remove that loopback prefix during blue/green migration; public exposure should go through Cloudflare Tunnel or another controlled frontend, not the staged Docker host ports.

When blue/green is deployed with --with-cloudflared, the watcher receives DOCKER_WEB_WITH_CLOUDFLARED=1 so future auto-deploys keep the tunnel profile active and do not remove the cloudflared container as an orphan.
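
The profile auto-detection described in this section can be summarized as a precedence check. This is a sketch under stated assumptions: shouldEnableCloudflared is a hypothetical name, envFile stands for the parsed root .env.local (or the --env-file target), and accepting the --profile=cloudflared spelling is an assumption beyond the two flags named in the text.

```javascript
// Sketch only: explicit flags first, then the opt-out/opt-in variable, then the token.
function shouldEnableCloudflared({ args = [], env = {}, envFile = {} } = {}) {
  const explicit =
    args.includes("--with-cloudflared") ||
    args.some(
      (arg, i) =>
        arg === "--profile=cloudflared" || // assumed alternative spelling
        (arg === "--profile" && args[i + 1] === "cloudflared")
    );
  if (explicit) return true; // explicit flags override DOCKER_WEB_WITH_CLOUDFLARED=0

  if (env.DOCKER_WEB_WITH_CLOUDFLARED === "0") return false;
  if (env.DOCKER_WEB_WITH_CLOUDFLARED === "1") return true;

  // Otherwise a non-empty CF_TUNNEL_TOKEN in the env file enables the profile.
  return (envFile.CF_TUNNEL_TOKEN ?? "").trim() !== "";
}
```

The ordering matters: the opt-out keeps a configured token from starting the container, but it does not block an operator who asks for the profile explicitly.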

Auto-Pull Blue/Green Watcher

For simple self-hosted boxes that deploy directly from a Git branch, the repo also provides a long-running auto-deploy watcher:

bun serve:web:docker:bg:watch

That command now bootstraps Docker instead of running the watcher loop as a host PID. Each invocation:

Writes the forwarded watcher CLI args to tmp/docker-web/watch/blue-green-auto-deploy.args.json.
Rebuilds and force-recreates the dedicated web-blue-green-watcher service.
Builds and force-recreates web-docker-control so direct admin recovery uses the current sidecar code.
Builds and starts web-cron-runner with --no-recreate so native Docker cron stays available without interrupting a healthy runner.
Tails the watcher container’s live logs so the terminal still shows the watcher dashboard.

The watcher container mounts:

the repo worktree at /workspace
the same repo again at the real host checkout path via PLATFORM_HOST_WORKSPACE_DIR, so host Docker bind mounts resolve against the host filesystem when the watcher shells into docker compose ...
the linked-worktree common Git directory through DOCKER_WEB_GIT_COMMON_DIR when .git is a file, so in-container Git commands can resolve .git/worktrees/... metadata from Docker-mounted checkouts
/var/run/docker.sock so it can manage the blue/green compose stack itself
the shared Bun install cache volume
a dedicated watcher node_modules volume so the frozen dependency install stays container-local

Behavior:

Reads the built-in platform project from the log-drain Postgres project registry. The production watcher service is wired with PLATFORM_LOG_DRAIN_DATABASE_URL so a live watcher can consume queued Infrastructure project deployments instead of falling back to the legacy single-branch loop.
The seeded branch is production, but operators can change it from Infrastructure → Monitoring → Projects. If the selected project branch differs from the current checkout, the watcher restarts its child process, resets tracked changes, removes untracked files, fetches, and checks out that branch. Set DOCKER_WEB_WATCHER_WORKTREE_RESET_DISABLED=1 to restore the protective dirty-worktree block instead. If the watcher is already stuck or missing, the monitoring UI asks web-docker-control to ensure the watcher before ensuring or restarting web-cron-runner. If direct control is unavailable, the UI keeps the stalled request visible so host supervisor recovery can be diagnosed instead of appearing queued forever.
Locks the selected local branch and tracked upstream at startup.
Writes a PID-backed lock file at tmp/docker-web/watch/blue-green-auto-deploy.lock.
Renders a live terminal dashboard with the locked branch, tracked upstream, latest local commit, relative commit age, last check time, next poll time, current blue/green runtime state, and recent watcher events.
Polls the tracked upstream every 1000ms by default.
Auto-clears and redraws the dashboard in place on each state change when attached to a TTY.
Runs the Git and deploy subprocesses quietly so the dashboard is not disrupted by git fetch, git reset, or Docker build output during normal watcher operation.
Treats the watcher-managed checkout as disposable by default. Before each upstream comparison it runs git reset --hard HEAD, git clean -fd, fetches the locked upstream, and hard-resets to the tracked upstream when local HEAD is behind, ahead, or diverged. Ignored files are left alone.
Set DOCKER_WEB_WATCHER_WORKTREE_RESET_DISABLED=1 only when you need to preserve manual edits in a deployment clone. With that escape hatch, dirty worktrees block polling, ahead/diverged branches are skipped, and only fast-forward pulls are attempted.
Runs bun install --frozen-lockfile automatically after every successful upstream sync so installed dependencies match the reviewed bun.lock before the deploy handoff continues. The watcher does not run bun upgrade or a non-frozen install on the production host.
Resets dirty bun.lock changes by default with the rest of the disposable checkout. When DOCKER_WEB_WATCHER_WORKTREE_RESET_DISABLED=1 is set, dirty bun.lock remains a blocking worktree change.
Runs bun serve:web:docker:bg automatically after a successful upstream sync.
Polls imported Infrastructure projects from log-drain Postgres, synchronizes enabled public GitHub projects into tmp/docker-web/projects/<projectId>/repo, deploys them through generated Next.js compose files under the shared tuturuuu Compose project, and merges hostname routes into the central nginx proxy. The imported-project and manual deployment queue cadence is independent from the normal Git polling interval, so a watcher configured with a long Git interval such as 1000 seconds still wakes on the shorter project queue interval to advance queued Deploy actions. Platform project state is updated on both queue-only deploys and normal upstream deploys, so a successful sync/deploy clears queued and refreshes the latest commit columns instead of relying only on deployment history. Imported project builds share the same deployment build lock as platform blue/green builds. If platform, standby, recovery, or another imported project build is already active, the project poll is deferred instead of starting a second Docker build.
The watcher no longer prebuilds main or advances production on its own. Advance production outside the watcher through the release process, then let the watcher deploy the locked branch that is already checked out on the host.
Rollback pins intentionally pause normal upstream sync for the pinned deployment state. Remove the pin when the locked branch contains the corrective commit that should resume normal deployment.
Infrastructure operators can queue tmp/docker-web/watch/control/blue-green-deployment-revert.request.json to revert production to a retained successful deployment. The watcher keeps the 5 newest unique successful deployed image tags for instant revert; a cached revert verifies the cached target first, cancels any active blue/green build, retags the selected image, starts active/standby with --no-build, health-checks through the normal proxy path, records deployment kind instant-revert, and writes a deployment pin so normal upstream deployment does not immediately overwrite the rollback. Older retained deployments remain revertable through the existing rollback pin path, which may rebuild because no cached image is available and still respects the active build lock.
After forward database migrations finish, blue/green workflows remove completed hive-db-migrate and supermemory-db-migrate containers by Compose service labels. This catches stopped one-off docker compose run containers such as tuturuuu-hive-db-migrate-* that can otherwise make the Docker cluster look unhealthy even after the migration succeeded.
If watcher runtime code such as scripts/watch-blue-green-deploy.js, scripts/docker-web/blue-green.js, or scripts/docker-web/env.js changed in the pulled revision, the current watcher does not deploy from the old process. It releases its lock, spawns a replacement watcher with the same CLI args, and exits first.
The replacement watcher refreshes the live web-proxy nginx config and workers in place if blue/green is already serving traffic, verifies proxy routing through /__platform/drain-status, and only then starts the new blue/green build/promotion.
If compose or helper-image wiring changed, including docker-compose.web.prod.yml, Hive service files, MarkItDown service files, apps/storage-unzip-proxy package/source files, or apps/web/docker/cron-runner*, the containerized watcher recreates its own compose service before the pending deploy handoff. The deploy then includes only the affected buildable helper images in the blue/green build command instead of rebuilding every service on every commit.
Retries recoverable Git command failures instead of exiting. The first retry waits 1 minute, then the watcher backs off exponentially on consecutive Git failures up to a 15 minute ceiling.
Caps deployment attempts at 3 failures per commit. A recovered pending handoff failure is recorded, the pending request is cleared, and the watcher keeps polling; once the cap is reached, that commit reports retry-limited until a new commit is available or an operator pins a different deployment.
Stops immediately if the checked-out branch changes while the watcher is running.
If another watcher already owns the lock, a new invocation can fail with guidance, mirror the active watcher with --resume-if-running, or replace it with --replace-existing.
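
Two of the numeric rules in this list, the Git retry backoff and the per-commit deployment cap, can be sketched as follows. The function names are hypothetical, and doubling the delay on each consecutive failure is an assumption: the list says only that the backoff is exponential, starting at 1 minute with a 15 minute ceiling.

```javascript
// Sketch only: backoff and retry-cap arithmetic for the watcher loop.
const MINUTE_MS = 60_000;
const GIT_RETRY_CEILING_MS = 15 * MINUTE_MS;
const MAX_DEPLOY_FAILURES_PER_COMMIT = 3;

function gitRetryDelayMs(consecutiveFailures) {
  if (consecutiveFailures < 1) return 0;
  // Assumed doubling: 1 min, 2 min, 4 min, 8 min, then capped at 15 min.
  const delay = MINUTE_MS * 2 ** (consecutiveFailures - 1);
  return Math.min(delay, GIT_RETRY_CEILING_MS);
}

function deployState(failuresByCommit, commit) {
  const failures = failuresByCommit.get(commit) ?? 0;
  return failures >= MAX_DEPLOY_FAILURES_PER_COMMIT ? "retry-limited" : "eligible";
}
```

Because failures are counted per commit, a new commit starts with a clean count and becomes eligible again without operator action.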

Operational notes for the containerized watcher:

Manual bun serve:web:docker:bg and watcher-triggered deploys share a deployment-build lock at tmp/docker-web/watch/blue-green-deployment-build.lock. This lock is separate from blue-green-auto-deploy.lock: the watcher may remain alive, but only one build/deploy phase can be active across manual deploys, watcher upstream sync, standby refreshes, rollback pins, cached recovery, and reconcile deploys.
If a manual deploy sees that lock or a live watcher status of building or deploying, an interactive terminal prompts before it interrupts the active deployment. Confirming stops web-blue-green-watcher, stops/resets the Compose-owned BuildKit work, clears the active build lock/status, records the interrupted entry as canceled, then starts the requested deployment alone.
Non-interactive manual automation fails fast on an active deployment unless --cancel-active-build or DOCKER_WEB_CANCEL_ACTIVE_BUILD=1 is provided. Use that override only when it is acceptable to interrupt all BuildKit work owned by the platform deployment stack.
Re-running bun serve:web:docker:bg:watch intentionally recreates the watcher container so it picks up local repo changes, new CLI args, and watcher-image updates in one path.
The host log follower treats Docker’s 143 exit from an intentionally recreated watcher container as a reconnect signal, then reattaches to the replacement service instead of leaving the terminal dark.
If the followed watcher logs explicitly request host-supervised watcher service recreation, the host wrapper force-recreates web-blue-green-watcher before reattaching. Do not rely only on Docker’s restart policy in this path: the old container can briefly report healthy while still running the stale image/runtime.
Git fetch/pull credentials now need to be usable inside the watcher container because the watcher no longer runs directly on the host.
Full Docker daemon or Docker Desktop crashes cannot be recovered by a watcher that is itself running inside Docker. Keep bun serve:web:docker:bg:watch running from the host, ideally under systemd, launchd, or another host process supervisor. That host command waits for the Docker daemon to respond again, reruns the watcher compose up --build --detach --force-recreate, and then resumes tailing logs. Every hosted project with its own Docker watcher needs its own host-side watch process; container restart: policies only help after Docker is already healthy again.
The host Docker recovery loop polls bounded Docker CLI probes every 5 seconds by default. Override with DOCKER_WEB_WATCHER_DOCKER_RECOVERY_POLL_MS; tune each quick probe with DOCKER_WEB_WATCHER_DOCKER_PROBE_TIMEOUT_MS. By default it waits indefinitely because a host process manager is expected to own the terminal process; set DOCKER_WEB_WATCHER_DOCKER_RECOVERY_TIMEOUT_MS to a positive value to fail after a bounded recovery window.
The watcher image lives at apps/web/docker/blue-green-watcher.Dockerfile.
Its entrypoint wrapper relaunches the watcher in-place when scripts/watch-blue-green-deploy.js requests a self-restart after pulling a new watcher revision.
The entrypoint is also the watcher supervisor. It restarts the child process after crashes, after the status snapshot fails to appear during startup, or after blue-green-auto-deploy.status.json becomes stale. The compose service uses restart: unless-stopped so Docker also brings the watcher back after a daemon or container failure.
A stale status snapshot is tolerated while the snapshot already shows an active building or deploying deployment. During a long docker compose build, the watcher child is intentionally busy inside the deploy command and may not rewrite the status file until the command exits. The wrapper keeps the child alive until DOCKER_WEB_WATCHER_BUILD_TIMEOUT_MS plus a short grace window, then treats the stale snapshot as unhealthy.
bun serve:web:docker:bg:down also stops the watcher service because it is part of the production compose stack now.
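
The stale-snapshot tolerance described above reduces to a small predicate. This sketch is hypothetical: the function name, the staleAfterMs threshold, and the grace value are not given in this guide, and it assumes the tolerance window is measured from the last snapshot write.

```javascript
// Sketch only: decides whether the supervisor should treat the watcher child as unhealthy.
function isSnapshotUnhealthy({ state, snapshotAgeMs, staleAfterMs, buildTimeoutMs, graceMs }) {
  if (snapshotAgeMs <= staleAfterMs) return false; // snapshot is still fresh

  // A long docker compose build can legitimately block status rewrites.
  const busy = state === "building" || state === "deploying";
  if (!busy) return true;

  // Tolerate staleness until DOCKER_WEB_WATCHER_BUILD_TIMEOUT_MS plus grace.
  return snapshotAgeMs > buildTimeoutMs + graceMs;
}
```

Under illustrative values of a 1 minute stale threshold, a 45 minute build timeout, and a 30 second grace window, a 10 minute old snapshot is unhealthy while idle but tolerated while building.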

Dashboard details:

Shows the current active blue/green color when web-proxy is serving live traffic.
Docker resource rows use the running containers directly as a fallback when docker compose ps cannot inspect the prod stack because of env interpolation issues, so watcher metrics can still appear on an already-live deployment.
Docker stats are read with an explicit field format instead of Docker’s version-dependent JSON object shape, which avoids bogus 0 CPU/memory readings when the watcher is running against a different Docker release.
The watcher parser also normalizes locale-style decimal commas from docker stats, so hosts that emit values like 0,10% or 24,0MiB no longer collapse into zeroed metrics.
Each watcher snapshot now includes docker ps metadata for every running container visible through the host Docker socket, plus compose service health for containers in the production project. The monitoring overview uses that persisted snapshot to show service health and a full running-container inventory without mounting the Docker socket into apps/web.
The request archive view computes route summaries, status totals, RSC counts, and error totals across the selected timeframe instead of only the visible page. The default timeframe is seven days, the API rejects unbounded or oversized windows, and operators can query at most 30 days of retained request logs at a time. The web API keeps a short in-process aggregate cache keyed by bounded timeframe plus telemetry log file stats, but the cache stores only aggregate analytics so request rows are not retained in memory between page reads.
Drive ZIP extraction does not stream extracted file bytes back through the web app proxy. The unzip worker requests a per-entry signed upload URL from the callback route and uploads extracted files directly to trusted storage origins only, which avoids nginx body-size limits for large WebGL artifacts while keeping folder creation and auth checks in the backend callback.
Hive is promoted with the web blue/green color: hive-blue and hive-green are routed from hive.tuturuuu.com, and hive-realtime serves /realtime with HIVE_REALTIME_TOKEN_SECRET, HIVE_REALTIME_URL, and NEXT_PUBLIC_HIVE_REALTIME_URL configured in the same production stack. Hive product data is stored in the Docker-managed hive-postgres service via HIVE_DATABASE_URL; Supabase remains the identity/session source only. The web, web-cron-runner, hive-{color}, and hive-realtime services must all receive that URL so API routes, disabled-by-default simulation cron, the editor, and the CRDT realtime service share the same Hive product database. Optional local LLM support runs behind the hive-ollama profile and is disabled unless operators enable the profile and Hive settings enable the exact gemma4 model. Production compose publishes 127.0.0.1:7814:7814 from web-proxy, not from a direct Hive container, so host-local or Cloudflare tunnel traffic to localhost:7814 always reaches the currently promoted Hive color without exposing staged migration ports on every host interface. Deploys verify that the running web-proxy container has the required loopback host bindings (7803, 7814, and 7816) and that its running image matches the resolved Compose image before reusing it; if an older proxy was created before Hive moved behind blue/green or before the nginx image pin changed, the next deploy force-recreates the proxy so the host-level Cloudflare Tunnel route can reach Hive on the expected proxy runtime. The Hive color services use the same Supabase env source as apps/web: runtime env files are shared, and production image builds mount the web_env BuildKit secret so hidden-locale auth pages can prerender with the platform Supabase URL. 
Deploy coordination: scripts/docker-web/blue-green.js still scopes prod builds by changed service group, but runtime promotion happens in staged order: first the target web-{color}, then hive-{color} and hive-realtime, then refreshed support services such as backend, MarkItDown, storage-unzip-proxy, web-cron-runner, and optional Redis-backed helpers. If web-proxy or cloudflared must be bootstrapped or recreated for host-port changes, they start only after target web, Hive, and support services are healthy, so hive.tuturuuu.com is not exposed with an empty hive_app_upstream and web is not publicly switched before dependent gates finish. Promotion waits for the final proxy route check before writing active-color.
Every service owned by docker-compose.web.prod.yml should declare a healthcheck, either directly in compose or in the image. The resources inventory treats an Up container without Docker health metadata as healthy for cross-project runtime visibility, but first-party prod services and sidecars still need explicit probes so deploy gates can fail before promotion.
The MarkItDown sidecar needs SUPABASE_URL set to the same Docker-internal Supabase URL used by the web container. The service validates signed Storage URLs before downloading attachments, and local Docker runs may use host.docker.internal over HTTP.
MarkItDown source changes and storage-unzip-proxy package/source changes are part of the watcher refresh globs. Keep those globs in sync with any future sidecar entrypoints so a running watcher refreshes helper containers during the next deploy handoff, not just the web app container.
Host dependency refreshes must not rewrite bun.lock on the production host. The watcher resets a dirty lockfile by default with the rest of the disposable deployment checkout. With DOCKER_WEB_WATCHER_WORKTREE_RESET_DISABLED=1, a dirty lockfile remains a blocking worktree change. Automatic dependency sync uses bun install --frozen-lockfile only.
Runtime upgrades are an explicit operator action. The watcher does not run bun upgrade; update the host Bun runtime only after reviewing the pinned version in the repository and the watcher image.
Recoverable Git poll failures stay visible in the dashboard as a retrying watcher state instead of terminating the process, and the next-check timer reflects the active backoff delay.
If git reset, git fetch, git checkout, or the reset-disabled git pull --ff-only path fails only because a Git lock already exists, the watcher inspects the lock age. It keeps polling a fresh lock inside the watcher process and removes the lock automatically only once it is stale (older than 2 minutes). This covers linked-worktree index.lock files, .git/packed-refs.lock, and remote-ref locks such as .git/refs/remotes/origin/staging.lock.
Build/deploy failures also stay inside the watcher loop. The watcher records failed attempts in deployment history, clears stale pending handoff files after recovery failures, and stops retrying the same commit after the third failed deployment attempt. Once that cap is reached, it reports the retry-limited state once for that commit instead of logging the same skip on every poll.
Normal promotions keep the long-lived web-proxy container and bound port stable, which avoids transient listener drops for upstreams such as Cloudflare Tunnel that are connected to :7803. Proxy container recreates are reserved for required host-port or image drift and happen only after the replacement web/Hive lane is healthy.
The watcher persists recent deployment history, including manual bun serve:web:docker:bg runs, and renders the top 3 most operationally relevant entries as stacked terminal cards that favor vertical scanability over very wide lines.
Each deployment card now uses a stronger header with status/color badges plus grouped metric bands, so active traffic state, rollout intent, and request-rate data are easier to scan while multiple cards are stacked.
As soon as a new commit starts rolling out, the recent deployment section shows it immediately as DEPLOYING instead of waiting for the rollout to finish.
Each deployment block includes:
- deploy status (ACTIVE, ENDED, or FAILED)
- build time
- activation/finish time
- deployment lifetime while it served traffic
- total requests served during that deployment window
- average requests per minute
- peak requests per minute
- day: requests served on the current day for the active deployment, or the final active day for an ended deployment
- davg: average requests per day across that deployment’s serving lifetime
- dpeak: busiest single-day request count across that deployment’s serving lifetime
The live blue/green summary uses the same traffic metrics as the deployment history cards, with consistent color coding for build/lifetime/traffic/age metrics so the dashboard is easier to scan quickly.
A dedicated Docker resources row summarizes aggregate CPU, memory, and network usage across the live blue/green containers, followed by a per-container row for proxy, green, and blue when those services are running. This is sampled from docker stats --no-stream, so it stays local to the host and is appropriate for self-hosted operator monitoring.
The infrastructure dashboard’s Docker Runtime Inventory uses the watcher snapshot as the source of truth for every running Compose container and derives total CPU and memory from those rows when present, so the summary cards stay aligned with the detailed container inventory.
The bundled serverless-redis-http companion uses an in-container wget health check that posts ["PING"] to / with the generated SRH_TOKEN; do not use a Node-based probe for that image because it is an Erlang release image, and do not probe /ping because SRH does not expose that route.
Production Redis compose requires UPSTASH_REDIS_REST_TOKEN and binds Redis host ports to 127.0.0.1. Do not reintroduce the platform-local-redis-token fallback in production fragments or remove the loopback host bind; direct Compose users must export a strong token before enabling the redis profile.
After a successful host-triggered serve:web:docker:bg rollout, the Docker helper starts or resumes the containerized web-blue-green-watcher with --resume-if-running. Deploys that are already running inside the watcher skip that handoff via PLATFORM_BLUE_GREEN_WATCHER_CONTAINER=1, which avoids recursive watcher starts while still leaving a poller alive for future Git commits.
The watcher wrapper and child process must agree on runtime files. If PLATFORM_BLUE_GREEN_WATCH_ARGS_FILE, PLATFORM_BLUE_GREEN_WATCH_RUNTIME_DIR, or PLATFORM_BLUE_GREEN_WATCH_STATUS_FILE are set, both the wrapper and child use those paths so the wrapper does not restart a healthy child for a missing status snapshot.
Request counters now come from a persisted local proxy-log drain under tmp/docker-web/watch/blue-green-request-telemetry.*, not from one-off docker logs scrapes in the dashboard. The watcher continuously drains structured web-proxy access logs into a local ledger, so request metrics survive watcher restarts and do not require any external analytics service.
Internal proxy health checks for /api/health and /__platform/drain-status are excluded from the request totals so the numbers reflect real served traffic more closely.
The proxy now emits structured JSON access logs that include the upstream deployment stamp and blue/green color. That lets the watcher link requests back to the correct deployment instead of only estimating by time window.
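As a rough illustration of the two points above (internal probes are excluded, and each request is attributed to a deployment), the following sketch counts drained access-log lines per deployment. The field names `path`, `deploymentStamp`, and `color` are assumptions made for this example; the real proxy log schema is not documented here.

```javascript
// Internal proxy health checks named in this guide; they never count as traffic.
const INTERNAL_PROBE_PATHS = new Set(["/api/health", "/__platform/drain-status"]);

// jsonLines: array of raw access-log lines, one JSON object per line.
// Field names (path, deploymentStamp, color) are hypothetical.
function countRequestsByDeployment(jsonLines) {
  const totals = new Map();
  for (const line of jsonLines) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (!entry || typeof entry.path !== "string") continue;
    // Drop the query string before comparing against internal probe paths.
    const pathname = entry.path.split("?")[0];
    if (INTERNAL_PROBE_PATHS.has(pathname)) continue;
    const key = `${entry.deploymentStamp ?? "unknown"}:${entry.color ?? "unknown"}`;
    totals.set(key, (totals.get(key) ?? 0) + 1);
  }
  return totals;
}
```

Keying on the stamp carried in each log line, rather than on a time window, is what lets the totals stay correct when two colors serve traffic close together.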
For each newly drained proxy request, the watcher also reads recent stdout/stderr from the selected frontend lanes (web-blue / web-green for DOCKER_WEB_FRONTEND=next, or tanstack-web-blue / tanstack-web-green for DOCKER_WEB_FRONTEND=tanstack) and stores up to 20 route console lines that fall inside the request latency window. Those captured lines are persisted on the request-log record itself so the request explorer can show request-scoped server console output after the live Docker logs have moved on.
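The latency-window capture described above can be pictured with a small sketch. The record shapes (`startedAtMs`, `latencyMs`, `timestampMs`, `text`) are hypothetical; only the cap of 20 lines comes from this guide.

```javascript
const MAX_CONSOLE_LINES_PER_REQUEST = 20; // cap stated in this guide

// request: { startedAtMs, latencyMs }; consoleLines: [{ timestampMs, text }].
// Both shapes are assumptions for illustration.
function selectConsoleLinesForRequest(request, consoleLines) {
  const windowStart = request.startedAtMs;
  const windowEnd = request.startedAtMs + request.latencyMs;
  return consoleLines
    .filter((line) => line.timestampMs >= windowStart && line.timestampMs <= windowEnd)
    .slice(0, MAX_CONSOLE_LINES_PER_REQUEST);
}
```

Because the selected lines are stored on the request record, they remain available after the container's own log buffer has rotated.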
The watcher retains up to 10,000 deployment history entries and up to 100,000,000 recent drained request-log records on disk, bounded by a 256 MiB durable request-log byte cap by default. When the next request record would exceed the byte cap, the watcher rotates the current JSONL chunk if needed and prunes older chunks before appending, so public request URIs cannot grow the host-backed ledger without an aggregate limit. Rolling daily/weekly/monthly/yearly metric buckets plus a recent-request excerpt still feed the monitoring dashboard.
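A simplified model of the byte-cap pruning might look like the sketch below. It treats the ledger as an oldest-first list of chunk sizes and leaves out the chunk-rotation step; the `{ name, bytes }` shape and the function name are assumptions, not the watcher's actual code.

```javascript
const DEFAULT_BYTE_CAP = 256 * 1024 * 1024; // 256 MiB default stated in this guide

// chunks: oldest-first array of { name, bytes } (hypothetical in-memory model).
// Returns which chunks to keep, which to prune, and whether the record fits.
function planAppend(chunks, recordBytes, byteCap = DEFAULT_BYTE_CAP) {
  const kept = [...chunks];
  const pruned = [];
  let total = kept.reduce((sum, chunk) => sum + chunk.bytes, 0);
  // Prune the oldest chunks until the new record fits under the cap.
  while (kept.length > 0 && total + recordBytes > byteCap) {
    const oldest = kept.shift();
    pruned.push(oldest.name);
    total -= oldest.bytes;
  }
  return { kept, pruned, fits: total + recordBytes <= byteCap };
}
```

Pruning before the append, not after, is what keeps the on-disk total from ever exceeding the cap, even briefly.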
The watcher also persists a separate latest-log ledger under tmp/docker-web/watch/blue-green-auto-deploy.logs.json, which captures the high-level poll/pull/build/deploy watcher messages with deployment stamps and commit hashes when available. The infrastructure dashboard uses that ledger for a deployment-scoped latest-log view without needing live docker logs access.
The watcher uses the same Docker runtime env resolution as the real deploy flow, so blue/green status probes still work when the Redis profile is part of the production compose file.
The active watcher also persists a live status snapshot under tmp/docker-web/watch/, which is what --resume-if-running uses to mirror the dashboard without taking over the PID lock.
The infrastructure dashboard at /{ROOT_WORKSPACE_ID}/infrastructure/monitoring reads the same watcher status snapshot and renders it as a Next.js control room with rollout, request-rate, container-resource, and event-feed views.
The monitoring dashboard now exposes paginated request and watcher-log explorers. Route filters come from normalized request paths, the raw request URI still surfaces query signatures, and ?_rsc=* requests are called out so React Server Component traffic is inspectable separately from document hits.
Deployment-facing dashboard surfaces deduplicate successful blue/green rows for the same commit so active and standby colors do not appear as separate rollouts. Failed attempts remain separate because the retry cap and recovery debugging depend on seeing each failed build/deploy attempt. Large deployment, rollback-candidate, Docker-service, and container lists are paginated in the UI instead of rendering every retained row at once.
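The deduplication rule above can be sketched as follows, assuming a hypothetical row shape of `{ commit, color, status }` where `status` is `"success"` or `"failed"`.

```javascript
// rows: deployment rows in display order (hypothetical shape).
// Successful rows collapse to one per commit; failed attempts are always kept.
function dedupeDeploymentRows(rows) {
  const seenSuccessfulCommits = new Set();
  const result = [];
  for (const row of rows) {
    if (row.status === "success") {
      if (seenSuccessfulCommits.has(row.commit)) continue;
      seenSuccessfulCommits.add(row.commit);
    }
    result.push(row);
  }
  return result;
}
```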
Production web, web-blue, and web-green containers mount ./tmp/docker-web read-only at /app/runtime/docker-web and use PLATFORM_BLUE_GREEN_MONITORING_DIR to find the watcher snapshot. Keep that mount/env pair in sync if the runtime path changes, or the dashboard will degrade to an empty offline state even while blue/green deployments still work.
Production web, web-blue, and web-green also mount the narrower ./tmp/docker-web/watch/control path read-write at /app/runtime/docker-web-control via PLATFORM_BLUE_GREEN_CONTROL_DIR. Keep operator command files in that control directory so the broader watcher runtime and telemetry mount can stay read-only.
The monitoring dashboard’s “Sync Standby Now” action writes tmp/docker-web/watch/control/blue-green-instant-rollout.request.json. The watcher consumes that file on its next poll, clears it after a success, failure, or no-op, and uses it to rebuild the standby color immediately so blue and green can converge on the same commit without waiting for the stale standby window.
The dashboard reads that pending instant-rollout request back from the watcher control directory. While the request is queued, or while the latest standby refresh is building/deploying, the sync button stays disabled and shows a queued/building status instead of allowing duplicate control files.
The monitoring dashboard’s rollback pin action writes tmp/docker-web/watch/control/blue-green-deployment-pin.json. The watcher treats that file as authoritative: it skips normal fetch/pull work, checks out the pinned commit in detached mode, deploys it if the latest successful deployment is different, and keeps production on that commit until the pin is removed from the dashboard. Removing the pin lets the watcher check out its locked branch again and resume normal fast-forward polling.
In-container watcher child restarts preserve the locked branch/upstream metadata even when the child is killed while Git is detached for a rollback or parent-fallback build. The replacement child can recover production from the target-only lock instead of trying to poll detached HEAD. If an older child already removed the lock, a clean detached startup falls back to the selected platform branch (production by default) before locking and polling.
When the watcher receives a shutdown signal while it is temporarily detached, it attempts to check out the locked branch again before exiting, as long as the worktree is clean. This keeps manual operator commands such as git pull && bun serve:web:docker:bg from inheriting a detached checkout after a stopped watcher.
The watcher image must also include both the Docker Compose and Buildx CLI plugins (docker-cli-compose and docker-cli-buildx on Alpine) because the rollout handoff shells into docker compose ..., and capped production builds create/use the remote tuturuuu Buildx builder from inside the watcher container. The watcher reaches the Compose-owned BuildKit daemon at tcp://buildkit:1234.
When the watcher drives Docker Desktop through /var/run/docker.sock, it must run the deploy handoff from the mirrored host-path mount, not a container-only path like /workspace. Otherwise Docker Desktop rejects bind mounts such as ./tmp/docker-web/prod/nginx.conf with “mounts denied” because /workspace/... is not a real shared host path.
The watcher compose environment preserves PLATFORM_HOST_WORKSPACE_DIR and pins COMPOSE_PROJECT_NAME from that host checkout path unless DOCKER_WEB_COMPOSE_PROJECT_NAME is explicitly set. For linked worktrees, the same compose env injects DOCKER_WEB_GIT_COMMON_DIR and the watcher service mounts that directory at the same absolute path inside the container. This prevents fatal: not a git repository failures for .git/worktrees/<name> paths when .git points outside the worktree mount. A canonical checkout directory named platform maps to the tuturuuu Compose project so Docker Desktop groups the stack under the product name on clean startups.
During self-refresh from an already-running legacy platform Compose project, the legacy watcher starts a staged tuturuuu watcher with non-conflicting host ports, then stops only the old watcher service. The target watcher builds or recovers the tuturuuu stack before it touches the public proxy port. When the target proxy is healthy on the staged port, it stops the legacy platform proxy, recreates the tuturuuu proxy on port 7803, and verifies the internal drain-status route within 3 seconds. If that handoff exceeds 3 seconds or the target proxy health check fails, the watcher stops the target proxy, restores the legacy platform proxy, and leaves the legacy project intact for another retry. After a successful handoff, it removes the old platform Compose project with docker compose down --remove-orphans.
Once the legacy project is absent, a watcher that lacks inherited Compose project env is treated as the fully migrated tuturuuu watcher and remains in the normal Git poll/build loop instead of starting another migration handoff. Do not inherit arbitrary container-scoped Compose project names from the watcher container; doing so can create duplicate service names such as nested tuturuuu-markitdown-1 containers during self-refresh.
If Docker reports container name ... is already in use for a requested production service, the helper removes only the exact expected container name for the current Compose project, then retries docker compose up. This handles stale names left by interrupted rollouts without pruning unrelated containers.
If docker compose up hits a transient Docker registry or Docker Hub auth timeout while pulling support images, the helper retries the Compose start without treating it as a deployment failure. Tune the bounded backoff with DOCKER_WEB_COMPOSE_UP_RETRY_MAX_ATTEMPTS, DOCKER_WEB_COMPOSE_UP_RETRY_INITIAL_DELAY_MS, and DOCKER_WEB_COMPOSE_UP_RETRY_MAX_DELAY_MS. Stale dependency container references use DOCKER_WEB_COMPOSE_UP_STALE_DEPENDENCY_RETRY_MAX_ATTEMPTS, so registry retry tuning cannot accidentally disable recovery from Compose referencing a removed dependency container. Non-transient Compose errors still fail immediately.
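To make the three tuning variables concrete, here is a minimal sketch of a bounded backoff schedule. It assumes exponential doubling and invents fallback defaults (3 attempts, 1000 ms, 10000 ms); the helper's real curve and defaults are not stated in this guide.

```javascript
// Read a positive integer from an env value, falling back when unset or invalid.
function readPositiveInt(value, fallback) {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Returns the delays (in ms) to wait between attempts.
// Fallback values and the doubling curve are assumptions for illustration.
function composeUpRetryDelays(env = {}) {
  const maxAttempts = readPositiveInt(env.DOCKER_WEB_COMPOSE_UP_RETRY_MAX_ATTEMPTS, 3);
  const initialDelayMs = readPositiveInt(env.DOCKER_WEB_COMPOSE_UP_RETRY_INITIAL_DELAY_MS, 1000);
  const maxDelayMs = readPositiveInt(env.DOCKER_WEB_COMPOSE_UP_RETRY_MAX_DELAY_MS, 10000);
  const delays = [];
  // One delay between each pair of consecutive attempts.
  for (let retry = 0; retry < maxAttempts - 1; retry += 1) {
    delays.push(Math.min(initialDelayMs * 2 ** retry, maxDelayMs));
  }
  return delays;
}
```

Calling `composeUpRetryDelays(process.env)` would show the schedule implied by the current environment under these assumptions.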
If log-drain-postgres starts but remains unhealthy before promotion, the helper removes and recreates only that service container once, collects docker compose ps, container inspect state, recent service logs, and matching Compose volume names, then continues the rollout with PLATFORM_LOG_DRAIN_ENABLED=false for that run. The production web and web-blue-green-watcher services intentionally do not use Compose depends_on for log-drain; the script-owned preflight is the only log-drain gate. Platform traffic can promote while telemetry and Infrastructure project automation are degraded.
Set DOCKER_WEB_LOG_DRAIN_REQUIRED=1 when an operator wants log-drain-postgres to remain a hard deployment gate. Use that only when the log-drain database is part of the deployment objective and a blocked rollout is preferable to serving without persisted request/server logs.
To inspect a degraded log-drain startup without changing data, run:
docker compose -f docker-compose.web.prod.yml --profile redis ps --all log-drain-postgres
docker compose -f docker-compose.web.prod.yml --profile redis logs --tail 200 log-drain-postgres
docker volume ls --filter label=com.docker.compose.volume=platform-log-drain-postgres
If those logs mention incompatible database files, data-directory corruption, or an invalid checkpoint, back up or migrate the matching Compose volume before retrying. Do not run docker compose down --volumes, docker volume rm, or any other volume-clearing command for log-drain data unless an operator has explicitly approved a backed-up reset.
Starting bun serve:web:docker:bg:watch now clears the persisted watcher status snapshot and active PID before the watcher service is force-recreated, but preserves any complete branch/upstream target metadata. A stale lock from the previous container cannot block the replacement watcher, and a detached checkout still has enough metadata to reattach to production.
If the watcher pulls a revision that changes its own Dockerfile or baked entrypoint, it now rebuilds and recreates the web-blue-green-watcher service automatically before handing off to the next deployment cycle.
When that container-refresh request is emitted from inside the followed watcher logs, bun serve:web:docker:bg:watch treats the log text itself as the recreate signal, then rebuilds/recreates and resumes tailing. This keeps the service from getting stuck in a Docker-restarted but operationally offline state.
Recovery handoffs now persist a pending-deploy request under tmp/docker-web/watch/ and reconcile it against the latest successful deployment history entry on startup. If HEAD is newer than the last successful built/deployed commit after a watcher restart or container recreate, the watcher builds the current HEAD before settling back into normal polling.
The same reconciliation now also runs during steady-state polling: if Git is already up to date but the latest successful deployment record still points at an older commit, the watcher rebuilds/deploys the current HEAD instead of incorrectly reporting up-to-date.
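The reconciliation check can be reduced to a single comparison, sketched below. The history entry shape (`{ commit, status }`, newest first) is an assumption, and the sketch deliberately ignores rollback pins and the retry cap.

```javascript
// headCommit: current HEAD SHA; history: newest-first deployment entries.
// Returns true when the watcher should build and deploy HEAD.
function needsReconcileDeploy(headCommit, history) {
  const latestSuccess = history.find((entry) => entry.status === "success");
  // Assumption: with no successful deployment on record, HEAD still needs a deploy.
  if (!latestSuccess) return true;
  return latestSuccess.commit !== headCommit;
}
```

Running the same check at startup and on every steady-state poll is what prevents an "up to date" Git result from hiding a deployment that never completed.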
Blue/green deploys build the replacement lane before stopping or removing any existing blue/green container. A failed docker compose build must leave the currently serving lane and any warm standby untouched; only after the build succeeds may the target lane be recreated with --no-build.
After a watcher-owned child deploy command fails, the watcher prunes failed-build residue before returning to the poll loop: it prunes the configured Buildx builder when BUILDX_BUILDER or DOCKER_WEB_BUILD_BUILDER_NAME is available, then runs docker image prune --force --filter dangling=true. Set DOCKER_WEB_WATCHER_PRUNE_FAILED_BUILD_RESIDUE=0 only when an operator needs to preserve failed build layers for debugging. When the child deploy failed with a BuildKit transport/resource signature, the watcher also runs the best-effort exec-cache cleanup and recreates the Compose-owned buildkit service so the next poll does not inherit the dead builder endpoint.
If tmp/docker-web/prod/active-color is missing or stale, the deployer and watcher recover the serving lane from the generated nginx proxy config before deciding what to rebuild. Treat the proxy config as the runtime source of truth during drift so a failed reconciliation cannot misclassify and clear a stable deployment.

Browser-State 502 Recovery

If some regular browser sessions still show Cloudflare 502 Host Error, 431, or Chrome ERR_INVALID_RESPONSE while incognito works, treat it as stale client state or an auth-cookie/header-size problem before assuming the tunnel itself is broken. Normal browser-state recovery should use the app route: GET is a no-store confirmation page only, and the destructive Clear-Site-Data response happens only after a same-origin POST.

Oversized request headers are different: the request may never reach Next.js. The production web-proxy therefore gives web, Hive, and Meet 64 KB of request-header headroom and maps Nginx 431/494 oversized-header failures to a local browser-state recovery response. The Docker web app server also starts with --max-http-header-size=65536 so ordinary Supabase auth-cookie chunking has matching headroom after proxying.

How to recognize each failure mode:

upstream sent too big header while reading response header from upstream means nginx response-header buffers were too small for the auth response.
web-green could not be resolved or web-blue could not be resolved means a stale nginx worker or keepalive connection still tried to reach a color that no longer existed. The current warm-standby model is designed to avoid that.
A browser that fails only in regular mode but works in incognito usually has stale Supabase auth cookies, stale service-worker state, or both.

Recovery path:

Send affected users to the recovery route on the affected origin: https://tuturuuu.com/~recover-browser-state for the main app, or https://hive.tuturuuu.com/~recover-browser-state for Hive.
That route is public and bypasses auth/onboarding middleware. When the request can reach the app, use the confirmation form so the cookie-clearing POST explicitly expires Supabase auth cookie variants for the current host.
If the browser is already sending too many stale cookies, the proxy catches the 431/494 before Next.js and returns Clear-Site-Data: "cache", "cookies", "storage", "executionContexts" while redirecting the browser back to /login?browserStateReset=1.
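The GET-confirms, POST-clears split of the app route can be sketched as a pure function. Everything about its shape is an assumption made for illustration: the input object, the status codes, the same-origin check via the Origin header, and reusing the proxy's Clear-Site-Data directive list for the app route. The explicit expiry of Supabase cookie variants is omitted.

```javascript
// Directive list quoted from the proxy response described in this guide.
const CLEAR_SITE_DATA = '"cache", "cookies", "storage", "executionContexts"';

// Hypothetical handler model: { method, origin, host } -> { status, headers }.
function recoverBrowserStateResponse({ method, origin, host }) {
  if (method === "GET") {
    // Confirmation page only; nothing destructive happens on GET.
    return { status: 200, headers: { "Cache-Control": "no-store" } };
  }
  const sameOrigin = origin === `https://${host}`;
  if (method === "POST" && sameOrigin) {
    // Destructive step: ask the browser to drop cached state for this origin.
    return {
      status: 200,
      headers: { "Cache-Control": "no-store", "Clear-Site-Data": CLEAR_SITE_DATA },
    };
  }
  // Cross-origin or unexpected methods never trigger the clear.
  return { status: 403, headers: { "Cache-Control": "no-store" } };
}
```

Keeping the destructive header off GET means a prefetch, link preview, or crawler hit on the recovery URL cannot sign a user out.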

Operational signals:

Inspect X-Platform-Deployment-Stamp, X-Platform-Blue-Green-Primary, and X-Platform-Blue-Green-Color response headers to confirm which rollout is currently serving a request.
If the recovery URL fixes the issue for a user, the likely root cause was stale browser state rather than an active deploy outage.
If recovery does not help and proxy logs still show too big header, focus on auth redirect size or additional cookie bloat.
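For scripted checks, the three rollout headers can be pulled out of a fetch response with a small helper. This is a sketch assuming Node 18 or later, where `fetch` and `Headers` are globals; the helper name is hypothetical.

```javascript
// Response headers named in this guide.
const ROLLOUT_HEADERS = [
  "X-Platform-Deployment-Stamp",
  "X-Platform-Blue-Green-Primary",
  "X-Platform-Blue-Green-Color",
];

// headers: a WHATWG Headers object, such as response.headers from fetch.
function readRolloutHeaders(headers) {
  const result = {};
  for (const name of ROLLOUT_HEADERS) {
    result[name] = headers.get(name); // null when the header is absent
  }
  return result;
}

// Usage sketch:
//   const response = await fetch("https://tuturuuu.com/");
//   console.log(readRolloutHeaders(response.headers));
```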

Operational notes:

The auto-pull watcher is intended for disposable deployment clones on a server, not for active developer worktrees or manual hotfix edits.
By default the watcher resets tracked changes, deletes untracked files and directories, fetches, and hard-resets to the locked upstream when the local checkout is behind, ahead, or diverged.
Set DOCKER_WEB_WATCHER_WORKTREE_RESET_DISABLED=1 before starting the watcher to preserve the old protective behavior: dirty worktrees block, ahead/diverged branches are skipped, and only fast-forward pulls are attempted.
The self-restart path only triggers when the watcher script itself changed in the fetched revision; normal app-code deploys keep the current watcher process alive.
During that self-restart path, nginx keeps the assigned proxy port up the whole time because the replacement watcher refreshes the existing proxy container in place before it starts the new build.
The watcher inherits the default blue/green build caps from bun serve:web:docker:bg, so the current defaults still apply during auto-deploys.
Deployment history is watcher-managed. Manual blue/green rollouts still show up in the live runtime status if the stack is active, but they do not backfill the watcher’s last-3 deployment list unless they were performed through the watcher itself.
Rollback pins are intended for bad latest deployments or failed reconciliation builds. Pin only a known successful deployment from the retained ledger, then remove the pin after main contains the corrective commit you want the watcher to resume deploying.
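The difference between the default reset behavior and the protective mode in the notes above can be summarized as a decision function. The state names and returned action labels are hypothetical, and the handling of a dirty but up-to-date checkout in default mode is an assumption.

```javascript
// checkout.state: "up-to-date" | "behind" | "ahead" | "diverged"; checkout.dirty: boolean.
// Returns a label describing what the watcher would do (labels are illustrative).
function planWorktreeSync(checkout, env = {}) {
  const resetDisabled = env.DOCKER_WEB_WATCHER_WORKTREE_RESET_DISABLED === "1";
  if (resetDisabled) {
    // Protective mode: never discard local work.
    if (checkout.dirty) return "block";
    if (checkout.state === "ahead" || checkout.state === "diverged") return "skip";
    return checkout.state === "behind" ? "fast-forward-pull" : "noop";
  }
  // Default mode: the checkout is disposable, so converge on the locked upstream.
  if (checkout.state === "up-to-date") {
    return checkout.dirty ? "clean-worktree" : "noop"; // assumption for this case
  }
  return "clean-and-hard-reset-to-upstream";
}
```

The default branch is destructive by design, which is why the watcher belongs on disposable deployment clones only.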

Validation And CI

docker-setup-check.yaml now validates all of the following:

node scripts/check-docker-web.js
node --test scripts/check-docker-web.test.js scripts/docker-web.test.js scripts/run-tanstack-e2e-docker.test.js
docker compose -f docker-compose.web.yml config
docker compose -f docker-compose.web.yml --profile redis config
docker compose -f docker-compose.web.yml --profile cloudflared config
docker compose -f docker-compose.web.prod.yml config
docker compose -f docker-compose.web.prod.yml --profile redis config
docker compose -f docker-compose.web.prod.yml --profile cloudflared config
docker compose -f docker-compose.tanstack-dual.yml config
docker buildx build --load --cache-from type=gha,scope=docker-backend --cache-to type=gha,scope=docker-backend,mode=max -f apps/backend/Dockerfile .
docker buildx build --load --target dev --cache-from type=gha,scope=docker-web-dev --cache-to type=gha,scope=docker-web-dev,mode=max -f apps/web/Dockerfile .
docker buildx build --load --target runner --secret id=web_env,src=apps/web/.env.local --cache-from type=gha,scope=docker-web-prod --cache-to type=gha,scope=docker-web-prod,mode=max -f apps/web/Dockerfile .
docker buildx build --load --target runner --cache-from type=gha,scope=docker-tanstack-web-prod --cache-to type=gha,scope=docker-tanstack-web-prod,mode=max -f apps/tanstack-web/Dockerfile .

That means Docker CI now covers the backend image, the dev image, the Next.js production image, the TanStack production image, the dual-stack compose file, and the optional Redis and Cloudflare Tunnel profile rendering.

For focused watcher-script checks, run the root Node test file directly, for example node --test --test-name-pattern "pullTrackedBranch" scripts/watch-blue-green-deploy.test.js. Running bun test scripts/... from the repo root invokes the package test script and can expand into the full Turbo test suite.

When scripts/check-docker-web.js or similar root validators need to assert a literal Dockerfile template placeholder like ${process.env.PORT || 7803}, prefer a regex or another explicitly escaped matcher instead of a plain string literal. Biome flags raw ${...} text inside normal strings under lint/suspicious/noTemplateCurlyInString, which can break CI even when the runtime behavior is unchanged.
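One way to assert the port placeholder without tripping the Biome rule is a regex literal with every special character escaped. This is a minimal sketch; the constant and function names are hypothetical and not taken from scripts/check-docker-web.js.

```javascript
// Regex literal: the dollar sign, braces, dots, and pipes are all escaped,
// so no raw template-style placeholder appears inside a normal string.
const PORT_PLACEHOLDER_PATTERN = /\$\{process\.env\.PORT \|\| 7803\}/;

// Returns true when the Dockerfile text contains the literal placeholder.
function hasPortPlaceholder(dockerfileText) {
  return PORT_PLACEHOLDER_PATTERN.test(dockerfileText);
}
```

The pattern has no global flag, so repeated calls to `test` stay stateless.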

Operator Notes

Do not paste docker compose config output into chat or tickets; it expands env values.
If you need rebuild-before-restart on a server, use bun serve:web:docker:bg.
If the latest blue/green deployment is bad, use the infrastructure monitoring dashboard to pin a previous successful deployment before debugging forward.
If a blue/green deploy is interrupted, rerunning the same command from the intended commit is the normal recovery path.
