This is the operational guide for the Docker-based apps/web runtime.

Files That Define The Stack

  • apps/web/Dockerfile
  • apps/web/docker/blue-green-watcher.Dockerfile
  • apps/web/docker/cron-runner.Dockerfile
  • apps/web/cron.config.json
  • apps/backend/Dockerfile
  • apps/meet-realtime/Dockerfile
  • apps/supermemory/Dockerfile
  • docker-compose.web.yml
  • docker-compose.web.prod.yml (Compose include entry that merges docker-compose/compose.web.prod.*.yml fragments plus shared secrets / volumes)
  • scripts/sync-web-crons.js
  • scripts/watch-web-crons.js
  • scripts/docker-web.js
  • scripts/check-docker-web.js

Supported Commands

| Command | Purpose |
| --- | --- |
| bun dev:web:docker | Run the web dev workflow inside Docker |
| bun devx:web:docker | Explicitly start local Supabase, then the Docker dev workflow |
| bun devrs:web:docker | Explicitly start and reset local Supabase, then the Docker dev workflow |
| bun dev:web:docker:down | Stop the Docker dev workflow |
| bun serve:web:docker | Build and run the production web image in-place |
| bun serve:web:docker:bg | Blue/green production deploy with health-checked cutover |
| bun serve:web:docker:bg:watch | Recreate the watcher container, then tail its live logs while it polls the tracked branch and auto-runs blue/green after a successful fast-forward pull |
| bun serve:web:docker:down | Stop the production Docker stack |
| bun serve:web:docker:bg:down | Stop the blue/green stack and clear local runtime state |
| bun test:e2e / bun test:e2e:web:docker | Start local Supabase, reset it, run the production blue/green Docker web stack, then run Playwright |
| bun check:docker | Validate Dockerfile and compose parity rules |

Flags And Implicit Mappings

| Flag | Meaning |
| --- | --- |
| --without-redis | Disable the bundled Redis profile and skip Docker-injected Redis env |
| --with-cloudflared | Enable the bundled Cloudflare Tunnel container profile |
| --with-supabase | Start local Supabase before the Docker web flow |
| --reset-supabase | Start and reset local Supabase before the Docker web flow |
| --env-file tmp/e2e/web.env | Use an explicit Docker web env file for build secrets and runtime env files |
| --mode prod | Use the production compose file instead of the dev stack |
| --strategy blue-green | Use blue/green production deployment instead of in-place replacement |
| --profile redis | Explicitly enable the Redis profile when calling the helper directly |
| --profile cloudflared | Explicitly enable the Cloudflare Tunnel profile when calling the helper directly |
| --build-memory 4g | Run builds through a capped Buildx builder with a memory ceiling |
| --build-cpus 4 | Run builds through a capped Buildx builder with an approximate CPU limit |
| --build-max-parallelism 2 | Limit concurrent BuildKit solve steps for lower build pressure |
| --build-builder-name tuturuuu | Override the throttled Buildx builder name |
| --resume-if-running | If another watcher PID already holds the lock, mirror its live dashboard instead of failing |
| --replace-existing | If another watcher PID already holds the lock, stop it and take over |
| --if-locked <fail\|resume\|replace> | Explicit lock-conflict policy for the watcher |

| Command | Implicit flags |
| --- | --- |
| bun dev:web:docker | none |
| bun devx:web:docker | --with-supabase |
| bun devrs:web:docker | --reset-supabase |
| bun serve:web:docker | --mode prod |
| bun serve:web:docker:bg | --mode prod --strategy blue-green |
| bun dev:web:docker -- --without-redis | --without-redis |
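As a rough illustration of the implicit mappings above, the sketch below models them as a lookup table. The names `IMPLICIT_FLAGS` and `resolveFlags` are hypothetical and are not taken from scripts/docker-web.js; only the script names and flags come from the table.

```javascript
// Hypothetical lookup table mirroring the "Implicit flags" table above.
const IMPLICIT_FLAGS = {
  'dev:web:docker': [],
  'devx:web:docker': ['--with-supabase'],
  'devrs:web:docker': ['--reset-supabase'],
  'serve:web:docker': ['--mode', 'prod'],
  'serve:web:docker:bg': ['--mode', 'prod', '--strategy', 'blue-green'],
};

// Combine a script's implicit flags with anything the caller passed after `--`.
function resolveFlags(scriptName, extraArgs = []) {
  const implicit = IMPLICIT_FLAGS[scriptName];
  if (!Array.isArray(implicit)) {
    throw new Error(`Unknown script: ${scriptName}`);
  }
  return [...implicit, ...extraArgs];
}

console.log(resolveFlags('serve:web:docker:bg').join(' '));
console.log(resolveFlags('dev:web:docker', ['--without-redis']).join(' '));
```

Keeping the mapping in one table makes it easy to compare against the documented commands when a new script alias is added.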

Runtime Requirements

  • .env.local should be the primary Docker env file. The helper still falls back to apps/web/.env.local for older hosts that have not moved their env yet.
  • When --env-file is provided, the Docker helper uses that file for the Dockerfile secret and for the Compose runtime env_file entries. This keeps special-purpose runs such as E2E from accidentally inheriting a developer’s cloud Supabase .env.local.
  • Production Compose fragments live under docker-compose/, so their relative host paths must be written from that directory. Use .. to reach the repo root for build contexts, env files, and bind mounts; otherwise Docker Compose resolves paths like apps/... as docker-compose/apps/... and the watcher image fails before the deployment loop starts.
  • Docker BuildKit must be available. The helper sets COMPOSE_DOCKER_CLI_BUILD=1, DOCKER_BUILDKIT=1, and BUILDX_NO_DEFAULT_ATTESTATIONS=1 so local blue/green image exports do not stall while resolving default provenance metadata.
  • The dependency stages in apps/web/Dockerfile, apps/hive/Dockerfile, and apps/hive-realtime/Dockerfile must copy every apps/*/package.json and packages/*/package.json manifest before running any frozen Bun install. Adding a new workspace app or package without updating all three lists makes Docker-only installs try to rewrite bun.lock. bun check:docker validates this manifest parity. apps/backend is an independent Go module, and apps/meet-realtime is intentionally not a workspace package; their Dockerfiles do not require bun.lock changes when service source changes.
  • The Docker web flow does not start local Supabase unless you explicitly choose bun devx:web:docker or bun devrs:web:docker.
  • By default the Docker container uses the Supabase URL already configured in .env.local, falling back to apps/web/.env.local. It should stay pointed at the cloud project for normal Tuturuuu work.
  • If that configured URL explicitly points at host-run local Supabase, the helper rewrites the server-side Supabase URL to host.docker.internal while leaving NEXT_PUBLIC_SUPABASE_URL alone for browsers.
  • Dockerized web services set __NEXT_PRIVATE_ORIGIN from DOCKER_WEB_NEXT_PRIVATE_ORIGIN, defaulting to http://127.0.0.1:7803. This keeps Next.js Server Action forwarding on the in-container web listener even when nginx preserves an external Host. If logs show failed to forward action response or UND_ERR_HEADERS_TIMEOUT, verify the running web container has __NEXT_PRIVATE_ORIGIN=http://127.0.0.1:7803 or an intentional internal override. Do not use serverActions.allowedOrigins as the primary fix for this symptom; that setting controls Server Action origin/host validation, not the forwarded-action fetch URL.
  • If logs still show Error checking if workspace is personal with [locale], verify the running image includes commit b30d7e2b07 or newer plus the shared @tuturuuu/utils UUID guard.
  • Dockerized web commands auto-enable the local Redis companion stack and inject UPSTASH_REDIS_REST_URL plus a generated UPSTASH_REDIS_REST_TOKEN into the web container.
  • Dockerized production commands generate BACKEND_INTERNAL_TOKEN when one is not provided and inject BACKEND_INTERNAL_URL=http://backend:7820 for the Go backend service. Dev Compose uses the same internal URL with a local fallback token.
  • The production stack runs the first-party AI memory sidecar as an internal support service at http://supermemory:8787. The service name and SUPERMEMORY_* env names stay compatible with existing web runtime wiring, but apps/supermemory/Dockerfile builds Tuturuuu-owned pgvector memory code.
  • Dockerized production commands auto-configure the memory sidecar unless explicitly disabled. scripts/docker-web/env.js generates and persists the internal SUPERMEMORY_API_KEY, SUPERMEMORY_POSTGRES_PASSWORD, and SUPERMEMORY_DATABASE_URL, and defaults SUPERMEMORY_ENABLED=true, SUPERMEMORY_FAIL_OPEN=true, and SUPERMEMORY_TIMEOUT_MS=1500.
  • Operators can override generated values with DOCKER_SUPERMEMORY_API_KEY, DOCKER_SUPERMEMORY_POSTGRES_PASSWORD, DOCKER_SUPERMEMORY_DATABASE_URL, or DOCKER_SUPERMEMORY_ENABLED; standard SUPERMEMORY_* env still works.
  • Blue/green promotion health-gates supermemory with the rest of the support services. Changing apps/supermemory/, the production Compose fragments, or the Docker bake file refreshes the support service set. Explicit SUPERMEMORY_ENABLED=false or DOCKER_SUPERMEMORY_ENABLED=false removes that support service from blue/green builds, starts, and health gates for local-only runs.
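The server-side Supabase URL rewrite described in the list above can be sketched as follows. This is a minimal illustration, not the helper's actual code: the function names, the `SUPABASE_URL` server-side variable name, and the choice of `localhost` and `127.0.0.1` as the hostnames that count as "host-run local Supabase" are all assumptions.

```javascript
// Assumption: these hostnames mean "Supabase is running on the Docker host".
const HOST_LOCAL_NAMES = new Set(['localhost', '127.0.0.1']);

// Rewrite only host-local URLs; cloud project URLs are returned untouched.
function rewriteServerSupabaseUrl(configuredUrl) {
  const url = new URL(configuredUrl);
  if (!HOST_LOCAL_NAMES.has(url.hostname)) return configuredUrl;
  url.hostname = 'host.docker.internal'; // reachable from inside the container
  return url.origin; // scheme + host + port, path dropped
}

// Build the container env: browsers keep the public URL, server code gets the rewrite.
// SUPABASE_URL is a hypothetical name for the server-side variable.
function buildContainerEnv(env) {
  return {
    ...env,
    SUPABASE_URL: rewriteServerSupabaseUrl(env.NEXT_PUBLIC_SUPABASE_URL),
  };
}

console.log(buildContainerEnv({ NEXT_PUBLIC_SUPABASE_URL: 'http://127.0.0.1:54321' }));
```

The public variable stays as-is because the browser runs on the host, where `127.0.0.1` already resolves to the local Supabase stack.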

Dockerized E2E

bun test:e2e from the repo root and bun test:e2e in apps/web run through scripts/run-web-e2e-docker.js instead of starting next dev. The runner:
  1. writes tmp/e2e/web.env with local-only Supabase, local app-origin variables, app-session JWT values, and a local-only E2E auth bypass for Turnstile/dev-session,
  2. starts and resets the Dockerized local Supabase stack,
  3. boots apps/web through the production blue/green Docker flow,
  4. starts Portless on unprivileged HTTPS port 1355 and registers the https://tuturuuu.localhost:1355 route only after the direct Docker proxy is healthy,
  5. waits for https://tuturuuu.localhost:1355/login, then runs Playwright against that shared-cookie origin, and
  6. tears down Docker web plus local Supabase unless E2E_KEEP_DOCKER_STACK=1.
Normal teardown passes --volumes --rmi local to Docker Compose and then removes custom image tags for the current ttr-e2e-* project, so per-run containers, Compose volumes, and baked blue/green images do not accumulate. E2E also sets DOCKER_WEB_BUILDKIT_PRUNE_AFTER_BUILD=1 by default because the per-run BuildKit cache/state is disposable; set E2E_DOCKER_BUILDKIT_PRUNE_AFTER_BUILD=0 only when debugging a local E2E build and you intentionally want to keep BuildKit cache.
Local E2E also starts Supabase with edge-runtime excluded by default through DOCKER_WEB_SUPABASE_START_EXCLUDE=edge-runtime. The platform E2E suite does not serve local Edge Functions, and excluding that service keeps local runs from failing when the Supabase Edge Runtime tries to resolve external JSR packages. Set E2E_SUPABASE_START_EXCLUDE= when you intentionally need a full local Supabase stack for debugging. Local E2E also pins DOCKER_SUPERMEMORY_ENABLED=false and SUPERMEMORY_ENABLED=false; the memory integration is not under test there.
E2E build caps default to auto for memory, CPU, and BuildKit max parallelism. The runner reads Docker’s current MemTotal before booting the stack, forwards that value as DOCKER_WEB_DOCKER_MEMORY_LIMIT, and resolves the BuildKit memory cap just under the active Docker Desktop allocation. On allocations below 10 GB, the E2E runner keeps the inner Next build at one CPU, static generation concurrency one, and a 4 GB Node heap so BuildKit keeps enough container headroom. The Next build engine remains Turbopack. Do not switch local E2E runs to the Webpack build path; the production and local Docker build paths are expected to exercise the same Turbopack compiler.
For local machines where Docker Desktop still cannot allocate enough memory for the Turbopack image build, set DOCKER_WEB_NATIVE_BUILD=1 when running E2E. The blue/green build helper then runs bun run build:web:docker on the host with DOCKER_WEB_STANDALONE=1, packages apps/web/.next/standalone plus static assets into the same Node runtime image shape, and continues with Docker Compose startup and Playwright. Native builds default their host-side build memory budget to 12 GB instead of Docker Desktop’s memory cap; set DOCKER_WEB_NATIVE_BUILD_MEMORY=16g or another explicit value when the host needs a different Node heap bucket. Keep this as a local/debug escape hatch; CI and production watchers should keep building the web image entirely inside Docker.
On GitHub-hosted runners, the E2E workflow frees disk before restoring or loading cached Supabase Docker images. Keep that cleanup ahead of the cache load: running docker system prune -af --volumes after cached images are loaded would remove the images the shard is about to use, while skipping the cleanup can leave too little space for the web Docker image dependency layer.
When an E2E shard fails, the runner prints diagnostics before teardown while the containers still exist. The job log includes the primary error, blue/green stage state, the Playwright .last-run.json file when available, Docker containers for the shard Compose project, production Compose status, recent logs for web, Hive, proxy, and support services, the Portless route list, a probe against the configured E2E BASE_URL, and bun sb:status. The workflow also has a failure-only diagnostic step after Run Playwright shard as a backstop, so the job output should show the failing service or stage even when Playwright report artifacts are incomplete. The workflow uploads diagnostics, Playwright reports, and apps/web/test-results for every non-cancelled shard so traces and screenshots stay available when the job output is too short.
The Playwright global setup refuses non-local web origins and refuses Supabase origins outside localhost, 127.0.0.1, or host.docker.internal on port 8001.
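The Supabase origin rule enforced by the Playwright global setup might look roughly like the sketch below. The function names are hypothetical, and the reading that port 8001 is required for all three hostnames (not only host.docker.internal) is an assumption.

```javascript
// Hostnames named in the guide as acceptable local Supabase hosts.
const LOCAL_SUPABASE_HOSTS = new Set(['localhost', '127.0.0.1', 'host.docker.internal']);

// Assumption: the port requirement applies to every allowed hostname.
const LOCAL_SUPABASE_PORT = '8001';

function isAllowedSupabaseOrigin(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparsable values are never allowed
  }
  return LOCAL_SUPABASE_HOSTS.has(url.hostname) && url.port === LOCAL_SUPABASE_PORT;
}

// Fail fast before any test can touch a non-local Supabase project.
function assertLocalSupabase(rawUrl) {
  if (!isAllowedSupabaseOrigin(rawUrl)) {
    throw new Error(`Refusing to run E2E against non-local Supabase origin: ${rawUrl}`);
  }
}

assertLocalSupabase('http://host.docker.internal:8001');
console.log(isAllowedSupabaseOrigin('https://example.supabase.co'));
```

Rejecting on parse failure keeps a typo in the env file from being treated as an allowed origin.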
CI shards E2E with --shard=x/4; each shard gets its own Compose project name, but all shards still use ephemeral local Supabase rather than any cloud Supabase project.
Because the Docker web app runs with NODE_ENV=production, the generated env file and Playwright process env also pin WEB_APP_URL, NEXT_PUBLIC_WEB_APP_URL, and NEXT_PUBLIC_APP_URL to the local shared-cookie origin; otherwise central-auth redirects can escape to the real tuturuuu.com origin during setup. The auth bypass is guarded by the local E2E web origin, the incoming request Host / forwarded host / Origin headers, and both the public and server-side Supabase origins before server-side auth code honors it, so it must not be used as a general production configuration.
The blue/green proxy and apps/web runtime both allow 64 KB request headers. That headroom lets the browser reach /~recover-browser-state or the normal login flow when duplicated Supabase cookies make the default header limit too small. If a request is still too large for the proxy to forward, nginx handles 431/494 directly with Clear-Site-Data and redirects to /login?browserStateReset=1; this recovery must stay in the proxy because Next.js middleware cannot run after nginx rejects the header.
The blue/green nginx proxy must forward the original Host header with its port intact via $http_host. Local E2E auth setup posts to http://localhost:7803/api/auth/dev-session, and the production-mode app accepts the setup route only when the public request origin stays local. The guard also tolerates production standalone/proxy normalization where request.url or Host becomes an internal Docker web upstream, but only when the forwarded public host is still the local E2E origin.

Coolify

Coolify can provide enough default deployment metadata for Tuturuuu’s Dockerfile setup to derive the app origin even when you do not manually define the usual app URL variables.
  • During Dockerfile builds, scripts/build-web-docker.js now derives missing WEB_APP_URL, NEXT_PUBLIC_WEB_APP_URL, and NEXT_PUBLIC_APP_URL values from Coolify’s COOLIFY_URL or COOLIFY_FQDN defaults before running bun run build:web.
  • During production container startup, apps/web/docker/prod-entrypoint.js applies the same Coolify fallback so server-side runtime code sees the same derived values.
  • The runtime URL resolvers used by the web proxy, internal API client, and drive export/auto-extract flows also fall back to COOLIFY_URL and COOLIFY_FQDN.
Recommended setup in Coolify:
  • Still set explicit Tuturuuu env like NEXT_PUBLIC_SUPABASE_URL, NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY, SUPABASE_SECRET_KEY, and any email or storage secrets yourself.
  • You can omit WEB_APP_URL, NEXT_PUBLIC_WEB_APP_URL, and NEXT_PUBLIC_APP_URL if Coolify already injects COOLIFY_URL or COOLIFY_FQDN for the deployment.
  • If you need one specific canonical domain while Coolify exposes multiple domains, set the Tuturuuu app URL variables explicitly instead of relying on the automatic fallback.
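The Coolify fallback described above can be sketched as a small env transform. This is an illustration under stated assumptions, not the code in scripts/build-web-docker.js: the function names are hypothetical, and the precedence of COOLIFY_URL over COOLIFY_FQDN, the handling of comma-separated values, and the https default for scheme-less values are all assumptions.

```javascript
const APP_URL_KEYS = ['WEB_APP_URL', 'NEXT_PUBLIC_WEB_APP_URL', 'NEXT_PUBLIC_APP_URL'];

// Assumption: the value may list several domains separated by commas and may lack a scheme.
function firstOrigin(value) {
  const first = String(value || '').split(',')[0].trim();
  if (!first) return undefined;
  const withScheme = /^https?:\/\//i.test(first) ? first : `https://${first}`;
  try {
    return new URL(withScheme).origin;
  } catch {
    return undefined;
  }
}

// Fill in only the app URL variables that are missing; explicit values always win.
function applyCoolifyFallback(env) {
  const derived = firstOrigin(env.COOLIFY_URL) || firstOrigin(env.COOLIFY_FQDN);
  const result = { ...env };
  if (!derived) return result;
  for (const key of APP_URL_KEYS) {
    if (!result[key]) result[key] = derived;
  }
  return result;
}

console.log(applyCoolifyFallback({ COOLIFY_FQDN: 'app.example.com' }));
```

Because the sketch simply takes the first listed domain, it also shows why setting the app URL variables explicitly is the safer choice when a deployment exposes more than one domain.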

Development Mode

Development mode exists to preserve the normal root script contract while moving the web runtime into containers.
  • Container-managed node_modules are isolated from the host.
  • Package-local node_modules and dist directories are also isolated so host installs do not shadow container artifacts.
  • The root Docker context excludes generated app artifacts such as .next, .turbo, coverage output, and Flutter build directories. Keep these excludes intact so production builds do not stream multi-gigabyte local artifacts into BuildKit.
  • A host bun install is not required just to boot the Dockerized web stack.

Production Mode

The production compose file uses the runner target from apps/web/Dockerfile.

In-Place

bun serve:web:docker
Use this when a short restart is acceptable.

Blue/Green

bun serve:web:docker:bg
Blue/green deploy does this:
  1. Reads the last active color from tmp/docker-web/prod/active-color.
  2. Ignores that state if the corresponding container no longer exists.
  3. Builds the target web image through Docker Buildx Bake using Compose-derived targets, then stops/removes only the old target web lane and starts the fresh replacement.
  4. Waits for the target web lane’s healthcheck to pass, then records web-promote as a staged target in tmp/docker-web/prod/target-state.json. At this point the deploy has not yet reloaded web-proxy or written tmp/docker-web/prod/active-color.
  5. Builds and runs Hive separately. hive-db-migrate, the target hive-blue/hive-green service, hive-realtime, and the Hive proxy check must pass before web can be publicly promoted. A migration or Hive health failure marks the Hive stage failed, leaves active-color on the previous web lane, and keeps the staged target web lane out of public routing.
  6. Refreshes support services (backend, meet-realtime, markitdown, storage-unzip-proxy, and web-cron-runner) after web/Hive target work. A support build or health failure also blocks web-proxy reload and leaves the previous active web lane serving. Their build step is scoped: ordinary web commits build only web-blue or web-green, while Hive and helper images rebuild only when their source, Dockerfile, compose wiring, or shared dependency inputs changed. Image-only services such as redis, serverless-redis-http, web-proxy, and cloudflared are never passed to Bake.
  7. Injects Docker-internal helper URLs into apps/web: BACKEND_INTERNAL_URL=http://backend:7820, MARKITDOWN_ENDPOINT_URL=http://markitdown:8000/markitdown, DISCORD_APP_DEPLOYMENT_URL=http://markitdown:8000, DRIVE_AUTO_EXTRACT_PROXY_URL=http://storage-unzip-proxy:8788/extract, VALSEA_PRONUNCIATION_ASSESSOR_URL=http://pronunciation-assessor:8010/assess, and INTERNAL_WEB_API_ORIGIN=http://web-proxy:7803.
  8. Keeps the stable web-proxy container running in place during ordinary promotions instead of re-running compose up against the public :7803 listener. If the running proxy is missing required host ports or its container image no longer matches the resolved Compose image, the deploy defers the forced proxy recreate until after the target web, Hive, and support gates have passed.
  9. Validates the generated nginx config with nginx -t, then reloads or recreates the proxy only after every staging gate has passed.
  10. Immediately verifies the proxy can serve the internal /__platform/drain-status endpoint through the newly routed color before writing active-color and marking the staged web target healthy. This avoids false deployment failures from public API middleware or rate limits.
  11. Polls an internal drain-status endpoint on the old color and waits until it has no in-flight HTTP work left before demoting it to standby. This keeps long-running server actions, route handlers, and other open requests from being cut off mid-flight.
  12. Falls back to the short fixed drain window only when the old image predates the drain-status endpoint and cannot report its active requests yet.
  13. Keeps the demoted color online as a warm nginx backup target instead of removing it immediately, so stale keepalive workers and Cloudflare Tunnel connections can still fail over cleanly during the post-promotion window.
  14. If the demoted standby color is still on the previous revision after 15 minutes, the watcher automatically rebuilds that stale standby in place so both colors converge on the latest checked-out code without flipping the active port or promoting traffic again.
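The color selection and gating logic in the steps above can be summarized in a short sketch. The function names, the gate names, and the choice of blue as the cold-start target are assumptions made for illustration; the real helper tracks more state than this.

```javascript
const COLORS = ['blue', 'green'];

// Steps 1-2: trust the recorded color only if its container still exists.
function resolveActiveColor(recordedColor, runningColors) {
  const known = COLORS.includes(recordedColor) && runningColors.includes(recordedColor);
  return known ? recordedColor : null;
}

// The target lane is the opposite of the active one.
// Assumption: with no usable active color, the first deploy targets blue.
function pickTargetColor(activeColor) {
  return activeColor === 'blue' ? 'green' : 'blue';
}

// Steps 4-9: the proxy is reloaded only after every staging gate has passed.
// The gate names here are hypothetical labels for the web, Hive, support, and nginx -t checks.
function canReloadProxy(gates) {
  return ['web', 'hive', 'support', 'nginxConfig'].every((name) => gates[name] === true);
}

const active = resolveActiveColor('green', ['green']);
console.log(pickTargetColor(active));
console.log(canReloadProxy({ web: true, hive: true, support: false, nginxConfig: true }));
```

Treating a missing gate result the same as a failed one keeps the previous lane serving whenever the deploy state is incomplete.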
During blue/green deploys, the watcher supplies the version badge metadata via PLATFORM_BUILD_* variables for both Docker image builds and runtime containers. It infers PLATFORM_BUILD_COMMIT_HASH, PLATFORM_BUILD_COMMIT_SHORT_HASH, PLATFORM_BUILD_COMMIT_MESSAGE, PLATFORM_BUILD_REF_NAME, PLATFORM_BUILD_ENVIRONMENT, PLATFORM_BUILD_BUILT_AT, PLATFORM_BUILD_DEPLOYMENT_URL, and PLATFORM_BUILD_DEPLOYMENT_STAMP from the current checkout plus the deployment context. The account-gated badge reads those runtime values before falling back to generated Vercel/GitHub metadata, so on-prem watcher deployments show the served commit instead of local / Unknown.
If those PLATFORM_BUILD_* values are missing or blank in a self-hosted runtime, apps/web falls back to the mounted blue/green snapshot before using generated/local defaults. The resolver reads only lightweight snapshot files: prod/target-state.json, prod/active-color, prod/deployment-stamp, watch/blue-green-auto-deploy.status.json, and watch/blue-green-auto-deploy.history.json under PLATFORM_BLUE_GREEN_MONITORING_DIR, with local tmp/docker-web candidates for development. Selection prefers targets.web for the active color, then an active deployment row, then the latest successful row for the active color, then the latest successful row overall. commitSubject becomes the badge commit message, millisecond deployment timestamps become ISO deployment times, and the runtime deployment stamp file supplies the displayed deployment stamp. The resolver does not invent deployment URL, ref, or environment from color or commit data alone.
The helper writes support-image input hashes to tmp/docker-web/prod/build-input-hashes.json and keeps recent decisions in tmp/docker-web/prod/build-input-hashes.history.json. Infrastructure monitoring reads that history so deployment rows can show which helper images were rebuilt and which ones were served from the cached build inputs.
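The snapshot selection order for the version badge could be sketched as below. Only `targets.web` and `commitSubject` are names taken from the text; every other field (`color`, `active`, `status`, `finishedAt`) and both function names are hypothetical stand-ins, since the actual snapshot schema is not shown here.

```javascript
// Pick the record that should feed the badge, following the documented preference order.
function selectBadgeSource({ activeColor, targetState, history }) {
  const webTarget = targetState && targetState.targets && targetState.targets.web;
  if (webTarget && webTarget.color === activeColor) return webTarget;

  // Newest rows first, so "latest" is simply the first match.
  const rows = [...(history || [])].sort((a, b) => b.finishedAt - a.finishedAt);
  return (
    rows.find((row) => row.active === true) ||
    rows.find((row) => row.status === 'successful' && row.color === activeColor) ||
    rows.find((row) => row.status === 'successful') ||
    null
  );
}

// Map the selected record to badge fields without inventing URL, ref, or environment.
function toBadge(source) {
  if (!source) return null;
  return {
    commitMessage: source.commitSubject,
    deployedAt: new Date(source.finishedAt).toISOString(), // ms timestamp -> ISO string
  };
}

const picked = selectBadgeSource({
  activeColor: 'blue',
  targetState: { targets: { web: { color: 'green' } } },
  history: [{ color: 'blue', status: 'successful', finishedAt: 0, commitSubject: 'fix: login' }],
});
console.log(toBadge(picked));
```

Returning `null` when nothing matches leaves the caller free to fall back to generated or local defaults, as the text describes.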

Meet Realtime

apps/meet-realtime is the internal control-plane service for Meet calls, webinars, and low-latency broadcast coordination. It is a Bun WebSocket service started by production Compose as meet-realtime on container port 7816. web-proxy exposes /realtime for tumeet.me and meet.tuturuuu.com and forwards WebSocket upgrades to that service. Production meeting logic stays on Tuturuuu infrastructure:
  • apps/web owns protected meeting APIs, verifies workspace access, and mints short-lived join tokens signed with MEET_REALTIME_TOKEN_SECRET.
  • apps/web and apps/meet-realtime must share the same MEET_REALTIME_TOKEN_SECRET. Production Compose exposes MEET_REALTIME_URL and NEXT_PUBLIC_MEET_REALTIME_URL to Web, defaulting both to wss://meet.tuturuuu.com/realtime for browser join-token payloads.
  • Browsers connect to wss://meet.tuturuuu.com/realtime?token=....
  • apps/meet-realtime validates the token, manages ephemeral room presence, chat, stage state, and reconnect resync, then calls Cloudflare Realtime SFU APIs with server-only CLOUDFLARE_REALTIME_APP_ID and CLOUDFLARE_REALTIME_APP_SECRET.
  • Browser media flows to Cloudflare Realtime SFU. The control WebSocket can reconnect during watcher-managed service refreshes without creating a new meeting record.
  • Broadcast streaming stays API-owned by apps/web: the meeting host calls /api/v1/workspaces/:wsId/meetings/:meetingId/stream, apps/web creates or resumes a Cloudflare Stream live input with server-only CLOUDFLARE_ACCOUNT_ID plus CLOUDFLARE_STREAM_API_TOKEN (or CLOUDFLARE_API_TOKEN), stores the live input UID and WHIP/WHEP URLs in private.meet_stream_live_inputs, and returns the WHIP publish URL only to the host response. Workspace viewers receive only the WHEP playback URL.
Cost controls are part of the signed token contract: camera defaults off, video is capped at 720p/24fps, room limits are explicit, and webinar viewers do not receive publish scope. Cloudflare Stream live inputs are created with recording mode off and hidden viewer counts by default; set CLOUDFLARE_STREAM_ALLOWED_ORIGINS to a comma-separated allowlist when Stream playback should be origin-restricted. Do not add Cloudflare Workers or Durable Objects for production Meet room logic; use the internal service and blue/green watcher instead.
scripts/docker-web/env.js persists generated helper tokens under tmp/docker-web/markitdown-token, tmp/docker-web/storage-unzip-token, tmp/docker-web/supermemory-api-key, and tmp/docker-web/supermemory-postgres-password. Override them with DOCKER_MARKITDOWN_ENDPOINT_SECRET, DOCKER_DRIVE_UNZIP_PROXY_SHARED_TOKEN, or the DOCKER_SUPERMEMORY_* env when an operator needs fixed values.
Workspace ZIP auto-extract is enabled by the workspace-level DRIVE_AUTO_EXTRACT_ZIP secret. Workspaces with EXTERNAL_PROJECT_ENABLED=true also opt in automatically so CMS/WebGL workspaces can reuse the unzipper without duplicating storage automation setup. The Docker-internal URL and token are fallbacks for workspaces that have not supplied custom proxy secrets. If a workspace supplies a custom DRIVE_AUTO_EXTRACT_PROXY_URL, it must also supply its own DRIVE_AUTO_EXTRACT_PROXY_TOKEN; the process-wide fallback token must not be sent to a workspace-controlled proxy URL.
The pronunciation-assessor helper is local-only. It defaults to PRONUNCIATION_ASSESSOR_DEFAULT_MODEL=local-whisper-large-v3-turbo, can switch between local Whisper sizes and local-wav2vec2, and stores downloaded Transformers checkpoints in the platform-pronunciation-assessor-cache volume. The default model preloads at container startup. Set PRONUNCIATION_ASSESSOR_PRELOAD=false only when startup speed matters more than first-request latency. On shared hosts, keep PRONUNCIATION_ASSESSOR_MAX_LOADED_MODELS=1 and tune PRONUNCIATION_ASSESSOR_IDLE_TTL_SECONDS so idle models are unloaded before they hold GPU or RAM capacity unnecessarily. The helper rejects uploads over PRONUNCIATION_ASSESSOR_MAX_UPLOAD_BYTES (default: 10 MB) and rejects decoded audio longer than PRONUNCIATION_ASSESSOR_MAX_AUDIO_SECONDS (default: 120 seconds) before local model inference. Leave PRONUNCIATION_ASSESSOR_ADMIN_TOKEN unset unless operators need to call POST /models/load or POST /models/unload; those model-control endpoints are disabled without the token and require Authorization: Bearer <token> when it is configured.
CMS WebGL package uploads also use the storage-unzip-proxy, but they are a first-class CMS upload path rather than generic Drive automation. They require a configured unzip proxy URL and token, but they do not require the DRIVE_AUTO_EXTRACT_ZIP workspace opt-in secret. The CMS finalize route unpacks the ZIP into workspace Drive, detects the playable index.html, and stores the same-origin artifact map on the CMS webgl-package asset. Browser uploads go directly to the signed storage URL returned by the self-hosted web app’s WebGL upload-url route, so large ZIPs do not pass through the Vercel-hosted CMS app or the web app proxy before reaching Supabase Storage or R2. The CMS client reports per-file upload progress during the signed upload, then calls the WebGL finalize route so the backend handles extraction and artifact-map persistence.
The unzip proxy fans out backend callbacks for extracted folders and asks the callback route for per-file upload URLs. Before uploading extracted bytes, the proxy verifies the callback response names a trusted provider and that the signed upload URL belongs to hosted Supabase, Cloudflare R2, or an exact operator-configured upload origin. It forwards only content type and generated bearer-token headers to the upload URL.
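The upload-origin check performed by the unzip proxy could be approximated as follows. This is a sketch under assumptions: the function names are hypothetical, and matching hosted Supabase by the `.supabase.co` suffix and Cloudflare R2 by the `.r2.cloudflarestorage.com` suffix is an illustrative guess at what "belongs to" means, not the proxy's confirmed rule.

```javascript
// Assumption: hosted providers are recognized by these hostname suffixes.
const HOSTED_SUFFIXES = ['.supabase.co', '.r2.cloudflarestorage.com'];

function isTrustedUploadUrl(rawUrl, allowedOrigins = []) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;
  }
  // Operator-configured origins must match exactly (scheme, host, and port).
  if (allowedOrigins.includes(url.origin)) return true;
  if (url.protocol !== 'https:') return false;
  return HOSTED_SUFFIXES.some((suffix) => url.hostname.endsWith(suffix));
}

// Forward only the content type and, when present, the generated bearer token.
function buildUploadHeaders(contentType, bearerToken) {
  const headers = { 'content-type': contentType };
  if (bearerToken) headers.authorization = `Bearer ${bearerToken}`;
  return headers;
}

console.log(isTrustedUploadUrl('https://abc.supabase.co/storage/v1/object/upload/sign/x'));
console.log(buildUploadHeaders('image/png', 'generated-token'));
```

Comparing full origins for the operator allowlist, rather than hostnames alone, prevents a different port or scheme on the same host from being treated as trusted.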
The storage auto-extract and CMS WebGL extract callback routes still pass through the central API proxy guard before they validate the shared unzip token, so malformed, rate-limited, or oversized callback requests are rejected at the same cheap boundary as other API mutations. Direct file callbacks are legacy/small-file only and enforce the same 512 KiB body budget locally; large extracted files must use the file-upload-url callback flow. The proxy currently buffers the downloaded archive and each extracted file in memory, so the default caps stay conservative: 100 MiB ZIP downloads, 50 MiB per extracted file, and 250 MiB total extracted output. Operators can tune those caps with DRIVE_UNZIP_PROXY_MAX_ARCHIVE_BYTES, DRIVE_UNZIP_PROXY_MAX_ENTRY_BYTES, DRIVE_UNZIP_PROXY_MAX_TOTAL_EXTRACTED_BYTES, and DRIVE_UNZIP_PROXY_MAX_ARCHIVE_ENTRIES; workspace Drive quota must still be large enough for the uploaded archive and extracted files. Set DRIVE_UNZIP_PROXY_ALLOWED_UPLOAD_ORIGINS for self-hosted Supabase or custom R2/S3-compatible origins, and reserve DRIVE_UNZIP_PROXY_ALLOW_LOCAL_UPLOAD_ORIGINS=true for local Supabase testing.
The MarkItDown endpoint is the conversion path for uploaded workspace files. Do not route YouTube summaries through MarkItDown or Google Search. Google Gemini chat requests attach one public or unlisted YouTube URL directly as a native video/mp4 file input, so the model can summarize the video through the provider-supported video path. Playlist/query parameters are stripped before the URL is attached so each request references only one video. Any legacy direct URL conversion path that still reaches MarkItDown must reserve and commit the fixed MarkItDown credit charge before the sidecar request is sent.
Interrupted Docker Compose recreates can leave temporary container names such as <hex>_platform-markitdown-1. The Docker helper treats those as recoverable only when the suffix matches one of the services in the current compose up request, removes that stale temp container, and retries the same narrow up operation.
The production web-proxy service is pinned to the official mainline Alpine image nginx:1.31.0-alpine, and scripts/check-docker-web.js verifies that pin in the merged production Compose config. The long-lived nginx proxy also raises its request-header buffer limits so larger session/auth cookies do not fail at the proxy layer with 400 Request Header Or Cookie Too Large before the active web container sees the request. It now also raises its upstream response-header buffers (proxy_buffer_size, proxy_buffers, and proxy_busy_buffers_size) so larger Supabase auth responses with multiple Set-Cookie headers do not fail with upstream sent too big header while reading response header from upstream.
The proxy uses Docker DNS re-resolution plus a shorter keepalive timeout so promotions are less likely to produce transient 502 Host Error responses for existing Cloudflare Tunnel connections, while the previous color remains alive as a warm standby. The proxy keeps both blue and green in the nginx upstream group during steady state, with the active color as the primary upstream and the standby color as a backup. The runtime DNS resolver is defined at the nginx include/http scope, not just inside server, so Docker service-name resolution continues to work for the blue/green upstream block at reload time.
Both the production web image healthcheck and the web-proxy compose healthcheck now use the internal /__platform/drain-status endpoint too, so raw bun serve:web:docker:bg waits on the same non-rate-limited readiness path as the blue/green promotion gate. The proxy exposes that path as an exact loopback-only nginx location and forwards a private internal probe header to the active web lane, because the web request tracker intentionally answers the drain-status endpoint only for local or explicitly trusted Docker-network requests.
Every blue/green deployment also stamps the runtime with PLATFORM_DEPLOYMENT_STAMP and PLATFORM_BLUE_GREEN_COLOR. Those values are surfaced through both nginx response headers and the web process itself, and the web layout appends the deployment stamp to the service-worker URL with updateViaCache: 'none' so new deployments push browsers toward the latest worker instead of lingering on stale cached state.
The local runtime state lives in:
  • tmp/docker-web/prod/active-color
  • tmp/docker-web/prod/deployment-stamp
  • tmp/docker-web/prod/nginx.conf
  • tmp/docker-web/prod/target-state.json
These files are intentionally local-only and safe to regenerate.
Infrastructure Monitoring → Deployments reads target-state.json, the watcher deployment history, and the latest deployment stage handoff together, so operators can see staged target work such as a prepared web color while Hive or support gates still block public promotion. active-color and the generated proxy config remain on the previous serving web lane until the final proxy-reload stage passes.
Watcher-managed deployments persist the web-build, web-promote, hive-migrate, hive-promote, support-refresh, and proxy-reload stage results into deployment history. Modern rows that were recorded without a stage array are inferred from final deployment status and build-cache metadata; truly pre-tracking rows still show stage chips as not applicable. Active watcher deployments that only have pending build/deploy status are surfaced with a synthetic current stage so operators can see the build is in progress before full stage history is written.
After each Hive migration pass, the deploy helper runs docker compose rm --stop -f hive-db-migrate so the completed one-shot migration service is stopped if necessary and removed. This keeps hive-db-migrate from lingering after depends_on starts it while Hive services come up.

Native Cron Runner

Self-hosted production cron jobs use apps/web/cron.config.json as the shared source of truth. The crons array in apps/web/vercel.json should stay generated from that file with:
node scripts/sync-web-crons.js --check
node scripts/sync-web-crons.js
Use --check in CI and local verification when cron definitions change. The sync script preserves Vercel behavior by copying each enabled job’s path and schedule from the shared config into apps/web/vercel.json.
In Docker production, the web-cron-runner service runs scripts/watch-web-crons.js against INTERNAL_WEB_API_ORIGIN, defaulting to http://web-proxy:7803. Requests include Authorization: Bearer ${CRON_SECRET || VERCEL_CRON_SECRET} so the same route auth gate can protect Vercel and native Docker executions. When neither CRON_SECRET nor VERCEL_CRON_SECRET is set on the host, the Docker environment generator creates a persisted internal secret at tmp/docker-web/cron-token and injects it into both the web containers and the web-cron-runner service as CRON_SECRET. This keeps native Docker cron auth self-contained while preserving explicit host-provided secrets when present.
Runtime cron telemetry is intentionally file-based and local to the host:
  • tmp/docker-web/cron/status.json for runner health, the current cycle, and the latest manual run lifecycle records. Manual runs move through queued, processing, and a final success / failed / timeout / skipped state. While a run is processing, the runner refreshes captured route console logs in this file so the monitoring UI can show near-realtime status and log updates.
  • tmp/docker-web/cron/state.json for restart-safe last-run markers.
  • tmp/docker-web/cron/executions/*.jsonl for per-run route response, duration, status, and captured web-container console logs.
  • tmp/docker-web/watch/control/cron-control.json for the global enabled switch.
  • tmp/docker-web/watch/control/cron-run-requests/*.json for queued manual runs created by the monitoring UI.
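The manual run lifecycle recorded in status.json can be pictured as a small state machine. The sketch below is illustrative only: the status strings come from the list above, while the transition table and the helper names (canTransition, isFinal) are assumptions rather than the runner's actual code. It assumes every run passes through processing before reaching a final state.

```javascript
// Hypothetical transition table for manual cron runs.
// Status names are taken from the docs; the table itself is an assumption.
const FINAL_STATES = ["success", "failed", "timeout", "skipped"];

const TRANSITIONS = {
  queued: ["processing"],
  processing: FINAL_STATES,
};

// Returns true when a run may move from `from` to `to`.
function canTransition(from, to) {
  const allowed = TRANSITIONS[from] || [];
  return allowed.includes(to);
}

// Returns true when the run has reached a terminal state.
function isFinal(status) {
  return FINAL_STATES.includes(status);
}
```

A monitoring UI could use a check like isFinal to decide when to stop polling status.json for a given run.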
Disabling cron execution blocks scheduled jobs and leaves manual run requests queued until cron execution is enabled again. The runner also supports --once, which is used by script tests to verify due-run detection, queued manual runs, restart-safe state, and log persistence without starting the long-running loop.
Calendar cron routes should call workspace calendar APIs through INTERNAL_WEB_API_ORIGIN when it is present. In Docker production this keeps provider sync and smart scheduling traffic on the internal web-proxy origin instead of accidentally depending on a public app URL from inside the container. calendar-provider-sync intentionally calls the same /api/v1/workspaces/:wsId/calendar/sync route that the calendar page uses, but with cron auth and source: "cron" so dashboard runs stay manual and scheduled runs stay auditable. The workspace sync route is responsible for provider fan-out: Google connections are selected from active Google auth-token rows, and Microsoft connections are selected from active Microsoft auth-token rows. Do not reimplement provider-specific calendar fetching inside the cron wrapper.
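The request shape used for native cron calls in this section (internal origin plus Bearer secret) can be sketched as follows. This is a minimal illustration, not the code in scripts/watch-web-crons.js: buildCronRequest is a hypothetical helper, and only the environment variable names, the default origin, and the header format come from the text above.

```javascript
// Sketch of how a native cron request could be assembled.
// `buildCronRequest` is a hypothetical helper name.
const DEFAULT_INTERNAL_ORIGIN = "http://web-proxy:7803";

function buildCronRequest(env, path) {
  // Prefer the internal origin so traffic stays on the Docker network.
  const origin = env.INTERNAL_WEB_API_ORIGIN || DEFAULT_INTERNAL_ORIGIN;
  // Same precedence as the documented header: CRON_SECRET, then VERCEL_CRON_SECRET.
  const secret = env.CRON_SECRET || env.VERCEL_CRON_SECRET;
  if (!secret) {
    throw new Error("No cron secret available");
  }
  return {
    url: new URL(path, origin).toString(),
    headers: { Authorization: `Bearer ${secret}` },
  };
}
```

For example, buildCronRequest(process.env, "/api/cron/infrastructure/docker-recovery-alerts") would target the web-proxy origin unless INTERNAL_WEB_API_ORIGIN overrides it.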

Auto-Deploy Watcher

bun serve:web:docker:bg:watch locks the current branch/upstream at startup, polls every second, fast-forwards when GitHub has a newer commit, and runs the blue/green deploy flow automatically.
Run this command from a host-level process manager, not only from inside Docker. The command starts and tails the web-blue-green-watcher container, but the host process is the part that can recover after the Docker engine itself dies.
When Docker is unavailable, the host supervisor polls docker info; after DOCKER_WEB_WATCHER_DOCKER_RESTART_AFTER_MS milliseconds of continuous failure (default 30000), it attempts to restart Docker, waits for docker info to pass, runs any configured host-level post-restart commands, then recreates the watcher container. The recreated watcher reuses the existing cached blue/green recovery path to bring web-proxy and the active/standby web lanes back to health.
Docker restart command defaults:
  • Linux: systemctl restart docker
  • macOS: open -ga Docker
  • Windows: powershell.exe -NoProfile -Command Start-Process "Docker Desktop"
Override the command with a JSON array when the host needs a different service manager or a narrow sudo rule:
DOCKER_WEB_WATCHER_DOCKER_RESTART_COMMAND='["sudo","systemctl","restart","docker"]' \
  bun serve:web:docker:bg:watch -- --if-locked replace
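As a rough illustration of how the platform defaults and the JSON array override could fit together, consider the sketch below. resolveDockerRestartCommand is a hypothetical name; the argv tokenization of the Windows default and the whitespace-splitting fallback for non-JSON values are assumptions, since the docs only state that JSON array syntax is preferred.

```javascript
// Platform defaults as listed above, keyed by Node's process.platform values.
// Assumption: the Windows default is tokenized with the Start-Process
// expression passed as a single argument.
const DEFAULT_RESTART_COMMANDS = {
  linux: ["systemctl", "restart", "docker"],
  darwin: ["open", "-ga", "Docker"],
  win32: ["powershell.exe", "-NoProfile", "-Command", 'Start-Process "Docker Desktop"'],
};

// Hypothetical resolver: returns an argv array, or null when nothing applies.
function resolveDockerRestartCommand(env, platform) {
  const raw = (env.DOCKER_WEB_WATCHER_DOCKER_RESTART_COMMAND || "").trim();
  if (!raw) {
    return DEFAULT_RESTART_COMMANDS[platform] || null;
  }
  if (raw.startsWith("[")) {
    const parsed = JSON.parse(raw);
    const valid =
      Array.isArray(parsed) &&
      parsed.length > 0 &&
      parsed.every((part) => typeof part === "string");
    if (!valid) {
      throw new Error("Restart command must be a non-empty JSON array of strings");
    }
    return parsed;
  }
  // Assumption: plain strings are split on whitespace with no quoting support,
  // which is why JSON array syntax is safer for quoted arguments.
  return raw.split(/\s+/);
}
```

Returning an argv array rather than a shell string keeps the command suitable for a spawn-style call without shell interpolation.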
Useful host-supervisor knobs:
  • DOCKER_WEB_WATCHER_DOCKER_RESTART_AFTER_MS: delay before the first Docker restart attempt while docker info is failing; set 0 to disable attempts.
  • DOCKER_WEB_WATCHER_DOCKER_RESTART_COOLDOWN_MS: minimum time between restart attempts; default 300000.
  • DOCKER_WEB_WATCHER_DOCKER_RESTART_COMMAND: command used to restart or open Docker. Prefer JSON array syntax for commands with quoted arguments.
  • DOCKER_WEB_WATCHER_DOCKER_RESTART_DISABLED=1: hard-disable daemon restart attempts while still waiting for Docker to recover externally.
  • DOCKER_WEB_WATCHER_DOCKER_RECOVERY_TIMEOUT_MS: optional maximum wait time for Docker recovery; unset or 0 means wait indefinitely.
  • DOCKER_WEB_WATCHER_DOCKER_POST_RESTART_COMMAND_TIMEOUT_MS: timeout for each additional host-level recovery command; default 600000.
  • DOCKER_WEB_WATCHER_DOCKER_POST_RESTART_COMMANDS: JSON array of host-level commands to run after Docker is reachable again and before Tuturuuu recreates its watcher container. Each entry is an object with command, args, and an optional cwd.
  • DOCKER_WEB_WATCHER_MAX_REQUEST_LOG_BYTES: maximum durable proxy request-log ledger size before the watcher rotates and prunes older JSONL chunks before appending new entries; default 268435456 bytes.
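The interaction between the restart delay, the cooldown, and the disable switch can be sketched as a single decision function. This is an assumption-laden illustration: shouldAttemptDockerRestart, readMs, and the state fields failingSince and lastAttemptAt are hypothetical names, while the variable names and defaults (30000 and 300000) come from the list above.

```javascript
// Reads a millisecond value from an env string, falling back to a default
// when the variable is unset, empty, or not a finite number.
function readMs(value, fallback) {
  if (value === undefined || value === "") return fallback;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Hypothetical decision helper for the host supervisor.
// `state.failingSince`: timestamp (ms) when docker info first started failing.
// `state.lastAttemptAt`: timestamp (ms) of the previous restart attempt, or null.
function shouldAttemptDockerRestart(env, state, now) {
  if (env.DOCKER_WEB_WATCHER_DOCKER_RESTART_DISABLED === "1") return false;

  const afterMs = readMs(env.DOCKER_WEB_WATCHER_DOCKER_RESTART_AFTER_MS, 30000);
  const cooldownMs = readMs(env.DOCKER_WEB_WATCHER_DOCKER_RESTART_COOLDOWN_MS, 300000);

  // A delay of 0 disables restart attempts entirely.
  if (afterMs === 0) return false;
  // Wait for continuous failure to exceed the configured delay.
  if (now - state.failingSince < afterMs) return false;
  // Respect the cooldown between consecutive attempts.
  if (state.lastAttemptAt !== null && now - state.lastAttemptAt < cooldownMs) return false;
  return true;
}
```

Keeping the decision pure (environment, state, and clock passed in) makes the timing rules easy to exercise in script tests without a real Docker outage.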
Timing, disable, and email alert values can be updated from Infrastructure Monitoring in the web dashboard. The dashboard writes tmp/docker-web/watch/control/blue-green-docker-recovery-settings.json, and the host supervisor reads that file before each Docker recovery wait. Dashboard settings override the environment defaults without restarting the supervisor.
Host-level executable commands are different: configure DOCKER_WEB_WATCHER_DOCKER_RESTART_COMMAND and DOCKER_WEB_WATCHER_DOCKER_POST_RESTART_COMMANDS only in the host supervisor environment. The supervisor intentionally ignores command fields from blue-green-docker-recovery-settings.json so dashboard viewers cannot persist host commands for a later recovery event.
That settings file also owns Docker crash email alerts:
  • emailAlertsEnabled: enables SES-backed Docker recovery alert emails from the web cron worker.
  • emailAlertRecipients: explicit recipient list. If this is empty, the cron falls back to PLATFORM_DOCKER_RECOVERY_ALERT_EMAILS, then the last operator email that saved the settings.
  • emailAlertCooldownMs: minimum time between alert emails; default 1800000.
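The recipient fallback order and the cooldown can be sketched like this. The helper names and the lastSavedBy parameter are hypothetical, and the sketch assumes PLATFORM_DOCKER_RECOVERY_ALERT_EMAILS holds a comma-separated list, which the docs do not specify.

```javascript
// Hypothetical resolution of alert recipients, following the documented order:
// explicit list, then the env fallback, then the last operator who saved settings.
function resolveAlertRecipients(settings, env, lastSavedBy) {
  const explicit = (settings.emailAlertRecipients || []).filter(Boolean);
  if (explicit.length > 0) return explicit;

  // Assumption: the env variable is a comma-separated list of addresses.
  const fromEnv = (env.PLATFORM_DOCKER_RECOVERY_ALERT_EMAILS || "")
    .split(",")
    .map((entry) => entry.trim())
    .filter(Boolean);
  if (fromEnv.length > 0) return fromEnv;

  return lastSavedBy ? [lastSavedBy] : [];
}

// Returns true when alerts are enabled and the cooldown has elapsed.
function canSendAlert(settings, lastSentAt, now) {
  if (!settings.emailAlertsEnabled) return false;
  const cooldownMs = settings.emailAlertCooldownMs ?? 1800000;
  return lastSentAt === null || now - lastSentAt >= cooldownMs;
}
```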
The host supervisor persists Docker crash/recovery events to the watcher log archive as soon as it detects docker info failures. Because the web app itself usually runs inside Docker, SES email delivery happens after Docker is reachable again and the web cron job /api/cron/infrastructure/docker-recovery-alerts can run. The cron deduplicates by Docker recovery incident id using tmp/docker-web/watch/control/blue-green-docker-recovery-alert-state.json.
Example post-restart commands for colocated projects:
[
  {
    "command": "docker",
    "args": ["compose", "-f", "/srv/zeus/docker-compose.yml", "up", "-d"],
    "cwd": "/srv/zeus"
  },
  {
    "command": "docker",
    "args": ["compose", "-f", "/srv/upskii/docker-compose.yml", "up", "-d"],
    "cwd": "/srv/upskii"
  }
]
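A value like the one above could be validated before use with a small parser. The sketch below is hypothetical (parsePostRestartCommands is not a documented helper) and makes one assumption beyond the docs: a missing args field is treated as an empty argument list.

```javascript
// Hypothetical parser for DOCKER_WEB_WATCHER_DOCKER_POST_RESTART_COMMANDS.
// Each entry must be an object with `command`, `args`, and an optional `cwd`.
function parsePostRestartCommands(raw) {
  if (!raw) return [];
  const parsed = JSON.parse(raw);
  if (!Array.isArray(parsed)) {
    throw new Error("Post-restart commands must be a JSON array");
  }
  return parsed.map((entry, index) => {
    if (!entry || typeof entry.command !== "string" || entry.command === "") {
      throw new Error(`Entry ${index}: command must be a non-empty string`);
    }
    // Assumption: a missing args field means "no arguments".
    const args = entry.args === undefined ? [] : entry.args;
    if (!Array.isArray(args) || !args.every((arg) => typeof arg === "string")) {
      throw new Error(`Entry ${index}: args must be an array of strings`);
    }
    if (entry.cwd !== undefined && typeof entry.cwd !== "string") {
      throw new Error(`Entry ${index}: cwd must be a string when present`);
    }
    const normalized = { command: entry.command, args };
    if (entry.cwd) normalized.cwd = entry.cwd;
    return normalized;
  });
}
```

Failing fast on a malformed entry at supervisor startup is preferable to discovering the problem in the middle of a Docker recovery.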
For Linux production hosts, install the command as a root-owned systemd service or run it as an operator account with permission to execute only the configured Docker restart command and the explicit post-restart commands needed by colocated projects. Use Restart=always so the host supervisor itself comes back after reboots or process crashes.
Additional behavior:
  • If the watcher script itself changed in the pulled revision, the current watcher process restarts first and the replacement process performs the deploy.
  • If blue/green is already live and the standby color remains on an older revision for 15 minutes, the watcher rebuilds only the standby color in place. The active color remains primary for new traffic the whole time.
  • If the watcher sees a degraded blue/green runtime with a proxy or runtime marker present but no active web color serving traffic, it immediately retags the latest retained successful image into the active web color and starts it with --no-build. It prefers a retained image for the current main commit, then falls back to the newest retained successful image so the runtime can recover first and reconcile to main afterward. It then retags the same cached image into the opposite color and starts that as the warm standby, creating two ready copies without waiting for a fresh build.
  • Blue/green active and standby discovery uses Docker health, not just container presence. If the persisted active color is unhealthy but the opposite color is healthy, the watcher rewrites the active marker and proxy to the healthy color before building or refreshing another lane.
  • Cached recoveries write a fresh nginx proxy config before the proxy is started, so recovery never boots nginx with a stale upstream that points at a missing or unhealthy color.
  • That standby catch-up path also stops and removes the stale standby container before rebuilding it, so health checks target the fresh replacement container rather than an outdated standby instance.
  • Standby catch-up rebuilds reuse the current deployment stamp so the warm backup matches the latest deployment state instead of serving an older build if nginx needs to fail over.
  • The watcher dashboard surfaces the top 3 most relevant deployments from the recent history, prioritizing in-progress rollouts first, then the live promoted color, then the warm standby. Direct manual bun serve:web:docker:bg runs are written into that same history too.
  • Cached recoveries write both the active recovery and the standby refresh into the same retained ledger, preserving the current two warm copies plus the prior successful deployment as the fastest rollback reference. If no retained image exists, the watcher falls back to the normal recovery build path.
  • The infrastructure monitoring rollback controls show the latest retained cached recovery images separately from the general deployment history, so an operator can quickly select a known cache-backed commit before pinning it for rollback or smoke testing.
  • Successful active and standby builds tag the service image as {compose-project}-web-cache:{commit} and prune older retained cache tags beyond the three newest successful deployments. Pruning is idempotent: already-removed cache tags are ignored instead of warning in the live watcher log.
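The cached-recovery image choice and the cache-tag retention rule from the list above can be sketched as pure functions. The record shape ({ commit, tag }), the newest-first ordering, and the function names are assumptions made for illustration; only the tag format, the preference for the current main commit, and the keep-three rule come from the docs.

```javascript
// Builds the retained cache tag described above: {compose-project}-web-cache:{commit}.
function cacheTag(composeProject, commit) {
  return `${composeProject}-web-cache:${commit}`;
}

// `retained` is assumed to be a newest-first list of { commit, tag } records
// for successful deployments.
function selectRecoveryImage(retained, mainCommit) {
  // No retained image: the caller falls back to the normal recovery build path.
  if (retained.length === 0) return null;
  // Prefer the image for the current main commit, else the newest retained one.
  return retained.find((image) => image.commit === mainCommit) || retained[0];
}

// Returns the tags that fall outside the newest `keep` successful deployments.
function tagsToPrune(retained, keep = 3) {
  return retained.slice(keep).map((image) => image.tag);
}
```

Because tagsToPrune only returns tags beyond the retention window, running it again after a prune yields an empty list, which matches the idempotent pruning behavior described above.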

Deployment build lock

Blue/green deploys coordinate on a JSON lock file under tmp/docker-web/watch/blue-green-deployment-build.lock (owner PID, command, deployment kind, and a re-entrant token for nested helper calls).
  • On Linux, the helper compares /proc/<pid>/cmdline to the recorded lock so a reused PID after a crash cannot masquerade as an in-flight deploy. The recorded command is the package script name (bun serve:web:docker:bg), but production deploys usually run as node scripts/docker-web.js ...; the matcher treats those as the same holder so a live node deploy is not cleared as a stale PID reuse. On macOS and Windows, the same age-based stale window (DOCKER_WEB_DEPLOYMENT_LOCK_STALE_AFTER_MS, default eight hours) still clears abandoned locks because /proc validation is unavailable. When no web-proxy / web-blue / web-green containers exist, the auto-deploy watcher also runs the same stale-lock sweep before cached recovery.
  • DOCKER_WEB_DEPLOYMENT_LOCK_STALE_AFTER_MS: optional override for the default eight-hour window used when /proc is unreadable but kill(pid, 0) still reports a process (for example permission quirks). Set to 0 to disable age-based assists.
  • DOCKER_WEB_CANCEL_ACTIVE_BUILD=1 or --cancel-active-build on a manual bun serve:web:docker:bg run stops the watcher/buildkit services, clears the lock, and records a canceled history row before starting fresh.
  • The auto-deploy watcher treats an active deployment lock as a wait state, not a failed deploy attempt. Recovered pending handoffs, reconcile builds, standby refreshes, platform promotions, and imported Infrastructure project builds all defer behind the same lock so only one deployment build runs across the stack.
  • The watcher also treats a build lock older than 30 minutes as a timed-out build. For another live deployment PID, it sends SIGTERM to the recorded owner. If the lock is owned by the watcher process itself and the watcher has already returned to the polling loop, the lock is treated as leaked cached-recovery state and cleared without signaling the watcher. In both cases, the watcher records a failed deployment history row with the timeout reason and waits until the next polling cycle before retrying. Override the window with DOCKER_WEB_WATCHER_BUILD_TIMEOUT_MS; set it to 0 only when an operator explicitly wants to disable watcher-side build termination.
  • The apps/web Dockerfile deps stage retries bun install --frozen-lockfile up to three times with a Bun cache scrub between attempts. If the build still exits with bun install --frozen-lockfile exit code 1 after a git pull, regenerate bun.lock in a development checkout, commit the reviewed lockfile update, and deploy that commit. Do not let the production host rewrite bun.lock as part of the auto-deploy path. If tarball extraction still fails (for example @biomejs/cli-linux-x64), the blue/green helper prunes BuildKit exec cache mounts, restarts the Compose-owned buildkit service, and retries once with docker compose build --no-cache so a cached failed deps stage is not reused. The same one-time fresh retry is used for CACHED ERROR ... COPY --from=deps and for the build watchdog timeout.
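The stale-lock rules described in this section (PID liveness, /proc cmdline matching on Linux, and the age-based window elsewhere) can be summarized in a sketch. Everything here is illustrative: the lock fields command and acquiredAt, the probe object, and the function names are assumptions, and treating a dead PID as immediately stale is inferred rather than stated. The sketch also assumes the caller has already read /proc/<pid>/cmdline and replaced its NUL separators with spaces.

```javascript
const EIGHT_HOURS_MS = 8 * 60 * 60 * 1000;

// Treats the package script name and its node entry point as the same holder.
function commandMatches(recordedCommand, cmdline) {
  if (cmdline.includes(recordedCommand)) return true;
  return (
    recordedCommand === "bun serve:web:docker:bg" &&
    cmdline.includes("scripts/docker-web.js")
  );
}

// `probe.alive`: whether a kill(pid, 0)-style check reports a live process.
// `probe.cmdline`: the process command line, or null when /proc is unavailable.
function isLockStale(lock, probe, now, staleAfterMs = EIGHT_HOURS_MS) {
  // Assumption: a lock whose owner PID is gone is abandoned.
  if (!probe.alive) return true;
  // On Linux, a live PID with a different command is a reused PID.
  if (probe.cmdline !== null) {
    return !commandMatches(lock.command, probe.cmdline);
  }
  // Without /proc, fall back to the age-based window; 0 disables it.
  if (staleAfterMs === 0) return false;
  return now - lock.acquiredAt >= staleAfterMs;
}
```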

Monitoring Surfaces

The infrastructure monitoring UI in apps/web is intentionally split into smaller pages instead of one oversized dashboard:
  • /{wsId}/infrastructure/monitoring for the operator overview, runtime snapshot, cron health summary, and jump points into deeper surfaces.
  • /{wsId}/infrastructure/monitoring/cron for cron job schedules, global enable/disable control, manual run requests, recent execution status, route responses, and captured web-container console logs.
  • /{wsId}/infrastructure/monitoring/rollouts for rollout controls, deployment charts, event streams, and ledger history.
  • /{wsId}/infrastructure/monitoring/requests for paginated proxy request history backed by the durable JSONL request store under tmp/docker-web/watch/blue-green-request-logs/.
  • /{wsId}/infrastructure/monitoring/watcher-logs for paginated watcher log browsing backed by tmp/docker-web/watch/blue-green-auto-deploy.logs.json.
Operationally, keep the overview route lightweight and treat request/log archives as dedicated drill-down pages. The summary snapshot is for quick operator context; durable history should be paged from the persisted ledgers.

Build Resource Caps

When build and serve run on the same machine, use the Docker web helper’s Buildx throttling options instead of letting BuildKit consume the full host. Example:
bun serve:web:docker:bg -- --build-memory 12g --build-cpus 4 --build-max-parallelism 1
Current root-script defaults:
  • bun serve:web:docker defaults to --build-memory 12g --build-cpus 4 --build-max-parallelism 1
  • bun serve:web:docker:bg defaults to --build-memory 12g --build-cpus 4 --build-max-parallelism 1
The Compose-owned BuildKit service also defaults mem_limit to 12g and cpus to 4 when DOCKER_WEB_BUILD_MEMORY and DOCKER_WEB_BUILD_CPUS are unset, which keeps large monorepo builds from overcommitting shared hosts that still run the web stack, Redis, log drain, and other sidecars alongside builds. Raise or lower the caps with env vars or helper flags when your machine is tighter or has spare capacity. You can still override those defaults per run by appending your own flags after --, for example:
bun serve:web:docker:bg -- --build-memory 16g --build-cpus 4 --build-max-parallelism 2
Equivalent environment variables:
  • DOCKER_WEB_BUILD_MEMORY=16g
  • DOCKER_WEB_BUILD_CPUS=4
  • DOCKER_WEB_BUILD_MAX_PARALLELISM=2
  • DOCKER_WEB_BUILD_BUILDER_NAME=tuturuuu
  • DOCKER_WEB_BUILDKIT_PORT=7914
  • DOCKER_WEB_BUILDKIT_ENDPOINT=tcp://127.0.0.1:7914
  • DOCKER_WEB_BUILDKIT_PRUNE_AFTER_BUILD=0 for blue/green watcher handoffs
  • DOCKER_WEB_BUILDKIT_STOP_AFTER_BUILD=0 to keep the buildkit container warm after a build
  • DOCKER_WEB_DOCKER_MEMORY_LIMIT=<bytes from docker info>
  • DOCKER_WEB_STATIC_PAGE_GENERATION_TIMEOUT=180
  • DOCKER_WEB_STATIC_GENERATION_MAX_CONCURRENCY=auto
  • DOCKER_WEB_NEXT_BUILD_CPUS=auto
  • DOCKER_WEB_NEXT_APP_ONLY=1
  • DOCKER_WEB_NODE_MAX_OLD_SPACE_SIZE=auto
  • DOCKER_WEB_NEXT_BUILD_ENGINE=turbopack
  • DOCKER_WEB_REACT_COMPILER=0
How it works:
  • The helper starts the Compose-owned buildkit service and then creates or reuses the remote Buildx builder named by DOCKER_WEB_BUILD_BUILDER_NAME. The container is named ${COMPOSE_PROJECT_NAME:-tuturuuu}-buildkit-1, so it stays visually grouped under the tuturuuu Docker Desktop stack.
  • The BuildKit caps accept auto. Auto memory uses Docker’s reported memory limit minus a small host overhead buffer, rounded down to MiB precision; auto CPU uses 1 CPU below 10 GB, 2 CPUs below 16 GB, and 4 CPUs on larger Docker allocations; auto max parallelism uses 1 below 16 GB and 2 above that. The E2E runner uses these auto caps by default so local Playwright verification adapts to the current Docker Desktop setting without requiring one-off env overrides.
  • DOCKER_WEB_BUILD_MEMORY caps the Compose-owned BuildKit service’s memory budget.
  • DOCKER_WEB_BUILD_CPUS sets the BuildKit service CPU budget.
  • DOCKER_WEB_BUILD_MAX_PARALLELISM writes a BuildKit config that limits concurrent solve steps, which is often the most effective way to reduce CPU spikes on smaller machines.
  • Host-side helper runs point Buildx at DOCKER_WEB_BUILDKIT_ENDPOINT (default tcp://127.0.0.1:${DOCKER_WEB_BUILDKIT_PORT:-7914}). The watcher container uses tcp://buildkit:1234 on the Compose network.
  • Blue/green watcher handoffs preserve the Compose-owned BuildKit cache volume by default (DOCKER_WEB_BUILDKIT_PRUNE_AFTER_BUILD=0), but stop and remove the buildkit container after the build/deploy phase (DOCKER_WEB_BUILDKIT_STOP_AFTER_BUILD=1). This frees idle CPU and memory while keeping layer state for the next deployment. Set DOCKER_WEB_BUILDKIT_STOP_AFTER_BUILD=0 only when an operator intentionally wants BuildKit to stay warm after a handoff.
  • Dockerized E2E is the exception: its per-run BuildKit state is disposable and scripts/run-web-e2e-docker.js sets DOCKER_WEB_BUILDKIT_PRUNE_AFTER_BUILD=1 unless explicitly overridden.
  • The same max-parallelism value is also forwarded as COMPOSE_PARALLEL_LIMIT when that variable is not already set. When the limit is 1, the blue/green workflow builds each Bake target group separately so image export and web compilation do not overlap on memory-constrained hosts.
  • When Docker reports less than 10 GB of total memory for a blue/green run, the helper also restarts the Compose-owned buildkit service immediately before the build batch. This clears long-lived BuildKit RSS before the replacement web image builds while the active lane is still running. Set DOCKER_WEB_BUILDKIT_RESTART_BEFORE_BUILD=0 to skip that low-memory restart, or 1 to force it on a larger host.
  • Docker web builds use bun run build:web:docker, which keeps the normal web build dependency graph, sets NODE_OPTIONS=--max-old-space-size to at least 4 GB on Docker allocations below 10 GB, then scales to 8 GB, 12 GB, or 16 GB based on the lower of Docker’s reported memory limit and the selected BuildKit memory cap, falling back to DOCKER_WEB_BUILD_MEMORY for environments where Docker memory cannot be detected. The helper reads Docker’s MemTotal and forwards it as DOCKER_WEB_DOCKER_MEMORY_LIMIT; auto buckets reserve 1 GB of effective Docker build memory for BuildKit, the active runtime lane, and sidecar overhead before selecting the Node heap bucket on larger allocations. Docker production builds use Turbopack under the real Node 24 runtime with App Router-only compilation and React Compiler disabled. This avoids Bun runtime crashes while loading native Next SWC modules and keeps local E2E aligned with production Docker builds. The Docker builder stage is based on node:24-bookworm-slim and copies the Bun binary in only for workspace script orchestration, so the actual next build process runs under real Node instead of Bun’s node shim. The @tuturuuu/web build:docker script delegates to scripts/run-web-docker-next-build.js, which spawns DOCKER_WEB_NODE_BINARY (pinned by the Dockerfile to /usr/local/bin/node) for the Next CLI and honors DOCKER_WEB_NODE_MAX_OLD_SPACE_SIZE=auto by default. Set a numeric heap value only when you need to override the bucket selection; values below 4096 MB are rejected. Set DOCKER_WEB_NEXT_APP_ONLY=0 or DOCKER_WEB_REACT_COMPILER=1 only when the host has enough memory headroom. Keep DOCKER_WEB_NEXT_BUILD_ENGINE on its default Turbopack value for production watcher hosts and local E2E runs.
  • Docker standalone Next builds default static page generation to a 180 second timeout, and auto-scale the inner Next build CPU count plus static generation concurrency from Docker memory. Docker allocations below 10 GB use 1 Next build CPU and static generation concurrency 1; 10-16 GB allocations use 2 for both; 16 GB and larger allocations use 4 for both. The Compose-owned BuildKit service still defaults to a 4 CPU budget, while the inner Next workers stay lower on smaller hosts to avoid OOM kills when the same machine is also running the active blue/green lane and sidecars. Override those with DOCKER_WEB_STATIC_PAGE_GENERATION_TIMEOUT, DOCKER_WEB_STATIC_GENERATION_MAX_CONCURRENCY, and DOCKER_WEB_NEXT_BUILD_CPUS when a host has more or less headroom.
  • Hive Docker images use a filtered workspace install and a Next standalone runner. Before apps/hive runs next build, the image must build @tuturuuu/types, @tuturuuu/internal-api, and @tuturuuu/supabase because those packages expose production dist/* subpath exports that Turbopack resolves during the standalone build. Hive realtime installs its filtered production workspace with Bun’s hoisted linker so the direct bun apps/hive-realtime/src/index.ts runtime can resolve top-level production packages such as postgres and @tuturuuu/realtime. Keep .dockerignore explicit about recursive generated directories such as **/.next/**, tmp/**, and apps/mobile/build/**; otherwise previous local builds can be copied into the next Docker context and inflate small sidecar images by several gigabytes.
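The auto buckets described in this list can be expressed as a short lookup. This sketch is hedged in two ways: it treats the "GB" thresholds as GiB, and it resolves the boundary at exactly 16 GB toward the larger bucket; neither detail is confirmed by the text. The function names are hypothetical, and the auto memory cap is omitted because the size of the host overhead buffer is not documented.

```javascript
// Assumption: the documented "GB" thresholds are interpreted as GiB.
const GIB = 1024 ** 3;

// Auto BuildKit caps: 1 CPU below 10 GB, 2 below 16 GB, 4 otherwise;
// max parallelism 1 below 16 GB, 2 otherwise.
function autoBuildkitCaps(dockerMemoryBytes) {
  let cpus = 4;
  if (dockerMemoryBytes < 10 * GIB) cpus = 1;
  else if (dockerMemoryBytes < 16 * GIB) cpus = 2;
  const maxParallelism = dockerMemoryBytes < 16 * GIB ? 1 : 2;
  return { cpus, maxParallelism };
}

// Auto inner Next build workers: the same value is used for the Next build
// CPU count and the static generation concurrency.
function autoNextBuildWorkers(dockerMemoryBytes) {
  let workers = 4;
  if (dockerMemoryBytes < 10 * GIB) workers = 1;
  else if (dockerMemoryBytes < 16 * GIB) workers = 2;
  return { nextBuildCpus: workers, staticGenerationConcurrency: workers };
}
```

On a Docker allocation of 12 GiB, for instance, this sketch yields 2 BuildKit CPUs, max parallelism 1, and 2 inner Next workers.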
Operational notes:
  • These caps affect image builds, not the runtime apps/web container after it has started.
  • If no build caps are configured, the helper continues using Docker’s default builder behavior.
  • Do not switch capped builds back to the Buildx docker-container driver. It creates Docker-managed containers named like buildx_buildkit_* outside the Compose project, which makes Docker Desktop grouping and health reporting confusing.
  • During capped-build setup, the helper removes known legacy Buildx builders such as platform-web-capped-builder before creating or reusing the Compose-owned remote tuturuuu builder. If docker buildx ls still shows that legacy builder, run the capped web deploy helper once so it can clean up the stale Buildx record.
  • A lower parallelism setting usually trades build speed for host stability.
  • If BuildKit fails with ResourceExhausted or cannot allocate memory while host memory is still available, the builder cgroup is too small; raise --build-memory while keeping --build-max-parallelism 1.
  • If host memory or swap is saturated, lower --build-max-parallelism first and stop unrelated containers before raising the builder memory cap.
  • If Bun fails during an image install with a tarball extraction error such as Fail extracting tarball for "@biomejs/cli-linux-x64", the blue/green helper treats it as BuildKit exec-cache corruption once per deployment attempt. It prunes BuildKit exec cache mounts, restarts the Compose-owned buildkit service, and retries the build once with --no-cache. A second failure is recorded as a real deployment failure with the original command context preserved in logs.
  • If BuildKit reports CACHED ERROR after a failed deps stage, or if the compose build exceeds DOCKER_WEB_BUILD_TIMEOUT_MS (default 45 minutes), the helper uses the same one-time cache recovery and fresh --no-cache retry.
  • If a deployment fails during the build, the watcher captures the actionable failure lines into the retained deployment history as failureReason. The Infrastructure → Monitoring → Deployments page and rollout ledger display that reason inline so operators do not need to reconstruct failures from a terminal scrollback.

Redis Profile

Redis is enabled by default in both dev and production-style Docker web stacks. The Redis and serverless-redis-http host ports bind to 127.0.0.1 only; do not expose them through Cloudflare Tunnel, public firewall rules, or all-interface Docker port mappings. The helper persists the generated token in:
  • tmp/docker-web/redis-token
and injects these values into apps/web automatically:
  • UPSTASH_REDIS_REST_URL=http://serverless-redis-http:80
  • UPSTASH_REDIS_REST_TOKEN=<generated local token>
The production Redis compose fragment requires UPSTASH_REDIS_REST_TOKEN during Compose interpolation. Use the Docker web helper, which injects the generated token automatically, or export a strong token before running direct docker compose --profile redis ... commands. Service env_file entries do not satisfy Compose interpolation for the Redis HTTP bridge token. Docker Redis mode intentionally ignores generic UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN values from the host shell. This prevents old Upstash REST URLs from leaking into self-hosted Docker containers after the Upstash instance is shut down. If a Docker host must override the bundled Redis sidecar, use the Docker-specific DOCKER_UPSTASH_REDIS_REST_URL and DOCKER_UPSTASH_REDIS_REST_TOKEN variables. If you intentionally want the memory-only fallback, opt out:
bun dev:web:docker -- --without-redis
That opt-out disables both the bundled Redis companion services and the Docker-injected UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN variables. Redis-optional features such as rate limiting fall back to their non-Redis behavior, but security-sensitive one-time state such as CLI refresh-token replay protection fails closed until Redis is restored or the CLI user signs in again.
Vercel-hosted satellite apps such as CMS, Calendar, Finance, Learn, Teach, and Tasks cannot reach Docker-private Redis hosts such as serverless-redis-http. Do not point their Vercel UPSTASH_REDIS_REST_URL at the Docker sidecar or expose Redis through Cloudflare Tunnel. Satellite proxy guards should run without Redis when Upstash is retired; protected product APIs continue to flow through apps/web, where Docker Redis is available.
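The Redis environment rules in this section (ignore generic host values, honor the Docker-specific overrides, inject nothing when opted out) can be sketched as one resolver. resolveDockerRedisEnv and its parameter names are hypothetical; the variable names and the default sidecar URL come from the text above.

```javascript
const BUNDLED_REDIS_REST_URL = "http://serverless-redis-http:80";

// Hypothetical resolver for the Redis env injected into apps/web containers.
// `generatedToken` stands in for the value persisted in tmp/docker-web/redis-token.
function resolveDockerRedisEnv(hostEnv, generatedToken, options = {}) {
  // --without-redis: no bundled services and no injected variables.
  if (options.withoutRedis) return {};

  // Generic UPSTASH_REDIS_REST_* values from the host shell are ignored on
  // purpose; only the Docker-specific overrides are honored.
  return {
    UPSTASH_REDIS_REST_URL:
      hostEnv.DOCKER_UPSTASH_REDIS_REST_URL || BUNDLED_REDIS_REST_URL,
    UPSTASH_REDIS_REST_TOKEN:
      hostEnv.DOCKER_UPSTASH_REDIS_REST_TOKEN || generatedToken,
  };
}
```

Reading only the DOCKER_-prefixed overrides is what keeps a stale Upstash URL in an operator's shell from reaching the containers.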

Cloudflare Tunnel Profile

The Docker compose files include an optional cloudflared service. Enable it when the same host should publish the Dockerized web proxy through Cloudflare Tunnel:
bun serve:web:docker:bg -- --with-cloudflared
Required env:
  • CLOUDFLARED_TOKEN or DOCKER_CLOUDFLARED_TOKEN
For a remotely managed Cloudflare Tunnel, configure the public hostname route in Cloudflare to point at the Docker service:
  • Production blue/green: https://tuturuuu.com -> http://web-proxy:7803
  • Dev stack: https://dev.tuturuuu.com or a temporary hostname -> http://web:7803
The tunnel container joins the same compose network, so use Docker service names instead of host ports in the Cloudflare service URL. Keep cms.tuturuuu.com and other satellite app hostnames on Vercel unless those apps are explicitly moved into this Docker stack. Production compose binds host-published web, Hive, Meet, and Redis ports to 127.0.0.1 only. Do not remove that loopback prefix during blue/green migration; public exposure should go through Cloudflare Tunnel or another controlled frontend, not the staged Docker host ports. When blue/green is deployed with --with-cloudflared, the watcher receives DOCKER_WEB_WITH_CLOUDFLARED=1 so future auto-deploys keep the tunnel profile active and do not remove the cloudflared container as an orphan.

Auto-Pull Blue/Green Watcher

For simple self-hosted boxes that deploy directly from a Git branch, the repo also provides a long-running auto-deploy watcher:
bun serve:web:docker:bg:watch
That command now bootstraps Docker instead of running the watcher loop as a host PID. Each invocation:
  1. Writes the forwarded watcher CLI args to tmp/docker-web/watch/blue-green-auto-deploy.args.json.
  2. Rebuilds and force-recreates the dedicated web-blue-green-watcher service.
  3. Tails that container’s live logs so the terminal still shows the watcher dashboard.
The watcher container mounts:
  • the repo worktree at /workspace
  • the same repo again at the real host checkout path via PLATFORM_HOST_WORKSPACE_DIR, so host Docker bind mounts resolve against the host filesystem when the watcher shells into docker compose ...
  • /var/run/docker.sock so it can manage the blue/green compose stack itself
  • the shared Bun install cache volume
  • a dedicated watcher node_modules volume so the frozen dependency install stays container-local
Behavior:
  1. Reads the built-in platform project from the log-drain Postgres project registry. The production watcher service is wired with PLATFORM_LOG_DRAIN_DATABASE_URL so a live watcher can consume queued Infrastructure project deployments instead of falling back to the legacy single-branch loop.
  2. The seeded branch is production, but operators can change it from Infrastructure → Monitoring → Projects. If the selected project branch differs from the current checkout, the watcher restarts its child process, fetches, and checks out that branch only when the worktree is clean. Dirty worktrees are reported as blocked instead of being force-switched. If the watcher is already stuck on the wrong branch, the monitoring UI queues a watcher recovery request and web-cron-runner recreates web-blue-green-watcher through Docker Compose out of band.
  3. Locks the selected local branch and tracked upstream at startup.
  4. Writes a PID-backed lock file at tmp/docker-web/watch/blue-green-auto-deploy.lock.
  5. Renders a live terminal dashboard with the locked branch, tracked upstream, latest local commit, relative commit age, last check time, next poll time, current blue/green runtime state, and recent watcher events.
  6. Polls the tracked upstream every 1000ms by default.
  7. Auto-clears and redraws the dashboard in place on each state change when attached to a TTY.
  8. Runs the Git and deploy subprocesses quietly so the dashboard is not disrupted by git fetch, git pull, or Docker build output during normal watcher operation.
  9. Skips pulls if the worktree is dirty.
  10. Uses git pull --ff-only only when the local branch is strictly behind the locked upstream.
  11. Runs bun install --frozen-lockfile automatically after every successful fast-forward pull so installed dependencies match the reviewed bun.lock before the deploy handoff continues. The watcher does not run bun upgrade or a non-frozen install on the production host.
  12. Treats any dirty bun.lock as a blocking worktree change. Lockfile updates must be reviewed and committed before the watcher can continue polling or deploying.
  13. Runs bun serve:web:docker:bg automatically after a successful fast-forward pull.
  14. Polls imported Infrastructure projects from log-drain Postgres, synchronizes enabled public GitHub projects into tmp/docker-web/projects/<projectId>/repo, deploys them through generated Next.js compose files under the shared tuturuuu Compose project, and merges hostname routes into the central nginx proxy. The imported-project and manual deployment queue cadence is independent from the normal Git polling interval, so a watcher configured with a long Git interval such as 1000 seconds still wakes on the shorter project queue interval to advance queued Deploy actions. Platform project state is updated on both queue-only deploys and normal fast-forward deploys, so a successful pull/deploy clears the queued state and refreshes the latest-commit columns instead of relying only on deployment history. Imported project builds share the same deployment build lock as platform blue/green builds. If platform, standby, recovery, or another imported project build is already active, the project poll is deferred instead of starting a second Docker build.
  15. If watcher runtime code such as scripts/watch-blue-green-deploy.js, scripts/docker-web/blue-green.js, or scripts/docker-web/env.js changed in the pulled revision, the current watcher does not deploy from the old process. It releases its lock, spawns a replacement watcher with the same CLI args, and exits first.
  16. The replacement watcher refreshes the live web-proxy nginx config and workers in place if blue/green is already serving traffic, verifies proxy routing through /__platform/drain-status, and only then starts the new blue/green build/promotion.
  17. If compose or helper-image wiring changed, including docker-compose.web.prod.yml, Hive service files, MarkItDown service files, apps/storage-unzip-proxy package/source files, or apps/web/docker/cron-runner*, the containerized watcher recreates its own compose service before the pending deploy handoff. The deploy then includes only the affected buildable helper images in the blue/green build command instead of rebuilding every service on every commit.
  18. Retries recoverable Git command failures instead of exiting. The first retry waits 1 minute, then the watcher backs off exponentially on consecutive Git failures up to a 15-minute ceiling.
  19. Caps deployment attempts at 3 failures per commit. A recovered pending handoff failure is recorded, the pending request is cleared, and the watcher keeps polling; once the cap is reached, that commit reports retry-limited until a new commit is available or an operator pins a different deployment.
  20. Stops immediately if the checked-out branch changes while the watcher is running.
  21. If another watcher already owns the lock, a new invocation can fail with guidance, mirror the active watcher with --resume-if-running, or replace it with --replace-existing.
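The Git retry backoff and the per-commit deployment cap from the list above can be sketched as follows. The 1 minute first delay, the 15 minute ceiling, and the cap of 3 come from this guide; the doubling factor is an assumption, because the guide only says "exponentially", and the function names are hypothetical.

```javascript
const FIRST_RETRY_MS = 60_000; // 1 minute, from the behavior list
const MAX_RETRY_MS = 15 * 60_000; // 15 minute ceiling, from the behavior list
const BACKOFF_FACTOR = 2; // assumption: the guide does not state the growth factor
const MAX_FAILED_DEPLOYS_PER_COMMIT = 3; // from the behavior list

// Delay before the next Git retry after N consecutive failures.
function gitRetryDelayMs(consecutiveFailures) {
  if (consecutiveFailures < 1) return 0;
  const delay = FIRST_RETRY_MS * BACKOFF_FACTOR ** (consecutiveFailures - 1);
  return Math.min(delay, MAX_RETRY_MS);
}

// Whether the watcher should try the commit again or report retry-limited.
function deployDecision(failedAttemptsForCommit) {
  return failedAttemptsForCommit >= MAX_FAILED_DEPLOYS_PER_COMMIT
    ? "retry-limited"
    : "attempt";
}

console.log([1, 2, 3, 4, 5, 6].map((n) => gitRetryDelayMs(n) / 60_000));
// prints [ 1, 2, 4, 8, 15, 15 ]
```

With a doubling factor the ceiling is reached on the fifth consecutive failure; a different factor would shift that point but not the first delay or the cap.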
Operational notes for the containerized watcher:
  • Manual bun serve:web:docker:bg and watcher-triggered deploys share a deployment-build lock at tmp/docker-web/watch/blue-green-deployment-build.lock. This lock is separate from blue-green-auto-deploy.lock: the watcher may remain alive, but only one build/deploy phase can be active across manual deploys, auto-pulls, standby refreshes, rollback pins, cached recovery, and reconcile deploys.
  • If a manual deploy sees that lock or a live watcher status of building or deploying, an interactive terminal prompts before it interrupts the active deployment. Confirming stops web-blue-green-watcher, stops/resets the Compose-owned BuildKit work, clears the active build lock/status, records the interrupted entry as canceled, then starts the requested deployment alone.
  • Non-interactive manual automation fails fast on an active deployment unless --cancel-active-build or DOCKER_WEB_CANCEL_ACTIVE_BUILD=1 is provided. Use that override only when it is acceptable to interrupt all BuildKit work owned by the platform deployment stack.
  • Re-running bun serve:web:docker:bg:watch intentionally recreates the watcher container so it picks up local repo changes, new CLI args, and watcher-image updates in one path.
  • The host log follower treats Docker’s 143 exit from an intentionally recreated watcher container as a reconnect signal, then reattaches to the replacement service instead of leaving the terminal dark.
  • If the followed watcher logs explicitly request host-supervised watcher service recreation, the host wrapper force-recreates web-blue-green-watcher before reattaching. Do not rely only on Docker’s restart policy in this path: the old container can briefly report healthy while still running the stale image/runtime.
  • Git fetch/pull credentials now need to be usable inside the watcher container because the watcher no longer runs directly on the host.
  • Full Docker daemon or Docker Desktop crashes cannot be recovered by a watcher that is itself running inside Docker. Keep bun serve:web:docker:bg:watch running from the host, ideally under systemd, launchd, or another host process supervisor. That host command waits for the Docker daemon to respond again, reruns the watcher compose up --build --detach --force-recreate, and then resumes tailing logs. Every hosted project with its own Docker watcher needs its own host-side watch process; container restart: policies only help after Docker is already healthy again.
  • The host Docker recovery loop polls docker info every 5 seconds by default. Override with DOCKER_WEB_WATCHER_DOCKER_RECOVERY_POLL_MS. By default it waits indefinitely because a host process manager is expected to own the terminal process; set DOCKER_WEB_WATCHER_DOCKER_RECOVERY_TIMEOUT_MS to a positive value to fail after a bounded recovery window.
  • The watcher image lives at apps/web/docker/blue-green-watcher.Dockerfile.
  • Its entrypoint wrapper relaunches the watcher in-place when scripts/watch-blue-green-deploy.js requests a self-restart after pulling a new watcher revision.
  • The entrypoint is also the watcher supervisor. It restarts the child process after crashes, after the status snapshot fails to appear during startup, or after blue-green-auto-deploy.status.json becomes stale. The compose service uses restart: unless-stopped so Docker also brings the watcher back after a daemon or container failure.
  • A stale status snapshot is tolerated while the snapshot already shows an active building or deploying deployment. During a long docker compose build, the watcher child is intentionally busy inside the deploy command and may not rewrite the status file until the command exits. The wrapper keeps the child alive until DOCKER_WEB_WATCHER_BUILD_TIMEOUT_MS plus a short grace window, then treats the stale snapshot as unhealthy.
  • bun serve:web:docker:bg:down also stops the watcher service because it is part of the production compose stack now.
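The stale-snapshot tolerance described in the notes above can be sketched as a single decision function. This is an illustration under stated assumptions, not the entrypoint's actual code: the parameter names, the idea of measuring the window from the snapshot's age, and the example thresholds are all hypothetical. Only the `building` and `deploying` states and the DOCKER_WEB_WATCHER_BUILD_TIMEOUT_MS setting come from this guide.

```javascript
// Hypothetical supervisor check. buildTimeoutMs stands in for
// DOCKER_WEB_WATCHER_BUILD_TIMEOUT_MS; staleAfterMs and graceMs are assumed knobs.
function shouldRestartChild({
  snapshotAgeMs,
  staleAfterMs,
  state,
  buildTimeoutMs,
  graceMs,
}) {
  // A fresh snapshot is always healthy.
  if (snapshotAgeMs <= staleAfterMs) return false;

  // A stale snapshot while idle means the child stopped reporting.
  const busy = state === "building" || state === "deploying";
  if (!busy) return true;

  // During a long build the child may not rewrite the snapshot,
  // so tolerate staleness up to the build timeout plus a grace window.
  return snapshotAgeMs > buildTimeoutMs + graceMs;
}

const thresholds = { staleAfterMs: 60_000, buildTimeoutMs: 1_800_000, graceMs: 30_000 };
console.log(shouldRestartChild({ ...thresholds, snapshotAgeMs: 120_000, state: "building" }));
// prints false
```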
Dashboard details:
  • Shows the current active blue/green color when web-proxy is serving live traffic.
  • Docker resource rows use the running containers directly as a fallback when docker compose ps cannot inspect the prod stack because of env interpolation issues, so watcher metrics can still appear on an already-live deployment.
  • Docker stats are read with an explicit field format instead of Docker’s version-dependent JSON object shape, which avoids bogus 0 CPU/memory readings when the watcher is running against a different Docker release.
  • The watcher parser also normalizes locale-style decimal commas from docker stats, so hosts that emit values like 0,10% or 24,0MiB no longer collapse into zeroed metrics.
  • Each watcher snapshot now includes docker ps metadata for every running container visible through the host Docker socket, plus compose service health for containers in the production project. The monitoring overview uses that persisted snapshot to show service health and a full running-container inventory without mounting the Docker socket into apps/web.
  • The request archive view computes route summaries, status totals, RSC counts, and error totals across the selected timeframe instead of only the visible page. The default timeframe is seven days, the API rejects unbounded or oversized windows, and operators can query at most 30 days of retained request logs at a time. The web API keeps a short in-process aggregate cache keyed by bounded timeframe plus telemetry log file stats, but the cache stores only aggregate analytics so request rows are not retained in memory between page reads.
  • Drive ZIP extraction does not stream extracted file bytes back through the web app proxy. The unzip worker requests a per-entry signed upload URL from the callback route and uploads extracted files directly to trusted storage origins only, which avoids nginx body-size limits for large WebGL artifacts while keeping folder creation and auth checks in the backend callback.
  • Hive is promoted with the web blue/green color: hive-blue and hive-green are routed from hive.tuturuuu.com, and hive-realtime serves /realtime with HIVE_REALTIME_TOKEN_SECRET, HIVE_REALTIME_URL, and NEXT_PUBLIC_HIVE_REALTIME_URL configured in the same production stack. Hive product data is stored in the Docker-managed hive-postgres service via HIVE_DATABASE_URL; Supabase remains the identity/session source only. The web, web-cron-runner, hive-{color}, and hive-realtime services must all receive that URL so API routes, disabled-by-default simulation cron, the editor, and the CRDT realtime service share the same Hive product database. Optional local LLM support runs behind the hive-ollama profile and is disabled unless operators enable the profile and Hive settings enable the exact gemma4 model. Production compose publishes 127.0.0.1:7814:7814 from web-proxy, not from a direct Hive container, so host-local or Cloudflare tunnel traffic to localhost:7814 always reaches the currently promoted Hive color without exposing staged migration ports on every host interface. Deploys verify that the running web-proxy container has the required loopback host bindings (7803, 7814, and 7816) and that its running image matches the resolved Compose image before reusing it; if an older proxy was created before Hive moved behind blue/green or before the nginx image pin changed, the next deploy force-recreates the proxy so the host-level Cloudflare Tunnel route can reach Hive on the expected proxy runtime. The Hive color services use the same Supabase env source as apps/web: runtime env files are shared, and production image builds mount the web_env BuildKit secret so hidden-locale auth pages can prerender with the platform Supabase URL. 
  • Deploy coordination: scripts/docker-web/blue-green.js still scopes prod builds by changed service group, but runtime promotion happens in staged order: first the target web-{color}, then hive-{color} and hive-realtime, then refreshed support services such as backend, MarkItDown, storage-unzip-proxy, web-cron-runner, and optional Redis-backed helpers. If web-proxy or cloudflared must be bootstrapped or recreated for host-port changes, they start only after target web, Hive, and support services are healthy, so hive.tuturuuu.com is not exposed with an empty hive_app_upstream and web is not publicly switched before dependent gates finish. Promotion waits for the final proxy route check before writing active-color.
  • Every service owned by docker-compose.web.prod.yml should declare a healthcheck, either directly in compose or in the image. The resources inventory treats an Up container without Docker health metadata as healthy for cross-project runtime visibility, but first-party prod services and sidecars still need explicit probes so deploy gates can fail before promotion.
  • The MarkItDown sidecar needs SUPABASE_URL set to the same Docker-internal Supabase URL used by the web container. The service validates signed Storage URLs before downloading attachments, and local Docker runs may use host.docker.internal over HTTP.
  • MarkItDown source changes and storage-unzip-proxy package/source changes are part of the watcher refresh globs. Keep those globs in sync with any future sidecar entrypoints so a running watcher refreshes helper containers during the next deploy handoff, not just the web app container.
  • Host dependency refreshes must not rewrite bun.lock on the production host. The watcher treats a dirty lockfile as a blocking worktree change, and its automatic dependency sync uses bun install --frozen-lockfile only.
  • Runtime upgrades are an explicit operator action. The watcher does not run bun upgrade; update the host Bun runtime only after reviewing the pinned version in the repository and the watcher image.
  • Recoverable Git poll failures stay visible in the dashboard as a retrying watcher state instead of terminating the process, and the next-check timer reflects the active backoff delay.
  • If git fetch or git pull --ff-only fails only because a Git lock already exists, the watcher inspects the lock age and removes it automatically only when it is stale (older than 2 minutes). This covers .git/index.lock, .git/packed-refs.lock, and remote-ref locks such as .git/refs/remotes/origin/staging.lock. Fresh lock files are left in place so active Git operations are not interrupted.
  • Build/deploy failures also stay inside the watcher loop. The watcher records failed attempts in deployment history, clears stale pending handoff files after recovery failures, and stops retrying the same commit after the third failed deployment attempt. Once that cap is reached, it reports the retry-limited state once for that commit instead of logging the same skip on every poll.
  • Normal promotions keep the long-lived web-proxy container and bound port stable, which avoids transient listener drops for upstreams such as Cloudflare Tunnel that are connected to :7803. Proxy container recreates are reserved for required host-port or image drift and happen only after the replacement web/Hive lane is healthy.
  • The watcher persists recent deployment history, including manual bun serve:web:docker:bg runs, and renders the top 3 most operationally relevant entries as stacked terminal cards that favor vertical scannability over very wide lines.
  • Each deployment card now uses a stronger header with status/color badges plus grouped metric bands, so active traffic state, rollout intent, and request-rate data are easier to scan while multiple cards are stacked.
  • As soon as a new commit starts rolling out, the recent deployment section shows it immediately as DEPLOYING instead of waiting for the rollout to finish.
  • Each deployment block includes:
    • deploy status (ACTIVE, ENDED, or FAILED)
    • build time
    • activation/finish time
    • deployment lifetime while it served traffic
    • total requests served during that deployment window
    • average requests per minute
    • peak requests per minute
    • day: requests served on the current day for the active deployment, or the final active day for an ended deployment
    • davg: average requests per day across that deployment’s serving lifetime
    • dpeak: busiest single-day request count across that deployment’s serving lifetime
  • The live blue/green summary uses the same traffic metrics as the deployment history cards, with consistent color coding for build/lifetime/traffic/age metrics so the dashboard is easier to scan quickly.
  • A dedicated Docker resources row summarizes aggregate CPU, memory, and network usage across the live blue/green containers, followed by a per-container row for proxy, green, and blue when those services are running. This is sampled from docker stats --no-stream, so it stays local to the host and is appropriate for self-hosted operator monitoring.
  • The infrastructure dashboard’s Docker Runtime Inventory uses the watcher snapshot as the source of truth for every running Compose container and derives total CPU and memory from those rows when present, so the summary cards stay aligned with the detailed container inventory.
  • The bundled serverless-redis-http companion uses an in-container wget health check that posts ["PING"] to / with the generated SRH_TOKEN; do not use a Node-based probe for that image because it is an Erlang release image, and do not probe /ping because SRH does not expose that route.
  • Production Redis compose requires UPSTASH_REDIS_REST_TOKEN and binds Redis host ports to 127.0.0.1. Do not reintroduce the platform-local-redis-token fallback in production fragments or remove the loopback host bind; direct Compose users must export a strong token before enabling the redis profile.
  • After a successful host-triggered serve:web:docker:bg rollout, the Docker helper starts or resumes the containerized web-blue-green-watcher with --resume-if-running. Deploys that are already running inside the watcher skip that handoff via PLATFORM_BLUE_GREEN_WATCHER_CONTAINER=1, which avoids recursive watcher starts while still leaving a poller alive for future Git commits.
  • The watcher wrapper and child process must agree on runtime files. If PLATFORM_BLUE_GREEN_WATCH_ARGS_FILE, PLATFORM_BLUE_GREEN_WATCH_RUNTIME_DIR, or PLATFORM_BLUE_GREEN_WATCH_STATUS_FILE are set, both the wrapper and child use those paths so the wrapper does not restart a healthy child for a missing status snapshot.
  • Request counters now come from a persisted local proxy-log drain under tmp/docker-web/watch/blue-green-request-telemetry.*, not from one-off docker logs scrapes in the dashboard. The watcher continuously drains structured web-proxy access logs into a local ledger, so request metrics survive watcher restarts and do not require any external analytics service.
  • Internal proxy health checks for /api/health and /__platform/drain-status are excluded from the request totals so the numbers reflect real served traffic more closely.
  • The proxy now emits structured JSON access logs that include the upstream deployment stamp and blue/green color. That lets the watcher link requests back to the correct deployment instead of only estimating by time window.
  • For each newly drained proxy request, the watcher also reads recent web-blue and web-green container stdout/stderr and stores up to 20 route console lines that fall inside the request latency window. Those captured lines are persisted on the request-log record itself so the request explorer can show request-scoped server console output after the live Docker logs have moved on.
  • The watcher retains up to 10,000 deployment history entries and up to 100,000,000 recent drained request-log records on disk, bounded by a 256 MiB durable request-log byte cap by default. When the next request record would exceed the byte cap, the watcher rotates the current JSONL chunk if needed and prunes older chunks before appending, so public request URIs cannot grow the host-backed ledger without an aggregate limit. Rolling daily/weekly/monthly/yearly metric buckets plus a recent-request excerpt still feed the monitoring dashboard.
  • The watcher also persists a separate latest-log ledger under tmp/docker-web/watch/blue-green-auto-deploy.logs.json, which captures the high-level poll/pull/build/deploy watcher messages with deployment stamps and commit hashes when available. The infrastructure dashboard uses that ledger for a deployment-scoped latest-log view without needing live docker logs access.
  • The watcher uses the same Docker runtime env resolution as the real deploy flow, so blue/green status probes still work when the Redis profile is part of the production compose file.
  • The active watcher also persists a live status snapshot under tmp/docker-web/watch/, which is what --resume-if-running uses to mirror the dashboard without taking over the PID lock.
  • The infrastructure dashboard at /{ROOT_WORKSPACE_ID}/infrastructure/monitoring reads the same watcher status snapshot and renders it as a Next.js control room with rollout, request-rate, container-resource, and event-feed views.
  • The monitoring dashboard now exposes paginated request and watcher-log explorers. Route filters come from normalized request paths, the raw request URI still surfaces query signatures, and ?_rsc=* requests are called out so React Server Component traffic is inspectable separately from document hits.
  • Deployment-facing dashboard surfaces deduplicate successful blue/green rows for the same commit so active and standby colors do not appear as separate rollouts. Failed attempts remain separate because the retry cap and recovery debugging depend on seeing each failed build/deploy attempt. Large deployment, rollback-candidate, Docker-service, and container lists are paginated in the UI instead of rendering every retained row at once.
  • Production web, web-blue, and web-green containers mount ./tmp/docker-web read-only at /app/runtime/docker-web and use PLATFORM_BLUE_GREEN_MONITORING_DIR to find the watcher snapshot. Keep that mount/env pair in sync if the runtime path changes, or the dashboard will degrade to an empty offline state even while blue/green deployments still work.
  • Production web, web-blue, and web-green also mount the narrower ./tmp/docker-web/watch/control path read-write at /app/runtime/docker-web-control via PLATFORM_BLUE_GREEN_CONTROL_DIR. Keep operator command files in that control directory so the broader watcher runtime and telemetry mount can stay read-only.
  • The monitoring dashboard’s “Sync Standby Now” action writes tmp/docker-web/watch/control/blue-green-instant-rollout.request.json. The watcher consumes that file on its next poll, clears it after a success, failure, or no-op, and uses it to rebuild the standby color immediately so blue and green can converge on the same commit without waiting for the stale standby window.
  • The dashboard reads that pending instant-rollout request back from the watcher control directory. While the request is queued, or while the latest standby refresh is building/deploying, the sync button stays disabled and shows a queued/building status instead of allowing duplicate control files.
  • The monitoring dashboard’s rollback pin action writes tmp/docker-web/watch/control/blue-green-deployment-pin.json. The watcher treats that file as authoritative: it skips normal fetch/pull work, checks out the pinned commit in detached mode, deploys it if the latest successful deployment is different, and keeps production on that commit until the pin is removed from the dashboard. Removing the pin lets the watcher check out its locked branch again and resume normal fast-forward polling.
  • In-container watcher child restarts preserve the locked branch/upstream metadata even when the child is killed while Git is detached for a rollback or parent-fallback build. The replacement child can recover production from the target-only lock instead of trying to poll detached HEAD. If an older child already removed the lock, a clean detached startup falls back to the selected platform branch (production by default) before locking and polling.
  • When the watcher receives a shutdown signal while it is temporarily detached, it attempts to check out the locked branch again before exiting, as long as the worktree is clean. This keeps manual operator commands such as git pull && bun serve:web:docker:bg from inheriting a detached checkout after a stopped watcher.
  • The watcher image must also include both the Docker Compose and Buildx CLI plugins (docker-cli-compose and docker-cli-buildx on Alpine) because the rollout handoff shells into docker compose ..., and capped production builds create/use the remote tuturuuu Buildx builder from inside the watcher container. The watcher reaches the Compose-owned BuildKit daemon at tcp://buildkit:1234.
  • When the watcher drives Docker Desktop through /var/run/docker.sock, it must run the deploy handoff from the mirrored host-path mount, not a container-only path like /workspace. Otherwise Docker Desktop rejects bind mounts such as ./tmp/docker-web/prod/nginx.conf with “mounts denied” because /workspace/... is not a real shared host path.
  • The watcher compose environment preserves PLATFORM_HOST_WORKSPACE_DIR and pins COMPOSE_PROJECT_NAME from that host checkout path unless DOCKER_WEB_COMPOSE_PROJECT_NAME is explicitly set. A canonical checkout directory named platform maps to the tuturuuu Compose project so Docker Desktop groups the stack under the product name on clean startups. During self-refresh from an already-running legacy platform Compose project, the legacy watcher starts a staged tuturuuu watcher with non-conflicting host ports, then stops only the old watcher service. The target watcher builds or recovers the tuturuuu stack before it touches the public proxy port. When the target proxy is healthy on the staged port, it stops the legacy platform proxy, recreates the tuturuuu proxy on port 7803, and verifies the internal drain-status route within 3 seconds. If that handoff exceeds 3 seconds or the target proxy health check fails, the watcher stops the target proxy, restores the legacy platform proxy, and leaves the legacy project intact for another retry. After a successful handoff, it removes the old platform Compose project with docker compose down --remove-orphans. Once the legacy project is absent, a watcher that lacks inherited Compose project env is treated as the fully migrated tuturuuu watcher and remains in the normal Git poll/build loop instead of starting another migration handoff. Do not inherit arbitrary container-scoped Compose project names from the watcher container; doing so can create duplicate service names such as nested tuturuuu-markitdown-1 containers during self-refresh.
  • If Docker reports container name ... is already in use for a requested production service, the helper removes only the exact expected container name for the current Compose project, then retries docker compose up. This handles stale names left by interrupted rollouts without pruning unrelated containers.
  • Starting bun serve:web:docker:bg:watch now clears the persisted watcher status snapshot and active PID before the watcher service is force-recreated, but preserves any complete branch/upstream target metadata. A stale lock from the previous container cannot block the replacement watcher, and a detached checkout still has enough metadata to reattach to production.
  • If the watcher pulls a revision that changes its own Dockerfile or baked entrypoint, it now rebuilds and recreates the web-blue-green-watcher service automatically before handing off to the next deployment cycle.
  • When that container-refresh request is emitted from inside the followed watcher logs, bun serve:web:docker:bg:watch treats the log text itself as the recreate signal, then rebuilds/recreates and resumes tailing. This keeps the service from getting stuck in a Docker-restarted but operationally offline state.
  • Recovery handoffs now persist a pending-deploy request under tmp/docker-web/watch/ and reconcile it against the latest successful deployment history entry on startup. If HEAD is newer than the last successful built/deployed commit after a watcher restart or container recreate, the watcher builds the current HEAD before settling back into normal polling.
  • The same reconciliation now also runs during steady-state polling: if Git is already up to date but the latest successful deployment record still points at an older commit, the watcher rebuilds/deploys the current HEAD instead of incorrectly reporting up-to-date.
  • Blue/green deploys build the replacement lane before stopping or removing any existing blue/green container. A failed docker compose build must leave the currently serving lane and any warm standby untouched; only after the build succeeds may the target lane be recreated with --no-build.
  • If tmp/docker-web/prod/active-color is missing or stale, the deployer and watcher recover the serving lane from the generated nginx proxy config before deciding what to rebuild. Treat the proxy config as the runtime source of truth during drift so a failed reconciliation cannot misclassify and clear a stable deployment.
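The locale-comma normalization mentioned in the dashboard details above can be sketched like this. It is a minimal illustration using the two example values from this guide (`0,10%` and `24,0MiB`); the function name and return shape are hypothetical, and the real watcher parser may differ.

```javascript
// Hypothetical parser for a single docker stats value such as "0,10%" or "24.0MiB".
// Accepts either "." or "," as the decimal separator.
function parseDockerStatNumber(raw) {
  const match = String(raw)
    .trim()
    .match(/^(\d+(?:[.,]\d+)?)\s*([A-Za-z%]*)$/);
  if (!match) return null; // e.g. "--" for a container with no stats yet
  return {
    value: Number(match[1].replace(",", ".")),
    unit: match[2],
  };
}

console.log(parseDockerStatNumber("0,10%"));
// prints { value: 0.1, unit: '%' }
console.log(parseDockerStatNumber("24,0MiB"));
// prints { value: 24, unit: 'MiB' }
```

Returning `null` for unparseable input, rather than `0`, keeps a parsing failure distinguishable from a genuinely idle container. The sketch does not handle thousands separators.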

Browser-State 502 Recovery

If some normal browsers still return Cloudflare 502 Host Error, 431, or Chrome ERR_INVALID_RESPONSE while incognito works, treat it as stale client state or an auth-cookie/header-size problem before assuming the tunnel itself is broken. Normal browser-state recovery should use the app route: GET is a no-store confirmation page only, and the destructive Clear-Site-Data response happens only after a same-origin POST. Oversized request headers are different: the request may never reach Next.js. The production web-proxy therefore gives web, Hive, and Meet 64 KB of request-header headroom and maps Nginx 431/494 oversized-header failures to a local browser-state recovery response, while the Docker web app server starts with --max-http-header-size=65536 so ordinary Supabase auth-cookie chunking has matching headroom after proxying.
How to recognize each failure mode:
  • upstream sent too big header while reading response header from upstream means nginx response-header buffers were too small for the auth response.
  • web-green could not be resolved or web-blue could not be resolved means a stale nginx worker or keepalive connection still tried to reach a color that no longer existed. The current warm-standby model is designed to avoid that.
  • A browser that fails only in regular mode but works in incognito usually has stale Supabase auth cookies, stale service-worker state, or both.
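For triage, the two log signatures above can be matched mechanically. This is a rough sketch: the function name and the returned labels are hypothetical, and only the matched log phrases come from this guide.

```javascript
// Hypothetical triage helper for a single nginx/web-proxy error log line.
function classifyProxyFailure(logLine) {
  const line = String(logLine);
  if (line.includes("upstream sent too big header")) {
    // nginx response-header buffers were too small for the auth response.
    return "response-header-buffers-too-small";
  }
  if (/web-(blue|green) could not be resolved/.test(line)) {
    // A stale worker or keepalive connection targeted a removed color.
    return "stale-upstream-color";
  }
  return "unknown";
}

console.log(classifyProxyFailure("host web-green could not be resolved"));
// prints stale-upstream-color
```

The third failure mode, a browser that fails only outside incognito, leaves no proxy log signature, so it has to be confirmed from the client side.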
Recovery path:
  • Send affected users to the recovery route on the affected origin: https://tuturuuu.com/~recover-browser-state for the main app, or https://hive.tuturuuu.com/~recover-browser-state for Hive.
  • That route is public and bypasses auth/onboarding middleware. When the request can reach the app, use the confirmation form so the cookie-clearing POST explicitly expires Supabase auth cookie variants for the current host.
  • If the browser is already sending too many stale cookies, the proxy catches the 431/494 before Next.js and returns Clear-Site-Data: "cache", "cookies", "storage", "executionContexts" while redirecting the browser back to /login?browserStateReset=1.
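The GET/POST split for the app recovery route can be sketched as a pure decision function. This is an illustration under stated assumptions: the function name, the status codes, and the exact same-origin check are hypothetical, and reusing the proxy's Clear-Site-Data value for the app route is also an assumption. Cookie expiry for the Supabase auth cookie variants is omitted.

```javascript
// Header value quoted from the recovery notes above.
const CLEAR_SITE_DATA = '"cache", "cookies", "storage", "executionContexts"';

// Hypothetical planner; status codes are illustrative assumptions.
function planRecoveryResponse({ method, originHeader, expectedOrigin }) {
  if (method === "GET") {
    // GET only renders the confirmation page and must never clear state.
    return { status: 200, headers: { "Cache-Control": "no-store" } };
  }
  if (method === "POST" && originHeader === expectedOrigin) {
    // Only a same-origin POST triggers the destructive response.
    return {
      status: 200,
      headers: {
        "Cache-Control": "no-store",
        "Clear-Site-Data": CLEAR_SITE_DATA,
      },
    };
  }
  // Cross-origin or unsupported requests are refused without clearing anything.
  return { status: 403, headers: { "Cache-Control": "no-store" } };
}

console.log(
  planRecoveryResponse({
    method: "POST",
    originHeader: "https://tuturuuu.com",
    expectedOrigin: "https://tuturuuu.com",
  }).headers["Clear-Site-Data"]
);
// prints "cache", "cookies", "storage", "executionContexts"
```

Keeping the destructive header off GET means a prefetch or link preview cannot log a user out by accident.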
Operational signals:
  • Inspect X-Platform-Deployment-Stamp, X-Platform-Blue-Green-Primary, and X-Platform-Blue-Green-Color response headers to confirm which rollout is currently serving a request.
  • If the recovery URL fixes the issue for a user, the likely root cause was stale browser state rather than an active deploy outage.
  • If recovery does not help and proxy logs still show too big header, focus on auth redirect size or additional cookie bloat.
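To collect the rollout headers listed above from a response, something like the following could be used. It is a small sketch: the helper name is hypothetical, and it accepts any object with a `get(name)` method, such as the `Headers` object on a `fetch` response.

```javascript
// Header names quoted from the operational signals above.
const ROLLOUT_HEADERS = [
  "X-Platform-Deployment-Stamp",
  "X-Platform-Blue-Green-Primary",
  "X-Platform-Blue-Green-Color",
];

// Hypothetical helper: picks the rollout headers out of a header collection.
function readRolloutHeaders(headers) {
  const result = {};
  for (const name of ROLLOUT_HEADERS) {
    // Missing headers are reported as null rather than omitted.
    result[name] = headers.get(name) ?? null;
  }
  return result;
}

// Example usage against a live origin (not executed here):
//   const response = await fetch("https://tuturuuu.com/", { method: "HEAD" });
//   console.log(readRolloutHeaders(response.headers));
```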
Operational notes:
  • This is intended for clean deployment clones on a server, not for active developer worktrees.
  • If the local branch is ahead of or diverged from the tracked upstream, the watcher logs and skips the pull instead of forcing a merge or reset.
  • The self-restart path only triggers when the watcher script itself changed in the fetched revision; normal app-code deploys keep the current watcher process alive.
  • During that self-restart path, nginx keeps the assigned proxy port up the whole time because the replacement watcher refreshes the existing proxy container in place before it starts the new build.
  • The watcher inherits the default blue/green build caps from bun serve:web:docker:bg, so the current defaults still apply during auto-deploys.
  • Deployment history is watcher-managed. Manual blue/green rollouts still show up in the live runtime status if the stack is active, but they do not backfill the watcher’s last-3 deployment list unless they were performed through the watcher itself.
  • Rollback pins are intended for bad latest deployments or failed reconciliation builds. Pin only a known successful deployment from the retained ledger, then remove the pin after main contains the corrective commit you want the watcher to resume deploying.
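The ahead/diverged rule in the notes above can be expressed as a small decision function. This is an illustrative sketch, not the watcher's actual implementation: the function names and result labels are hypothetical, and it assumes the counts come from `git rev-list --left-right --count HEAD...@{upstream}`, which prints the ahead and behind commit counts separated by whitespace.

```javascript
// Parses "<ahead>\t<behind>" as printed by
// git rev-list --left-right --count HEAD...@{upstream}
function parseAheadBehind(output) {
  const [ahead, behind] = output.trim().split(/\s+/).map(Number);
  return { ahead, behind };
}

// Only a pure fast-forward is allowed; everything else is logged and skipped.
function decidePull({ ahead, behind }) {
  if (ahead > 0 && behind > 0) return "skip-diverged";
  if (ahead > 0) return "skip-ahead";
  if (behind > 0) return "fast-forward";
  return "up-to-date";
}
```

Keeping the decision separate from the git call makes the skip behavior easy to test without a real repository.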

Validation And CI

docker-setup-check.yaml now validates all of the following:
  • node scripts/check-docker-web.js
  • node --test scripts/check-docker-web.test.js scripts/docker-web.test.js
  • docker compose -f docker-compose.web.yml config
  • docker compose -f docker-compose.web.yml --profile redis config
  • docker compose -f docker-compose.web.yml --profile cloudflared config
  • docker compose -f docker-compose.web.prod.yml config
  • docker compose -f docker-compose.web.prod.yml --profile redis config
  • docker compose -f docker-compose.web.prod.yml --profile cloudflared config
  • docker build -f apps/backend/Dockerfile .
  • docker build --target dev -f apps/web/Dockerfile .
  • docker build --target runner --secret id=web_env,src=.env.local -f apps/web/Dockerfile .
That means Docker CI now covers both the dev image and the real production path.
For focused watcher-script checks, run the root Node test file directly, for example node --test --test-name-pattern "pullTrackedBranch" scripts/watch-blue-green-deploy.test.js. Running bun test scripts/... from the repo root invokes the package test script and can expand into the full Turbo test suite.
When scripts/check-docker-web.js or similar root validators need to assert a literal Dockerfile template placeholder like ${process.env.PORT || 7803}, prefer a regex or another explicitly escaped matcher instead of a plain string literal. Biome treats raw ${...} text inside normal strings as lint/suspicious/noTemplateCurlyInString, which can break CI even when the runtime behavior is unchanged.
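As an illustration of the regex approach for the placeholder, the following sketch matches the literal text without ever putting `${` inside an ordinary string. The constant and function names are hypothetical and are not taken from scripts/check-docker-web.js.

```javascript
// Matches the literal text ${process.env.PORT || 7803}.
// Every regex metacharacter ($, {, }, ., |) is escaped, and because this is
// a regex literal rather than a string, noTemplateCurlyInString does not apply.
const PORT_PLACEHOLDER = /\$\{process\.env\.PORT \|\| 7803\}/;

// Returns true when the Dockerfile text contains the placeholder verbatim.
function hasPortPlaceholder(dockerfileText) {
  return PORT_PLACEHOLDER.test(dockerfileText);
}
```

The regex has no global flag, so repeated calls to test() carry no lastIndex state between checks.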

Operator Notes

  • Do not paste docker compose config output into chat or tickets; it expands env values.
  • If you need rebuild-before-restart on a server, use bun serve:web:docker:bg.
  • If the latest blue/green deployment is bad, use the infrastructure monitoring dashboard to pin a previous successful deployment before debugging forward.
  • If a blue/green deploy is interrupted, rerunning the same command from the intended commit is the normal recovery path.
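For local validation that respects the first note above, the compose checks from the CI list can be run with Docker Compose's --quiet flag, which validates the configuration without printing the expanded output. This is a sketch, not the CI workflow itself: the function name is hypothetical, and adding --quiet is an assumption about local use rather than something the workflow is documented to do.

```javascript
// Compose files and profiles taken from the validation list in this guide.
const COMPOSE_FILES = ["docker-compose.web.yml", "docker-compose.web.prod.yml"];
const PROFILES = [null, "redis", "cloudflared"];

// Builds one argv array per file/profile combination for the docker CLI.
function composeValidationArgs() {
  const commands = [];
  for (const file of COMPOSE_FILES) {
    for (const profile of PROFILES) {
      const args = ["compose", "-f", file];
      if (profile) args.push("--profile", profile);
      // --quiet validates only, so env values are never echoed.
      args.push("config", "--quiet");
      commands.push(args);
    }
  }
  return commands;
}
```

Each array can be handed to execFileSync("docker", args, { stdio: "inherit" }) from node:child_process; a non-zero exit code then surfaces as a thrown error.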