Trigger.dev Architecture Overview
Trigger.dev's system architecture — run engine, queue system, worker orchestration, database schema, and real-time coordination
For Product Managers
This page explains how Trigger.dev is structured as a software system. Focus on the diagrams and "Why It Matters" callouts to understand how tasks flow from your code to execution and back. You don't need to understand every technical detail.
Trigger.dev uses a multi-service architecture where a central web application accepts task triggers, a run engine manages queueing and retries, and supervisors launch isolated task containers. This page explains how those services coordinate so background jobs remain durable, fair, and observable.
How Trigger.dev Architecture Works
Trigger.dev follows a multi-service architecture where the webapp is the central hub, coordinating between the SDK, workers, queues, and storage:
┌─────────────────────────────────────────────────────────────────────┐
│ YOUR APPLICATION (Node.js / Next.js) │
│ @trigger.dev/sdk → trigger(myTask, payload) │
└──────────────────────────────┬──────────────────────────────────────┘
│ HTTPS (REST API)
┌──────────▼──────────┐
│ Webapp (Remix) │ ← Dashboard + API
│ Express :3030 │ REST v1/v2/v3
│ Socket.io │ Real-time coordination
└──┬──────┬───────┬───┘
│ │ │
┌───────────────▼─┐ ┌──▼──────────┐ ┌▼───────────────┐
│ PostgreSQL │ │ Redis │ │ ClickHouse │
│ (Primary DB) │ │ (Queue + │ │ (Analytics + │
│ Prisma ORM │ │ Locks) │ │ Logs) │
└─────────────────┘ └──┬──────────┘ └────────────────┘
│
┌──────────▼──────────┐
│ Run Engine │ ← Orchestrates execution
│ MarQS Fair Queue │ Enqueue → Dequeue → Track
│ Redis Worker │
└──────────┬──────────┘
│ Dequeue
┌──────────▼──────────┐
│ Supervisor │ ← Workload manager
│ (Docker or K8s) │ Starts/stops containers
└──────────┬──────────┘
│ Spawn container
┌──────────▼──────────┐
│ Task Container │ ← Your task code runs here
│ (isolated runtime)│ Heartbeats, logs, completion
└─────────────────────┘
Why It Matters
Every task trigger flows through the Webapp API → Run Engine (Redis queue) → Supervisor → Container. This separation means the API responds instantly (the task is enqueued), while heavy computation runs in isolated containers. If a container crashes, the Run Engine knows and can retry automatically. The queue is multi-tenant and fair, so one user's spike in tasks won't starve other users.
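The key property here is that enqueueing and executing are decoupled. A toy in-memory sketch (illustrative only, not Trigger.dev's actual code) shows the shape of that separation — the "webapp" side enqueues and returns immediately, while a separate "worker" loop dequeues and executes:

```typescript
// Toy sketch of the enqueue-then-execute separation. In the real system the
// queue lives in Redis and the worker is a container; here both are in-process.
type Run = { id: string; payload: unknown; status: "PENDING" | "EXECUTING" | "COMPLETED" };

const queue: Run[] = [];

// The "webapp" side: accept a trigger, enqueue, respond instantly.
function triggerTask(id: string, payload: unknown): Run {
  const run: Run = { id, payload, status: "PENDING" };
  queue.push(run);
  return run; // API response: run is queued, not yet executed
}

// The "worker" side: dequeue and execute independently of the API call.
function workerTick(execute: (payload: unknown) => void): Run | undefined {
  const run = queue.shift();
  if (!run) return undefined;
  run.status = "EXECUTING";
  execute(run.payload);
  run.status = "COMPLETED";
  return run;
}

const run = triggerTask("run_1", { userId: 42 });
console.log(run.status); // PENDING — the API call already returned
workerTick((p) => void p);
console.log(run.status); // COMPLETED
```

Because the API only enqueues, its latency is independent of how long the task itself takes to run.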
The Five Applications
| App | Technology | Purpose |
|---|---|---|
| webapp | Remix 2 + Express + React 18 | Dashboard UI, REST API (v1/v2/v3), WebSocket coordination, run engine host |
| supervisor | Node.js + Docker/K8s clients | Manages container lifecycle — starts, monitors, and cleans up task containers |
| coordinator | Node.js + Socket.io | Coordinates worker instances, relays execution payloads and completions |
| docker-provider | Node.js + Dockerode | Docker-specific container management for self-hosted deployments |
| kubernetes-provider | Node.js + @kubernetes/client-node | Kubernetes-specific pod management for cloud and enterprise deployments |
The Run Engine
The Run Engine is the heart of Trigger.dev — it orchestrates every task from trigger to completion.
┌──────────────────────────────────────┐
│ RUN ENGINE │
│ │
│ ┌──────────────┐ ┌───────────────┐ │
│ │ EnqueueSystem│ │ DequeueSystem │ │
│ │ (accept run) │ │ (pick next) │ │
│ └──────┬───────┘ └───────┬───────┘ │
│ │ │ │
│ ┌──────▼──────────────────▼───────┐ │
│ │ RunQueue (Redis) │ │
│ │ Fair multi-tenant (MarQS) │ │
│ │ Concurrency + Rate limits │ │
│ └──────────────┬───────────────────┘ │
│ │ │
│ ┌──────────────▼───────────────────┐ │
│ │ RunAttemptSystem │ │
│ │ (track attempts + retries) │ │
│ └──────────────┬───────────────────┘ │
│ │ │
│ ┌──────┬──────┴──────┬──────────┐ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ Wait Check- Delayed Batch │
│ point point Run System │
│ System System System │
│ │
│ ┌─────────────────────────────────┐ │
│ │ ExecutionSnapshotSystem │ │
│ │ (durable state tracking) │ │
│ └─────────────────────────────────┘ │
└──────────────────────────────────────┘Run Engine Sub-Systems
| System | Responsibility |
|---|---|
| EnqueueSystem | Accepts new runs, validates concurrency, adds to Redis queue |
| DequeueSystem | Picks the next run using fair-queuing (deficit round-robin) |
| RunAttemptSystem | Tracks each attempt, manages retries with backoff |
| WaitpointSystem | Handles task pauses — waiting for events, approvals, or time delays |
| CheckpointSystem | Saves and restores task state for durable execution |
| DelayedRunSystem | Manages tasks scheduled for a future time |
| DebounceSystem | Collapses rapid triggers into a single execution |
| TtlSystem | Expires runs that exceed their time-to-live |
| BatchSystem | Handles batch triggers (thousands of runs in one call) |
| ExecutionSnapshotSystem | Tracks internal execution state for the v2 engine |
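The RunAttemptSystem's "retries with backoff" follows the usual exponential-backoff shape: each failed attempt waits longer than the last, up to a cap. A hedged sketch — the base delay, growth factor, and cap below are made-up illustration values, not Trigger.dev defaults:

```typescript
// Illustrative exponential backoff with a cap, the general shape of retry
// scheduling used by RunAttemptSystem-style components.
function retryDelayMs(
  attempt: number, // 1-based attempt number
  baseMs = 1_000,  // delay after the first failure (assumed value)
  factor = 2,      // exponential growth factor (assumed value)
  maxMs = 60_000   // upper bound on any single delay (assumed value)
): number {
  const delay = baseMs * Math.pow(factor, attempt - 1);
  return Math.min(delay, maxMs);
}

console.log(retryDelayMs(1));  // 1000
console.log(retryDelayMs(4));  // 8000
console.log(retryDelayMs(10)); // 60000 — capped
```

Real systems usually add random jitter on top so that many runs failing at once don't all retry at the same instant.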
For PMs: What's Fair Queuing?
Imagine a buffet where one person tries to take all the food. Fair queuing (specifically Deficit Round-Robin) ensures every organization gets a proportional share of execution capacity. Even if one customer triggers 10,000 tasks, other customers' tasks still run promptly. This is critical for a multi-tenant platform.
Database Layer (PostgreSQL + Prisma)
PostgreSQL stores all persistent state. The schema has 50+ models managed through Prisma ORM.
Core Entity-Relationship Map
┌──────────────┐
│ User │ ← Auth (GitHub, Google, Magic Link)
└──────┬───────┘
│ member of
┌───────────▼───────────┐
│ Organization │ ← multi-tenancy root
│ (limits, features) │
└───────────┬───────────┘
│ has many
┌───────────▼───────────┐
│ Project │
└───────────┬───────────┘
│ has many
┌────────────────────┼────────────────────┐
│ │ │
┌────────▼────────┐ ┌────────▼────────┐ ┌────────▼────────┐
│ RuntimeEnvironment│ │ WorkerGroup │ │ TaskSchedule │
│ (Dev/Stg/Prod) │ │ (exec target) │ │ (cron config) │
│ apiKey, pkApiKey │ │ │ │ │
└────────┬────────┘ └────────────────┘ └─────────────────┘
│ has many
┌────────▼────────┐
│ BackgroundWorker │ ← versioned deployment
│ (contentHash, │
│ sdkVersion) │
└────────┬────────┘
│ defines
┌────────▼────────┐
│ BackgroundWorker │
│ Task │ ← slug, queue, retry config
└────────┬────────┘
│ produces
┌────────▼────────┐
│ TaskRun │ ← the central execution record
│ (status, payload│
│ traceId, etc.) │
└────┬────┬───────┘
│ │
┌────▼┐ ┌▼────────────┐
│Attempt│ │ Waitpoint │
│ │ │(pause/resume)│
└──────┘ └──────────────┘
Key Model Fields
TaskRun — the central execution record:
| Field | Purpose |
|---|---|
| status | PENDING, EXECUTING, COMPLETED_SUCCESSFULLY, CANCELED, SYSTEM_FAILURE, etc. |
| payload | JSON input data |
| output | JSON result data |
| traceId / spanId | OpenTelemetry correlation |
| lockedToVersionId | Pin to a specific worker version |
| idempotencyKey | Prevent duplicate runs |
| ttl | Maximum run lifetime |
| queueId | Which queue this run belongs to |
| concurrencyKey | For per-key concurrency limiting |
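The idempotencyKey field deserves a quick illustration: triggering twice with the same key should return the existing run instead of creating a duplicate. A hedged in-memory sketch, with a Map standing in for the TaskRun table (not Trigger.dev's actual lookup code):

```typescript
// Toy idempotency check: the first trigger with a key creates a run; any
// later trigger with the same key is deduplicated to that run.
const runsByIdempotencyKey = new Map<string, { runId: string }>();
let nextId = 1;

function triggerWithIdempotency(key: string): { runId: string; deduplicated: boolean } {
  const existing = runsByIdempotencyKey.get(key);
  if (existing) return { runId: existing.runId, deduplicated: true };
  const run = { runId: `run_${nextId++}` };
  runsByIdempotencyKey.set(key, run);
  return { runId: run.runId, deduplicated: false };
}

const first = triggerWithIdempotency("order-123-email");
const second = triggerWithIdempotency("order-123-email");
console.log(first.runId === second.runId); // true — same run, no duplicate
console.log(second.deduplicated);          // true
```

This is why a client can safely retry a trigger request after a network failure: re-sending the same key can never produce a second execution.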
TaskSchedule — cron-based triggers:
| Field | Purpose |
|---|---|
| cron | Cron expression (e.g. 0 9 * * *) |
| timezone | IANA timezone for cron evaluation |
| deduplicationKey | Prevent duplicate schedules |
Queue Architecture (MarQS)
MarQS (Marked Queue System) is Trigger.dev's custom Redis-based queue. It's built on @trigger.dev/redis-worker and implements fair multi-tenant queuing.
Incoming Runs
│
▼
┌─────────────────────────────────────┐
│ MarQS (Redis) │
│ │
│ ┌─────────┐ ┌─────────────────┐ │
│ │ Org A │ │ Org B │ │
│ │ Queue │ │ Queue │ │
│ │ ■■■ │ │ ■■■■■■■■■■ │ │
│ └────┬────┘ └────────┬────────┘ │
│ │ │ │
│ ┌────▼────────────────▼────────┐ │
│ │ Deficit Round-Robin (DRR) │ │
│ │ Fair dequeuing across orgs │ │
│ └──────────────┬───────────────┘ │
│ │ │
│ ┌──────────────▼───────────────┐ │
│ │ Concurrency + Rate Limiter │ │
│ │ Per-queue and per-key │ │
│ └──────────────┬───────────────┘ │
└─────────────────┼───────────────────┘
│
▼
Worker picks up run
Why a Custom Queue?
Standard queues like BullMQ don't support multi-tenant fair scheduling out of the box. MarQS implements Deficit Round-Robin (DRR) — a networking algorithm adapted for task queues — ensuring proportional throughput across all organizations. It also supports per-queue concurrency limits and per-key rate limiting, which are essential for a multi-tenant SaaS.
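The per-key limiting mentioned above is a simple gate applied at dequeue time. An illustrative in-memory sketch of a per-key concurrency limiter (counters standing in for Redis; not MarQS's actual implementation):

```typescript
// Toy per-key concurrency limiter: a run is released to a worker only if
// fewer than `limit` runs with the same key are currently executing.
const active = new Map<string, number>();

function tryAcquire(key: string, limit: number): boolean {
  const current = active.get(key) ?? 0;
  if (current >= limit) return false; // run stays queued for a later pass
  active.set(key, current + 1);
  return true;
}

function release(key: string): void {
  active.set(key, Math.max(0, (active.get(key) ?? 0) - 1));
}

// With a per-key limit of 2, a third concurrent run is held back until
// one of the first two completes.
console.log(tryAcquire("user-42", 2)); // true
console.log(tryAcquire("user-42", 2)); // true
console.log(tryAcquire("user-42", 2)); // false
release("user-42");
console.log(tryAcquire("user-42", 2)); // true
```

In the real system the counters live in Redis so that all dequeuers share one view, and the check is atomic with the dequeue itself.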
Worker Orchestration
Development Mode
In development, tasks run inside the developer's local process — no containers needed:
Developer Machine
│
▼
trigger dev (CLI) ← watches for file changes
│
▼
WebSocket to Webapp ← registers tasks, receives runs
│
▼
DevQueueConsumer ← dequeues runs for this dev env
│
▼
Execute in-process ← runs task code directly
│
▼
Report completion ← sends result back via WebSocket
Production Mode
In production, each task run executes in an isolated container:
Webapp (Run Engine)
│ dequeue
▼
Supervisor
│ provisions
▼
┌────────────────────┐
│ Docker Provider │ ← or Kubernetes Provider
│ (Dockerode / │
│ K8s client) │
└────────┬───────────┘
│ creates
┌────▼────────────┐
│ Task Container │
│ ┌────────────┐ │
│ │ Your Code │ │
│ │ + SDK │ │
│ └────────────┘ │
│ Heartbeats ────────► Supervisor
│ Logs ──────────────► OTLP → ClickHouse
│ Completion ────────► Webapp (update DB)
└──────────────────┘
Supervisor Responsibilities
| Feature | Description |
|---|---|
| Container lifecycle | Start, monitor, restart, and clean up task containers |
| Resource monitoring | Track CPU and memory usage per container |
| Heartbeat tracking | Detect unresponsive containers and trigger retries |
| Pod cleanup (K8s) | Remove completed or failed Kubernetes pods |
| Failed pod handling | Detect OOMKilled or CrashLoopBackOff and report to run engine |
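Heartbeat tracking is the mechanism that turns a silent container into a retry. A sketch of the idea (illustrative; timestamps are passed in explicitly so the logic is deterministic, and the 5-second timeout is an assumed value, not a Trigger.dev default):

```typescript
// Toy liveness detection: record the last heartbeat per container, then
// flag any container that has been silent longer than the timeout so the
// run engine can fail the attempt and retry.
const lastHeartbeat = new Map<string, number>();

function heartbeat(containerId: string, nowMs: number): void {
  lastHeartbeat.set(containerId, nowMs);
}

function findStale(nowMs: number, timeoutMs: number): string[] {
  const stale: string[] = [];
  for (const [id, ts] of lastHeartbeat) {
    if (nowMs - ts > timeoutMs) stale.push(id);
  }
  return stale;
}

heartbeat("c1", 0);
heartbeat("c2", 8_000);
// At t=10s with a 5s timeout, c1 (silent for 10s) is stale; c2 is not.
console.log(findStale(10_000, 5_000)); // ["c1"]
```

This is also why tasks that crash hard (OOMKilled, for instance) don't hang forever: the missing heartbeat is indistinguishable from a crash, and both paths end in the Run Engine's retry logic.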
Real-Time Architecture
The dashboard and worker coordination use multiple real-time channels:
| Channel | Technology | Purpose |
|---|---|---|
| Dashboard sync | ElectricSQL | Real-time run status updates in the UI |
| Worker coordination | Socket.io | Execution payloads, completions, heartbeats |
| Dev worker | WebSocket (/ws) | CLI ↔ platform communication during trigger dev |
| Log streaming | SSE / Socket.io | Live task output in the dashboard |
| Realtime streams | Redis Streams / S2 | SDK-level stream data (e.g. AI token streaming) |
Why ElectricSQL?
ElectricSQL replicates PostgreSQL rows to the browser in real-time. Instead of polling the API every few seconds for run status updates, the dashboard subscribes to the relevant database rows and gets instant updates. This makes the monitoring experience feel like a live terminal.
Authentication & Authorization
Auth Providers
| Method | Use Case |
|---|---|
| GitHub OAuth | Developer login to dashboard |
| Google OAuth | Developer login to dashboard |
| Magic Link | Email-based passwordless login |
| API Keys | Server-to-server task triggering (scoped per environment) |
| Personal Access Tokens | CLI authentication and API access |
| Organization Access Tokens | Org-wide programmatic access |
| Public JWT | Frontend/realtime scoped access (read-only run status) |
API Key Scoping
Each environment (Dev, Staging, Production) has its own API key. A Production key cannot access Development runs, and vice versa. Public keys have further scope restrictions (e.g. read:runs, write:tasks).
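The two rules above — keys never cross environments, and public keys carry explicit scopes — compose into a simple authorization check. A hedged sketch (the type shape and scope strings here mirror the read:runs / write:tasks examples in the text; they are illustrative, not Trigger.dev's actual internal types):

```typescript
// Toy authorization check combining environment isolation with scope checks.
type ApiKey = { environment: "dev" | "staging" | "prod"; scopes: string[] };

function isAllowed(
  key: ApiKey,
  requiredScope: string,
  targetEnv: ApiKey["environment"]
): boolean {
  // Keys never cross environments, regardless of their scopes.
  if (key.environment !== targetEnv) return false;
  return key.scopes.includes(requiredScope);
}

const publicProdKey: ApiKey = { environment: "prod", scopes: ["read:runs"] };

console.log(isAllowed(publicProdKey, "read:runs", "prod"));   // true
console.log(isAllowed(publicProdKey, "write:tasks", "prod")); // false — scope missing
console.log(isAllowed(publicProdKey, "read:runs", "dev"));    // false — wrong environment
```

The environment check runs first because it is the stronger isolation boundary: no scope can widen a key beyond its own environment.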
Observability Stack
Task Code (SDK)
│ OpenTelemetry spans + logs
▼
OTLP Collector
│
├──► ClickHouse ← High-volume event storage + querying
├──► Prometheus ← Metrics (queue depth, run latency)
└──► Grafana ← Dashboards and alerting
Dashboard
│ TSQL queries
▼
ClickHouse ← Sub-second log search across millions of events