Trigger.dev Architecture Overview

Trigger.dev's system architecture — run engine, queue system, worker orchestration, database schema, and real-time coordination

For Product Managers

This page explains how Trigger.dev is structured as a software system. Focus on the diagrams and "Why It Matters" callouts to understand how tasks flow from your code to execution and back. You don't need to understand every technical detail.

Trigger.dev uses a multi-service architecture where a central web application accepts task triggers, a run engine manages queueing and retries, and supervisors launch isolated task containers. This page explains how those services coordinate so background jobs remain durable, fair, and observable.

How Trigger.dev Architecture Works

The webapp is the central hub, coordinating between the SDK, workers, queues, and storage:

┌─────────────────────────────────────────────────────────────────────┐
│                    YOUR APPLICATION (Node.js / Next.js)              │
│                    @trigger.dev/sdk  →  trigger(myTask, payload)     │
└──────────────────────────────┬──────────────────────────────────────┘
                               │ HTTPS (REST API)
                    ┌──────────▼──────────┐
                    │   Webapp (Remix)     │  ← Dashboard + API
                    │   Express :3030      │    REST v1/v2/v3
                    │   Socket.io          │    Real-time coordination
                    └──┬──────┬───────┬───┘
                       │      │       │
       ┌───────────────▼─┐ ┌──▼──────────┐ ┌▼───────────────┐
       │  PostgreSQL      │ │   Redis     │ │  ClickHouse    │
       │  (Primary DB)    │ │  (Queue +   │ │  (Analytics +  │
       │  Prisma ORM      │ │   Locks)    │ │   Logs)        │
       └─────────────────┘ └───┬─────────┘ └────────────────┘
                               │
                    ┌──────────▼──────────┐
                    │   Run Engine        │  ← Orchestrates execution
                    │   MarQS Fair Queue  │    Enqueue → Dequeue → Track
                    │   Redis Worker      │
                    └──────────┬──────────┘
                               │ Dequeue
                    ┌──────────▼──────────┐
                    │   Supervisor        │  ← Workload manager
                    │   (Docker or K8s)   │    Starts/stops containers
                    └──────────┬──────────┘
                               │ Spawn container
                    ┌──────────▼──────────┐
                    │   Task Container    │  ← Your task code runs here
                    │   (isolated runtime)│    Heartbeats, logs, completion
                    └─────────────────────┘

Why It Matters

Every task trigger flows through the Webapp API → Run Engine (Redis queue) → Supervisor → Container. This separation means the API can respond as soon as the run is enqueued, while heavy computation runs in isolated containers. If a container crashes, the Run Engine detects the failure and can retry the run automatically. The queue is multi-tenant and fair, so a spike in one organization's tasks won't starve other organizations.
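The enqueue-then-execute split described above can be sketched with a small in-memory model. This is only an illustration: `InMemoryPipeline`, `trigger`, and `executeNext` are hypothetical names invented for this example, not Trigger.dev APIs, and the real system uses Redis, a supervisor, and containers rather than an array.

```typescript
// Hypothetical in-memory model of the flow; none of these names are Trigger.dev APIs.
type Run = { id: string; task: string; payload: unknown; status: "QUEUED" | "COMPLETED" };

class InMemoryPipeline {
  private queue: Run[] = [];
  private nextId = 1;

  // Plays the role of the Webapp API: enqueue the run and return right away.
  trigger(task: string, payload: unknown): Run {
    const run: Run = { id: `run_${this.nextId++}`, task, payload, status: "QUEUED" };
    this.queue.push(run);
    return run;
  }

  // Plays the role of supervisor + container: dequeue one run and execute it.
  executeNext(handler: (payload: unknown) => void): Run | undefined {
    const run = this.queue.shift();
    if (!run) return undefined;
    handler(run.payload);
    run.status = "COMPLETED";
    return run;
  }
}

const pipeline = new InMemoryPipeline();
const handle = pipeline.trigger("send-email", { to: "a@example.com" });
console.log(handle.status); // QUEUED: the caller is not blocked on execution
pipeline.executeNext(() => {});
console.log(handle.status); // COMPLETED
```

The caller only ever waits for `trigger`, which does no task work, so response time stays independent of how long the task itself takes.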

The Five Applications

| App | Technology | Purpose |
| --- | --- | --- |
| webapp | Remix 2 + Express + React 18 | Dashboard UI, REST API (v1/v2/v3), WebSocket coordination, run engine host |
| supervisor | Node.js + Docker/K8s clients | Manages container lifecycle: starts, monitors, and cleans up task containers |
| coordinator | Node.js + Socket.io | Coordinates worker instances, relays execution payloads and completions |
| docker-provider | Node.js + Dockerode | Docker-specific container management for self-hosted deployments |
| kubernetes-provider | Node.js + @kubernetes/client-node | Kubernetes-specific pod management for cloud and enterprise deployments |

The Run Engine

The Run Engine is the heart of Trigger.dev — it orchestrates every task from trigger to completion.

                  ┌──────────────────────────────────────┐
                  │            RUN ENGINE                  │
                  │                                        │
                  │  ┌──────────────┐  ┌───────────────┐  │
                  │  │ EnqueueSystem│  │ DequeueSystem │  │
                  │  │ (accept run) │  │ (pick next)   │  │
                  │  └──────┬───────┘  └───────┬───────┘  │
                  │         │                  │           │
                  │  ┌──────▼──────────────────▼───────┐  │
                  │  │          RunQueue (Redis)         │  │
                  │  │    Fair multi-tenant (MarQS)      │  │
                  │  │    Concurrency + Rate limits      │  │
                  │  └──────────────┬───────────────────┘  │
                  │                 │                       │
                  │  ┌──────────────▼───────────────────┐  │
                  │  │      RunAttemptSystem             │  │
                  │  │  (track attempts + retries)       │  │
                  │  └──────────────┬───────────────────┘  │
                  │                 │                       │
                  │  ┌──────┬──────┴──────┬──────────┐    │
                  │  │      │             │          │    │
                  │  ▼      ▼             ▼          ▼    │
                  │ Wait  Check-      Delayed     Batch   │
                  │ point  point      Run         System  │
                  │ System System     System              │
                  │                                        │
                  │  ┌─────────────────────────────────┐  │
                  │  │   ExecutionSnapshotSystem        │  │
                  │  │   (durable state tracking)       │  │
                  │  └─────────────────────────────────┘  │
                  └──────────────────────────────────────┘

Run Engine Sub-Systems

| System | Responsibility |
| --- | --- |
| EnqueueSystem | Accepts new runs, validates concurrency, adds to Redis queue |
| DequeueSystem | Picks the next run using fair-queuing (deficit round-robin) |
| RunAttemptSystem | Tracks each attempt, manages retries with backoff |
| WaitpointSystem | Handles task pauses: waiting for events, approvals, or time delays |
| CheckpointSystem | Saves and restores task state for durable execution |
| DelayedRunSystem | Manages tasks scheduled for a future time |
| DebounceSystem | Collapses rapid triggers into a single execution |
| TtlSystem | Expires runs that exceed their time-to-live |
| BatchSystem | Handles batch triggers (thousands of runs in one call) |
| ExecutionSnapshotSystem | Tracks internal execution state for the v2 engine |
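To make "retries with backoff" concrete, here is a minimal sketch of an exponential backoff calculation with a cap. The `RetryPolicy` shape and `nextRetryDelayMs` are assumptions made for illustration; the source does not show how RunAttemptSystem actually computes delays.

```typescript
// Illustrative retry policy; field names are assumptions, not the engine's schema.
interface RetryPolicy {
  maxAttempts: number; // total attempts allowed, including the first
  baseDelayMs: number; // delay before the second attempt
  factor: number;      // multiplier applied for each further attempt
  maxDelayMs: number;  // upper bound on any single delay
}

// Returns the delay before the next attempt, or null when retries are exhausted.
function nextRetryDelayMs(policy: RetryPolicy, failedAttempt: number): number | null {
  if (failedAttempt >= policy.maxAttempts) return null;
  const delay = policy.baseDelayMs * Math.pow(policy.factor, failedAttempt - 1);
  return Math.min(delay, policy.maxDelayMs);
}

const policy: RetryPolicy = { maxAttempts: 4, baseDelayMs: 1000, factor: 2, maxDelayMs: 3000 };
for (let attempt = 1; attempt <= 4; attempt++) {
  console.log(attempt, nextRetryDelayMs(policy, attempt));
}
// Delays: 1000, 2000, 3000 (capped from 4000), then null after the fourth attempt.
```

Production systems often add random jitter to these delays so that many failed runs do not retry at the same moment; that is omitted here to keep the output deterministic.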

For PMs: What's Fair Queuing?

Imagine a buffet where one person tries to take all the food. Fair queuing (specifically Deficit Round-Robin) ensures every organization gets a proportional share of execution capacity. Even if one customer triggers 10,000 tasks, other customers' tasks still run promptly. This is critical for a multi-tenant platform.

Database Layer (PostgreSQL + Prisma)

PostgreSQL stores all persistent state. The schema has 50+ models managed through Prisma ORM.

Core Entity-Relationship Map

                         ┌──────────────┐
                         │     User     │ ← Auth (GitHub, Google, Magic Link)
                         └──────┬───────┘
                                │ member of
                    ┌───────────▼───────────┐
                    │    Organization       │ ← multi-tenancy root
                    │  (limits, features)   │
                    └───────────┬───────────┘
                                │ has many
                    ┌───────────▼───────────┐
                    │       Project         │
                    └───────────┬───────────┘
                                │ has many
           ┌────────────────────┼────────────────────┐
           │                    │                    │
  ┌────────▼────────┐ ┌────────▼────────┐ ┌────────▼────────┐
  │ RuntimeEnvironment│ │  WorkerGroup   │ │  TaskSchedule   │
  │ (Dev/Stg/Prod)  │ │ (exec target)  │ │  (cron config)  │
  │ apiKey, pkApiKey │ │                │ │                 │
  └────────┬────────┘ └────────────────┘ └─────────────────┘
           │ has many
  ┌────────▼────────┐
  │ BackgroundWorker │ ← versioned deployment
  │ (contentHash,   │
  │  sdkVersion)    │
  └────────┬────────┘
           │ defines
  ┌────────▼────────┐
  │ BackgroundWorker │
  │    Task          │ ← slug, queue, retry config
  └────────┬────────┘
           │ produces
  ┌────────▼────────┐
  │    TaskRun      │ ← the central execution record
  │ (status, payload│
  │  traceId, etc.) │
  └────┬────┬───────┘
       │    │
┌──────▼┐  ┌▼─────────────┐
│Attempt│  │  Waitpoint   │
│       │  │(pause/resume)│
└───────┘  └──────────────┘

Key Model Fields

TaskRun — the central execution record:

| Field | Purpose |
| --- | --- |
| status | PENDING, EXECUTING, COMPLETED_SUCCESSFULLY, CANCELED, SYSTEM_FAILURE, etc. |
| payload | JSON input data |
| output | JSON result data |
| traceId / spanId | OpenTelemetry correlation |
| lockedToVersionId | Pin to a specific worker version |
| idempotencyKey | Prevent duplicate runs |
| ttl | Maximum run lifetime |
| queueId | Which queue this run belongs to |
| concurrencyKey | For per-key concurrency limiting |
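As a rough illustration of how an idempotency key prevents duplicate runs, the sketch below returns the existing record when a key has been seen before. The field names come from the table above, but the TypeScript types, the `RunStore` class, and the in-memory `Map` are assumptions for this example; the real model lives in PostgreSQL behind Prisma.

```typescript
// Field names follow the TaskRun table; the types and the store are assumptions.
interface TaskRunRecord {
  id: string;
  status: "PENDING" | "EXECUTING" | "COMPLETED_SUCCESSFULLY" | "CANCELED" | "SYSTEM_FAILURE";
  payload: unknown;
  idempotencyKey: string | null;
}

class RunStore {
  private byIdempotencyKey = new Map<string, TaskRunRecord>();
  private count = 0;

  // Returns the existing run when the key was seen before, otherwise creates one.
  createRun(payload: unknown, idempotencyKey: string | null = null): TaskRunRecord {
    if (idempotencyKey !== null) {
      const existing = this.byIdempotencyKey.get(idempotencyKey);
      if (existing) return existing;
    }
    const run: TaskRunRecord = {
      id: `run_${++this.count}`,
      status: "PENDING",
      payload,
      idempotencyKey,
    };
    if (idempotencyKey !== null) this.byIdempotencyKey.set(idempotencyKey, run);
    return run;
  }
}

const store = new RunStore();
const first = store.createRun({ orderId: 42 }, "order-42");
const second = store.createRun({ orderId: 42 }, "order-42");
console.log(first.id === second.id); // true: the second trigger did not create a new run
```

Runs created without a key are never deduplicated in this sketch, which matches the usual expectation that idempotency is opt-in per trigger.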

TaskSchedule — cron-based triggers:

| Field | Purpose |
| --- | --- |
| cron | Cron expression (e.g. 0 9 * * *) |
| timezone | IANA timezone for cron evaluation |
| deduplicationKey | Prevent duplicate schedules |
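The reason a schedule stores an IANA timezone alongside the cron expression is that "0 9 * * *" names a wall-clock time, and the matching UTC instant shifts with the zone and with daylight saving. The sketch below is a deliberately simplified matcher that only understands daily "minute hour * * *" expressions; `firesAt` is a hypothetical helper, not how Trigger.dev evaluates schedules.

```typescript
// Simplified matcher: supports only "<minute> <hour> * * *" expressions.
function firesAt(cron: string, timezone: string, instant: Date): boolean {
  const [minute, hour, ...rest] = cron.trim().split(/\s+/);
  if (rest.length !== 3 || rest.some((field) => field !== "*")) {
    throw new Error("this sketch only handles daily schedules");
  }
  // Read the wall-clock hour and minute of `instant` in the target timezone.
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: timezone,
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23",
  }).formatToParts(instant);
  const get = (type: string): number =>
    Number(parts.find((part) => part.type === type)?.value);
  return get("hour") === Number(hour) && get("minute") === Number(minute);
}

// 14:00 UTC in January is 09:00 in New York (UTC-5).
console.log(firesAt("0 9 * * *", "America/New_York", new Date("2024-01-15T14:00:00Z"))); // true
// In July, New York is on UTC-4, so the same schedule fires at 13:00 UTC instead.
console.log(firesAt("0 9 * * *", "America/New_York", new Date("2024-07-15T13:00:00Z"))); // true
```

A full implementation would parse every cron field (ranges, steps, lists) and compute the next firing time rather than testing a single instant.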

Queue Architecture (MarQS)

MarQS (Multitenant Asynchronous Reliable Queueing System) is Trigger.dev's custom Redis-based queue. It's built on @trigger.dev/redis-worker and implements fair multi-tenant queuing.

            Incoming Runs
                  │
                  ▼
┌─────────────────────────────────────┐
│         MarQS (Redis)               │
│                                     │
│  ┌─────────┐  ┌─────────────────┐  │
│  │ Org A   │  │   Org B         │  │
│  │ Queue   │  │   Queue         │  │
│  │ ■■■     │  │   ■■■■■■■■■■   │  │
│  └────┬────┘  └────────┬────────┘  │
│       │                │           │
│  ┌────▼────────────────▼────────┐  │
│  │  Deficit Round-Robin (DRR)   │  │
│  │  Fair dequeuing across orgs  │  │
│  └──────────────┬───────────────┘  │
│                 │                   │
│  ┌──────────────▼───────────────┐  │
│  │   Concurrency + Rate Limiter │  │
│  │   Per-queue and per-key      │  │
│  └──────────────┬───────────────┘  │
└─────────────────┼───────────────────┘
                  │
                  ▼
           Worker picks up run

Why a Custom Queue?

Standard queues like BullMQ don't support multi-tenant fair scheduling out of the box. MarQS implements Deficit Round-Robin (DRR) — a networking algorithm adapted for task queues — ensuring proportional throughput across all organizations. It also supports per-queue concurrency limits and per-key rate limiting, which are essential for a multi-tenant SaaS.
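A minimal sketch of Deficit Round-Robin over per-organization queues may help show why a large backlog in one queue does not block the others. This is a textbook-style version under stated assumptions: every run costs one unit, `quantum` is the credit granted per round, and `TenantQueue` and `drrDequeue` are hypothetical names. MarQS itself runs on Redis and its actual implementation is not shown in the source.

```typescript
// Sketch of Deficit Round-Robin; unit cost per run and `quantum` are assumptions.
type TenantQueue = { tenant: string; runs: string[]; deficit: number };

function drrDequeue(queues: TenantQueue[], quantum: number, limit: number): string[] {
  if (quantum <= 0) throw new Error("quantum must be positive");
  const picked: string[] = [];
  while (picked.length < limit && queues.some((q) => q.runs.length > 0)) {
    for (const q of queues) {
      if (q.runs.length === 0) {
        q.deficit = 0; // empty queues do not bank credit
        continue;
      }
      q.deficit += quantum; // each round grants the same credit to every tenant
      while (q.deficit >= 1 && q.runs.length > 0 && picked.length < limit) {
        picked.push(q.runs.shift() as string);
        q.deficit -= 1; // every run costs one unit in this sketch
      }
      if (q.runs.length === 0) q.deficit = 0;
    }
  }
  return picked;
}

const queues: TenantQueue[] = [
  { tenant: "A", runs: ["a1", "a2", "a3"], deficit: 0 },
  { tenant: "B", runs: ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8", "b9", "b10"], deficit: 0 },
];
console.log(drrDequeue(queues, 2, 8));
// [ 'a1', 'a2', 'b1', 'b2', 'a3', 'b3', 'b4', 'b5' ]
```

Even though Org B has ten runs waiting, all three of Org A's runs are dequeued within the first five picks. Weighted fairness could be modeled by giving each tenant its own quantum.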

Worker Orchestration

Development Mode

In development, tasks run inside the developer's local process — no containers needed:

Developer Machine
    │
    ▼
trigger dev (CLI)        ← watches for file changes
    │
    ▼
WebSocket to Webapp      ← registers tasks, receives runs
    │
    ▼
DevQueueConsumer         ← dequeues runs for this dev env
    │
    ▼
Execute in-process       ← runs task code directly
    │
    ▼
Report completion        ← sends result back via WebSocket

Production Mode

In production, each task run executes in an isolated container:

Webapp (Run Engine)
    │ dequeue
    ▼
Supervisor
    │ provisions
    ▼
┌────────────────────┐
│  Docker Provider   │  ← or Kubernetes Provider
│  (Dockerode /      │
│   K8s client)      │
└────────┬───────────┘
         │ creates
    ┌────▼─────────────┐
    │  Task Container  │
    │  ┌────────────┐  │
    │  │ Your Code  │  │
    │  │ + SDK      │  │
    │  └────────────┘  │
    │  Heartbeats ────────► Supervisor
    │  Logs ──────────────► OTLP → ClickHouse
    │  Completion ────────► Webapp (update DB)
    └──────────────────┘

Supervisor Responsibilities

| Feature | Description |
| --- | --- |
| Container lifecycle | Start, monitor, restart, and clean up task containers |
| Resource monitoring | Track CPU and memory usage per container |
| Heartbeat tracking | Detect unresponsive containers and trigger retries |
| Pod cleanup (K8s) | Remove completed or failed Kubernetes pods |
| Failed pod handling | Detect OOMKilled or CrashLoopBackOff and report to run engine |
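Heartbeat tracking can be sketched as a map from run ID to the time of the last heartbeat, scanned periodically for stale entries. `HeartbeatTracker`, its method names, and the timeout value are assumptions for illustration only; the supervisor's real mechanism is not described in the source.

```typescript
// Hypothetical heartbeat tracker; the timeout value and names are assumptions.
class HeartbeatTracker {
  private lastSeen = new Map<string, number>();
  private timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Called whenever a container reports in.
  record(runId: string, nowMs: number): void {
    this.lastSeen.set(runId, nowMs);
  }

  // Returns runs whose last heartbeat is older than the timeout and stops tracking them.
  collectStalled(nowMs: number): string[] {
    const stalled: string[] = [];
    this.lastSeen.forEach((seenAt, runId) => {
      if (nowMs - seenAt > this.timeoutMs) {
        stalled.push(runId);
        this.lastSeen.delete(runId);
      }
    });
    return stalled;
  }
}

const tracker = new HeartbeatTracker(5000);
tracker.record("run_1", 0);
tracker.record("run_2", 4000);
console.log(tracker.collectStalled(6000)); // [ 'run_1' ]: silent for 6s, over the 5s timeout
```

Each run ID returned by `collectStalled` would then be handed to whatever retry logic applies. Passing the clock in as `nowMs` keeps the sketch deterministic and easy to test.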

Real-Time Architecture

The dashboard and worker coordination use multiple real-time channels:

| Channel | Technology | Purpose |
| --- | --- | --- |
| Dashboard sync | ElectricSQL | Real-time run status updates in the UI |
| Worker coordination | Socket.io | Execution payloads, completions, heartbeats |
| Dev worker | WebSocket (/ws) | CLI ↔ platform communication during trigger dev |
| Log streaming | SSE / Socket.io | Live task output in the dashboard |
| Realtime streams | Redis Streams / S2 | SDK-level stream data (e.g. AI token streaming) |

Why ElectricSQL?

ElectricSQL replicates PostgreSQL rows to the browser in real time. Instead of polling the API every few seconds for run status, the dashboard subscribes to the relevant database rows and receives updates as they change. This makes the monitoring experience feel like a live terminal.

Authentication & Authorization

Auth Providers

| Method | Use Case |
| --- | --- |
| GitHub OAuth | Developer login to dashboard |
| Google OAuth | Developer login to dashboard |
| Magic Link | Email-based passwordless login |
| API Keys | Server-to-server task triggering (scoped per environment) |
| Personal Access Tokens | CLI authentication and API access |
| Organization Access Tokens | Org-wide programmatic access |
| Public JWT | Frontend/realtime scoped access (read-only run status) |

API Key Scoping

Each environment (Dev, Staging, Production) has its own API key. A Production key cannot access Development runs, and vice versa. Public keys have further scope restrictions (e.g. read:runs, write:tasks).
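The scoping rules above can be expressed as a small authorization check. This is a sketch under assumptions: the `ApiCredential` shape, the `secret`/`public` distinction, and the rule that a secret key has full access inside its own environment are all invented for illustration. Only the scope strings `read:runs` and `write:tasks` come from the text.

```typescript
// Illustrative authorization check; the credential shape is an assumption.
type Environment = "DEVELOPMENT" | "STAGING" | "PRODUCTION";

interface ApiCredential {
  environment: Environment;
  kind: "secret" | "public";
  scopes: string[]; // only consulted for public credentials
}

function canAccess(
  cred: ApiCredential,
  runEnvironment: Environment,
  requiredScope: string
): boolean {
  // A key never crosses environments.
  if (cred.environment !== runEnvironment) return false;
  // Assumption: a secret key has full access within its own environment.
  if (cred.kind === "secret") return true;
  // Public credentials need the scope to be granted explicitly.
  return cred.scopes.includes(requiredScope);
}

const prodKey: ApiCredential = { environment: "PRODUCTION", kind: "secret", scopes: [] };
const publicKey: ApiCredential = { environment: "PRODUCTION", kind: "public", scopes: ["read:runs"] };

console.log(canAccess(prodKey, "DEVELOPMENT", "read:runs"));   // false: wrong environment
console.log(canAccess(publicKey, "PRODUCTION", "read:runs"));  // true: scope granted
console.log(canAccess(publicKey, "PRODUCTION", "write:tasks")); // false: scope missing
```

Checking the environment first means a leaked Development key cannot reach Production data regardless of which scopes it carries.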

Observability Stack

Task Code (SDK)
    │ OpenTelemetry spans + logs
    ▼
OTLP Collector
    │
    ├──► ClickHouse        ← High-volume event storage + querying
    ├──► Prometheus         ← Metrics (queue depth, run latency)
    └──► Grafana            ← Dashboards and alerting

Dashboard
    │ TSQL queries
    ▼
ClickHouse               ← Sub-second log search across millions of events

What's Next