Trigger.dev Architecture Overview
Trigger.dev's system architecture — run engine, queue system, worker orchestration, database schema, and real-time coordination
For Product Managers
This page explains how Trigger.dev is structured as a software system. Focus on the diagrams and "Why It Matters" callouts to understand how tasks flow from your code to execution and back. You don't need to understand every technical detail.
Trigger.dev uses a multi-service architecture where a central web application accepts task triggers, a run engine manages queueing and retries, and supervisors launch isolated task containers. This page explains how those services coordinate so background jobs remain durable, fair, and observable.
How Trigger.dev Architecture Works
Trigger.dev follows a multi-service architecture where the webapp is the central hub, coordinating between the SDK, workers, queues, and storage:
┌─────────────────────────────────────────────────────────────────────┐
│ YOUR APPLICATION (Node.js / Next.js) │
│ @trigger.dev/sdk → trigger(myTask, payload) │
└──────────────────────────────┬──────────────────────────────────────┘
│ HTTPS (REST API)
┌──────────▼──────────┐
│ Webapp (Remix) │ ← Dashboard + API
│ Express :3030 │ REST v1/v2/v3
│ Socket.io │ Real-time coordination
└──┬──────┬───────┬───┘
│ │ │
┌───────────────▼─┐ ┌──▼──────────┐ ┌▼───────────────┐
│ PostgreSQL │ │ Redis │ │ ClickHouse │
│ (Primary DB) │ │ (Queue + │ │ (Analytics + │
│ Prisma ORM │ │ Locks) │ │ Logs) │
└─────────────────┘ └──┬──────────┘ └────────────────┘
│
┌──────────▼──────────┐
│ Run Engine │ ← Orchestrates execution
│ MarQS Fair Queue │ Enqueue → Dequeue → Track
│ Redis Worker │
└──────────┬──────────┘
│ Dequeue
┌──────────▼──────────┐
│ Supervisor │ ← Workload manager
│ (Docker or K8s) │ Starts/stops containers
└──────────┬──────────┘
│ Spawn container
┌──────────▼──────────┐
│ Task Container │ ← Your task code runs here
│ (isolated runtime)│ Heartbeats, logs, completion
└─────────────────────┘
Why It Matters
Every task trigger flows through the Webapp API → Run Engine (Redis queue) → Supervisor → Container. This separation means the API responds instantly (the task is enqueued), while heavy computation runs in isolated containers. If a container crashes, the Run Engine knows and can retry automatically. The queue is multi-tenant and fair, so one user's spike in tasks won't starve other users.
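The key property here is that enqueueing and executing are decoupled. A toy in-memory sketch (illustrative only, not Trigger.dev's actual code) shows the shape of that separation — the "webapp" side enqueues and returns immediately, while a separate "worker" loop dequeues and executes:

```typescript
// Toy sketch of the enqueue-then-execute separation. In the real system the
// queue lives in Redis and the worker is a container; here both are in-process.
type Run = { id: string; payload: unknown; status: "PENDING" | "EXECUTING" | "COMPLETED" };

const queue: Run[] = [];

// The "webapp" side: accept a trigger, enqueue, respond instantly.
function triggerTask(id: string, payload: unknown): Run {
  const run: Run = { id, payload, status: "PENDING" };
  queue.push(run);
  return run; // API response: run is queued, not yet executed
}

// The "worker" side: dequeue and execute independently of the API call.
function workerTick(execute: (payload: unknown) => void): Run | undefined {
  const run = queue.shift();
  if (!run) return undefined;
  run.status = "EXECUTING";
  execute(run.payload);
  run.status = "COMPLETED";
  return run;
}

const run = triggerTask("run_1", { userId: 42 });
console.log(run.status); // PENDING — the API call already returned
workerTick((p) => void p);
console.log(run.status); // COMPLETED
```

Because the API only enqueues, its latency is independent of how long the task itself takes to run.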
The Five Applications
| App | Technology | Purpose |
|---|---|---|
| webapp | Remix 2 + Express + React 18 | Dashboard UI, REST API (v1/v2/v3), WebSocket coordination, run engine host |
| supervisor | Node.js + Docker/K8s clients | Manages container lifecycle — starts, monitors, and cleans up task containers |
| coordinator | Node.js + Socket.io | Coordinates worker instances, relays execution payloads and completions |
| docker-provider | Node.js + Dockerode | Docker-specific container management for self-hosted deployments |
| kubernetes-provider | Node.js + @kubernetes/client-node | Kubernetes-specific pod management for cloud and enterprise deployments |
The Run Engine
The Run Engine is the heart of Trigger.dev — it orchestrates every task from trigger to completion.
┌──────────────────────────────────────┐
│ RUN ENGINE │
│ │
│ ┌──────────────┐ ┌───────────────┐ │
│ │ EnqueueSystem│ │ DequeueSystem │ │
│ │ (accept run) │ │ (pick next) │ │
│ └──────┬───────┘ └───────┬───────┘ │
│ │ │ │
│ ┌──────▼──────────────────▼───────┐ │
│ │ RunQueue (Redis) │ │
│ │ Fair multi-tenant (MarQS) │ │
│ │ Concurrency + Rate limits │ │
│ └──────────────┬───────────────────┘ │
│ │ │
│ ┌──────────────▼───────────────────┐ │
│ │ RunAttemptSystem │ │
│ │ (track attempts + retries) │ │
│ └──────────────┬───────────────────┘ │
│ │ │
│ ┌──────┬──────┴──────┬──────────┐ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ Wait Check- Delayed Batch │
│ point point Run System │
│ System System System │
│ │
│ ┌─────────────────────────────────┐ │
│ │ ExecutionSnapshotSystem │ │
│ │ (durable state tracking) │ │
│ └─────────────────────────────────┘ │
└──────────────────────────────────────┘Run Engine Sub-Systems
| System | Responsibility |
|---|---|
| EnqueueSystem | Accepts new runs, validates concurrency, adds to Redis queue |
| DequeueSystem | Picks the next run using fair-queuing (deficit round-robin) |
| RunAttemptSystem | Tracks each attempt, manages retries with backoff |
| WaitpointSystem | Handles task pauses — waiting for events, approvals, or time delays |
| CheckpointSystem | Saves and restores task state for durable execution |
| DelayedRunSystem | Manages tasks scheduled for a future time |
| DebounceSystem | Collapses rapid triggers into a single execution |
| TtlSystem | Expires runs that exceed their time-to-live |
| BatchSystem | Handles batch triggers (thousands of runs in one call) |
| ExecutionSnapshotSystem | Tracks internal execution state for the v2 engine |
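The RunAttemptSystem's "retries with backoff" follows the usual exponential-backoff shape: each failed attempt waits longer than the last, up to a cap. A hedged sketch — the base delay, growth factor, and cap below are made-up illustration values, not Trigger.dev defaults:

```typescript
// Illustrative exponential backoff with a cap, the general shape of retry
// scheduling used by RunAttemptSystem-style components.
function retryDelayMs(
  attempt: number, // 1-based attempt number
  baseMs = 1_000,  // delay after the first failure (assumed value)
  factor = 2,      // exponential growth factor (assumed value)
  maxMs = 60_000   // upper bound on any single delay (assumed value)
): number {
  const delay = baseMs * Math.pow(factor, attempt - 1);
  return Math.min(delay, maxMs);
}

console.log(retryDelayMs(1));  // 1000
console.log(retryDelayMs(4));  // 8000
console.log(retryDelayMs(10)); // 60000 — capped
```

Real systems usually add random jitter on top so that many runs failing at once don't all retry at the same instant.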
For PMs: What's Fair Queuing?
Imagine a buffet where one person tries to take all the food. Fair queuing (specifically Deficit Round-Robin) ensures every organization gets a proportional share of execution capacity. Even if one customer triggers 10,000 tasks, other customers' tasks still run promptly. This is critical for a multi-tenant platform.
Database Layer (PostgreSQL + Prisma)
PostgreSQL stores all persistent state. The schema has 50+ models managed through Prisma ORM.
Core Entity-Relationship Map
┌──────────────┐
│ User │ ← Auth (GitHub, Google, Magic Link)
└──────┬───────┘
│ member of
┌───────────▼───────────┐
│ Organization │ ← multi-tenancy root
│ (limits, features) │
└───────────┬───────────┘
│ has many
┌───────────▼───────────┐
│ Project │
└───────────┬───────────┘
│ has many
┌────────────────────┼────────────────────┐
│ │ │
┌────────▼────────┐ ┌────────▼────────┐ ┌────────▼────────┐
│ RuntimeEnvironment│ │ WorkerGroup │ │ TaskSchedule │
│ (Dev/Stg/Prod) │ │ (exec target) │ │ (cron config) │
│ apiKey, pkApiKey │ │ │ │ │
└────────┬────────┘ └────────────────┘ └─────────────────┘
│ has many
┌────────▼────────┐
│ BackgroundWorker │ ← versioned deployment
│ (contentHash, │
│ sdkVersion) │
└────────┬────────┘
│ defines
┌────────▼────────┐
│ BackgroundWorker │
│ Task │ ← slug, queue, retry config
└────────┬────────┘
│ produces
┌────────▼────────┐
│ TaskRun │ ← the central execution record
│ (status, payload│
│ traceId, etc.) │
└────┬────┬───────┘
│ │
┌────▼┐ ┌▼────────────┐
│Attempt│ │ Waitpoint │
│ │ │(pause/resume)│
└──────┘ └──────────────┘
Key Model Fields
TaskRun — the central execution record:
| Field | Purpose |
|---|---|
| status | PENDING, EXECUTING, COMPLETED_SUCCESSFULLY, CANCELED, SYSTEM_FAILURE, etc. |
| payload | JSON input data |
| output | JSON result data |
| traceId / spanId | OpenTelemetry correlation |
| lockedToVersionId | Pin to a specific worker version |
| idempotencyKey | Prevent duplicate runs |
| ttl | Maximum run lifetime |
| queueId | Which queue this run belongs to |
| concurrencyKey | For per-key concurrency limiting |
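The idempotencyKey field deserves a quick illustration: triggering twice with the same key should return the existing run instead of creating a duplicate. A hedged in-memory sketch, with a Map standing in for the TaskRun table (not Trigger.dev's actual lookup code):

```typescript
// Toy idempotency check: the first trigger with a key creates a run; any
// later trigger with the same key is deduplicated to that run.
const runsByIdempotencyKey = new Map<string, { runId: string }>();
let nextId = 1;

function triggerWithIdempotency(key: string): { runId: string; deduplicated: boolean } {
  const existing = runsByIdempotencyKey.get(key);
  if (existing) return { runId: existing.runId, deduplicated: true };
  const run = { runId: `run_${nextId++}` };
  runsByIdempotencyKey.set(key, run);
  return { runId: run.runId, deduplicated: false };
}

const first = triggerWithIdempotency("order-123-email");
const second = triggerWithIdempotency("order-123-email");
console.log(first.runId === second.runId); // true — same run, no duplicate
console.log(second.deduplicated);          // true
```

This is why a client can safely retry a trigger request after a network failure: re-sending the same key can never produce a second execution.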
TaskSchedule — cron-based triggers:
| Field | Purpose |
|---|---|
| cron | Cron expression (e.g. 0 9 * * *) |
| timezone | IANA timezone for cron evaluation |
| deduplicationKey | Prevent duplicate schedules |
Queue Architecture (MarQS)
MarQS (Marked Queue System) is Trigger.dev's custom Redis-based queue. It's built on @trigger.dev/redis-worker and implements fair multi-tenant queuing.
Incoming Runs
│
▼
┌─────────────────────────────────────┐
│ MarQS (Redis) │
│ │
│ ┌─────────┐ ┌─────────────────┐ │
│ │ Org A │ │ Org B │ │
│ │ Queue │ │ Queue │ │
│ │ ■■■ │ │ ■■■■■■■■■■ │ │
│ └────┬────┘ └────────┬────────┘ │
│ │ │ │
│ ┌────▼────────────────▼────────┐ │
│ │ Deficit Round-Robin (DRR) │ │
│ │ Fair dequeuing across orgs │ │
│ └──────────────┬───────────────┘ │
│ │ │
│ ┌──────────────▼───────────────┐ │
│ │ Concurrency + Rate Limiter │ │
│ │ Per-queue and per-key │ │
│ └──────────────┬───────────────┘ │
└─────────────────┼───────────────────┘
│
▼
Worker picks up run
Why a Custom Queue?
Standard queues like BullMQ don't support multi-tenant fair scheduling out of the box. MarQS implements Deficit Round-Robin (DRR) — a networking algorithm adapted for task queues — ensuring proportional throughput across all organizations. It also supports per-queue concurrency limits and per-key rate limiting, which are essential for a multi-tenant SaaS.
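The per-key limiting mentioned above is a simple gate applied at dequeue time. An illustrative in-memory sketch of a per-key concurrency limiter (counters standing in for Redis; not MarQS's actual implementation):

```typescript
// Toy per-key concurrency limiter: a run is released to a worker only if
// fewer than `limit` runs with the same key are currently executing.
const active = new Map<string, number>();

function tryAcquire(key: string, limit: number): boolean {
  const current = active.get(key) ?? 0;
  if (current >= limit) return false; // run stays queued for a later pass
  active.set(key, current + 1);
  return true;
}

function release(key: string): void {
  active.set(key, Math.max(0, (active.get(key) ?? 0) - 1));
}

// With a per-key limit of 2, a third concurrent run is held back until
// one of the first two completes.
console.log(tryAcquire("user-42", 2)); // true
console.log(tryAcquire("user-42", 2)); // true
console.log(tryAcquire("user-42", 2)); // false
release("user-42");
console.log(tryAcquire("user-42", 2)); // true
```

In the real system the counters live in Redis so that all dequeuers share one view, and the check is atomic with the dequeue itself.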
Worker Orchestration
Development Mode
In development, tasks run inside the developer's local process — no containers needed:
Developer Machine
│
▼
trigger dev (CLI) ← watches for file changes
│
▼
WebSocket to Webapp ← registers tasks, receives runs
│
▼
DevQueueConsumer ← dequeues runs for this dev env
│
▼
Execute in-process ← runs task code directly
│
▼
Report completion ← sends result back via WebSocket
Production Mode
In production, each task run executes in an isolated container:
Webapp (Run Engine)
│ dequeue
▼
Supervisor
│ provisions
▼
┌────────────────────┐
│ Docker Provider │ ← or Kubernetes Provider
│ (Dockerode / │
│ K8s client) │
└────────┬───────────┘
│ creates
┌────▼────────────┐
│ Task Container │
│ ┌────────────┐ │
│ │ Your Code │ │
│ │ + SDK │ │
│ └────────────┘ │
│ Heartbeats ────────► Supervisor
│ Logs ──────────────► OTLP → ClickHouse
│ Completion ────────► Webapp (update DB)
└──────────────────┘
Supervisor Responsibilities
| Feature | Description |
|---|---|
| Container lifecycle | Start, monitor, restart, and clean up task containers |
| Resource monitoring | Track CPU and memory usage per container |
| Heartbeat tracking | Detect unresponsive containers and trigger retries |
| Pod cleanup (K8s) | Remove completed or failed Kubernetes pods |
| Failed pod handling | Detect OOMKilled or CrashLoopBackOff and report to run engine |
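Heartbeat tracking is the mechanism that turns a silent container into a retry. A sketch of the idea (illustrative; timestamps are passed in explicitly so the logic is deterministic, and the 5-second timeout is an assumed value, not a Trigger.dev default):

```typescript
// Toy liveness detection: record the last heartbeat per container, then
// flag any container that has been silent longer than the timeout so the
// run engine can fail the attempt and retry.
const lastHeartbeat = new Map<string, number>();

function heartbeat(containerId: string, nowMs: number): void {
  lastHeartbeat.set(containerId, nowMs);
}

function findStale(nowMs: number, timeoutMs: number): string[] {
  const stale: string[] = [];
  for (const [id, ts] of lastHeartbeat) {
    if (nowMs - ts > timeoutMs) stale.push(id);
  }
  return stale;
}

heartbeat("c1", 0);
heartbeat("c2", 8_000);
// At t=10s with a 5s timeout, c1 (silent for 10s) is stale; c2 is not.
console.log(findStale(10_000, 5_000)); // ["c1"]
```

This is also why tasks that crash hard (OOMKilled, for instance) don't hang forever: the missing heartbeat is indistinguishable from a crash, and both paths end in the Run Engine's retry logic.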
Real-Time Architecture
The dashboard and worker coordination use multiple real-time channels:
| Channel | Technology | Purpose |
|---|---|---|
| Dashboard sync | ElectricSQL | Real-time run status updates in the UI |
| Worker coordination | Socket.io | Execution payloads, completions, heartbeats |
| Dev worker | WebSocket (/ws) | CLI ↔ platform communication during trigger dev |
| Log streaming | SSE / Socket.io | Live task output in the dashboard |
| Realtime streams | Redis Streams / S2 | SDK-level stream data (e.g. AI token streaming) |
Why ElectricSQL?
ElectricSQL replicates PostgreSQL rows to the browser in real-time. Instead of polling the API every few seconds for run status updates, the dashboard subscribes to the relevant database rows and gets instant updates. This makes the monitoring experience feel like a live terminal.
Authentication & Authorization
Auth Providers
| Method | Use Case |
|---|---|
| GitHub OAuth | Developer login to dashboard |
| Google OAuth | Developer login to dashboard |
| Magic Link | Email-based passwordless login |
| API Keys | Server-to-server task triggering (scoped per environment) |
| Personal Access Tokens | CLI authentication and API access |
| Organization Access Tokens | Org-wide programmatic access |
| Public JWT | Frontend/realtime scoped access (read-only run status) |
API Key Scoping
Each environment (Dev, Staging, Production) has its own API key. A Production key cannot access Development runs, and vice versa. Public keys have further scope restrictions (e.g. read:runs, write:tasks).
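The two rules above — keys never cross environments, and public keys carry explicit scopes — compose into a simple authorization check. A hedged sketch (the type shape and scope strings here mirror the read:runs / write:tasks examples in the text; they are illustrative, not Trigger.dev's actual internal types):

```typescript
// Toy authorization check combining environment isolation with scope checks.
type ApiKey = { environment: "dev" | "staging" | "prod"; scopes: string[] };

function isAllowed(
  key: ApiKey,
  requiredScope: string,
  targetEnv: ApiKey["environment"]
): boolean {
  // Keys never cross environments, regardless of their scopes.
  if (key.environment !== targetEnv) return false;
  return key.scopes.includes(requiredScope);
}

const publicProdKey: ApiKey = { environment: "prod", scopes: ["read:runs"] };

console.log(isAllowed(publicProdKey, "read:runs", "prod"));   // true
console.log(isAllowed(publicProdKey, "write:tasks", "prod")); // false — scope missing
console.log(isAllowed(publicProdKey, "read:runs", "dev"));    // false — wrong environment
```

The environment check runs first because it is the stronger isolation boundary: no scope can widen a key beyond its own environment.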
Observability Stack
Task Code (SDK)
│ OpenTelemetry spans + logs
▼
OTLP Collector
│
├──► ClickHouse ← High-volume event storage + querying
├──► Prometheus ← Metrics (queue depth, run latency)
└──► Grafana ← Dashboards and alerting
Dashboard
│ TSQL queries
▼
ClickHouse ← Sub-second log search across millions of events