Configuration

Waymark is configured entirely through environment variables. The runtime reads them directly from the process environment and does not auto-load .env files, so set them before launching waymark-start-workers (or load a .env file yourself with python-dotenv if you prefer).
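If you keep settings in a .env file, the loading has to happen in whatever process launches the workers, because the child only inherits what is already in its environment. With python-dotenv installed, `from dotenv import load_dotenv; load_dotenv()` does this. The sketch below is a deliberately minimal stdlib stand-in so the mechanics are visible; `load_env_file` and `start_workers` are hypothetical helper names, not part of Waymark.

```python
from __future__ import annotations

import os
import subprocess


def load_env_file(path: str, override: bool = False) -> dict[str, str]:
    """Copy simple KEY=VALUE lines from a file into os.environ.

    Hypothetical, minimal stand-in for python-dotenv: no interpolation,
    no multi-line values, no `export` prefix handling.
    """
    loaded: dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blanks, comments, and anything that isn't KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            # Like load_dotenv's default, keep variables that are already set.
            if override or key not in os.environ:
                os.environ[key] = value
                loaded[key] = value
    return loaded


def start_workers(env_file: str | None = None) -> subprocess.Popen:
    """Launch waymark-start-workers with the current (possibly augmented) env."""
    if env_file is not None:
        load_env_file(env_file)
    if not os.environ.get("WAYMARK_DATABASE_URL"):
        raise RuntimeError("WAYMARK_DATABASE_URL must be set before starting workers")
    # Popen inherits os.environ by default, which is where Waymark reads config.
    return subprocess.Popen(["waymark-start-workers"])
```

Values already exported in the shell take precedence over the file, which keeps per-deployment overrides (CI secrets, container env) working.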

The variables are grouped by which process reads them: the worker pool, the bridge, and the Python SDK / boot helpers.

Worker pool

These are the variables you'll touch most often when running waymark-start-workers.

Variable | Description | Default
--- | --- | ---
WAYMARK_DATABASE_URL | PostgreSQL DSN for runtime state. | Required
WAYMARK_WORKER_COUNT | Number of Python worker processes. | Host CPU count
WAYMARK_CONCURRENT_PER_WORKER | Max concurrent actions per worker. | 10
WAYMARK_MAX_CONCURRENT_INSTANCES | Max in-memory instances across runloop shards. | 500
WAYMARK_EXECUTOR_SHARDS | Number of executor shards. | Host CPU count
WAYMARK_USER_MODULE | Comma-separated Python modules preloaded in workers. | Unset
WAYMARK_MAX_ACTION_LIFECYCLE | Actions per worker before recycle (memory mitigation). | Unset (no limit)
WAYMARK_WEBAPP_ENABLED | Enable embedded web UI. | false
WAYMARK_WEBAPP_ADDR | Web UI bind address. | 0.0.0.0:24119
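To see how the defaults combine, here is a sketch that resolves the table above into one settings object. The helper `worker_pool_settings` is hypothetical, and the set of truthy spellings for WAYMARK_WEBAPP_ENABLED is an assumption; only the variable names and defaults come from the table.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


def _flag(value: str) -> bool:
    # Assumption: accepted truthy spellings are a guess, not documented here.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class WorkerPoolSettings:
    database_url: str
    worker_count: int
    concurrent_per_worker: int
    max_concurrent_instances: int
    executor_shards: int
    user_modules: tuple[str, ...]
    max_action_lifecycle: int | None
    webapp_enabled: bool
    webapp_addr: str

    @property
    def total_action_slots(self) -> int:
        # Upper bound on actions running at once across the whole pool.
        return self.worker_count * self.concurrent_per_worker


def worker_pool_settings(env: Mapping[str, str] = os.environ) -> WorkerPoolSettings:
    """Resolve worker-pool variables, applying the documented defaults."""
    if not env.get("WAYMARK_DATABASE_URL"):
        raise ValueError("WAYMARK_DATABASE_URL is required")
    cpus = os.cpu_count() or 1
    lifecycle = env.get("WAYMARK_MAX_ACTION_LIFECYCLE")
    modules = env.get("WAYMARK_USER_MODULE", "")
    return WorkerPoolSettings(
        database_url=env["WAYMARK_DATABASE_URL"],
        worker_count=int(env.get("WAYMARK_WORKER_COUNT", cpus)),
        concurrent_per_worker=int(env.get("WAYMARK_CONCURRENT_PER_WORKER", 10)),
        max_concurrent_instances=int(env.get("WAYMARK_MAX_CONCURRENT_INSTANCES", 500)),
        executor_shards=int(env.get("WAYMARK_EXECUTOR_SHARDS", cpus)),
        user_modules=tuple(m.strip() for m in modules.split(",") if m.strip()),
        max_action_lifecycle=int(lifecycle) if lifecycle else None,
        webapp_enabled=_flag(env.get("WAYMARK_WEBAPP_ENABLED", "false")),
        webapp_addr=env.get("WAYMARK_WEBAPP_ADDR", "0.0.0.0:24119"),
    )
```

With the defaults on an 8-core host, that works out to 8 workers × 10 actions each = 80 concurrent action slots, which is the number to compare against downstream limits such as database connection pools.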

WAYMARK_USER_MODULE is the one most people miss on first setup. The worker pool needs to import the modules where your @action and @workflow decorators run, otherwise it has no handlers registered when queue rows arrive.
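Preloading boils down to importing each listed module so its decorators execute. The sketch below assumes the worker does roughly this with `importlib.import_module`; `preload_user_modules` is a hypothetical name and Waymark's real loader may differ. Failing at startup is the point: a bad entry surfaces immediately instead of as a missing handler later.

```python
from __future__ import annotations

import importlib
import os
from types import ModuleType


def preload_user_modules(raw: str | None = None) -> list[ModuleType]:
    """Import every module named in a comma-separated WAYMARK_USER_MODULE value.

    Importing a module runs its top-level decorators, which is what
    registers handlers.
    """
    if raw is None:
        raw = os.environ.get("WAYMARK_USER_MODULE", "")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    loaded: list[ModuleType] = []
    for name in names:
        try:
            loaded.append(importlib.import_module(name))
        except ImportError as exc:
            # Surface the offending entry instead of a bare ModuleNotFoundError.
            raise RuntimeError(
                f"WAYMARK_USER_MODULE entry {name!r} failed to import"
            ) from exc
    return loaded
```

In practice that means launching with something like `WAYMARK_USER_MODULE=myapp.actions,myapp.workflows waymark-start-workers`, where `myapp` stands in for your own package and must be importable from the workers' working directory or PYTHONPATH.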

Worker pool advanced

These tune the runloop scheduler. The defaults are well-chosen for a single-host deployment; you usually only reach for them once you're running multiple hosts and have a measured reason.

Variable | Description | Default
--- | --- | ---
WAYMARK_WORKER_GRPC_ADDR | gRPC bind address for the Python worker bridge server. | 127.0.0.1:24118
WAYMARK_POLL_INTERVAL_MS | Queue poll interval. | 100 ms
WAYMARK_INSTANCE_DONE_BATCH_SIZE | Batch size for persisting completed instances. | Falls back to WAYMARK_MAX_CONCURRENT_INSTANCES
WAYMARK_PERSIST_INTERVAL_MS | Persistence flush interval. | 500 ms
WAYMARK_LOCK_TTL_MS | Queue lock TTL. | 15000 ms
WAYMARK_LOCK_HEARTBEAT_MS | Queue lock heartbeat interval. | 5000 ms
WAYMARK_EVICT_SLEEP_THRESHOLD_MS | Sleep threshold for evicting idle instances. | 10000 ms
WAYMARK_EXPIRED_LOCK_RECLAIMER_INTERVAL_MS | Expired lock reclaim sweep interval. | 15000 ms
WAYMARK_EXPIRED_LOCK_RECLAIMER_BATCH_SIZE | Max locks reclaimed per sweep. | 1000
WAYMARK_SCHEDULER_POLL_INTERVAL_MS | Scheduler poll interval. | 1000 ms
WAYMARK_SCHEDULER_BATCH_SIZE | Scheduler due-item batch size. | 100
WAYMARK_RUNNER_PROFILE_INTERVAL_MS | Worker status / profile publish interval. | 5000 ms
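Two of these settings interact in ways worth checking before you change them: the lock heartbeat must stay comfortably below the lock TTL, and WAYMARK_INSTANCE_DONE_BATCH_SIZE inherits WAYMARK_MAX_CONCURRENT_INSTANCES when unset. The sketch below assumes each heartbeat extends a lock's expiry to now + TTL (not stated above); under that assumption a lock survives one missed heartbeat only if 2 × heartbeat < TTL. The helpers `check_lock_timing` and `resolve_done_batch_size` are hypothetical.

```python
from __future__ import annotations

import os
from typing import Mapping


def _ms(env: Mapping[str, str], key: str, default: int) -> int:
    value = int(env.get(key, default))
    if value <= 0:
        raise ValueError(f"{key} must be positive, got {value}")
    return value


def check_lock_timing(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return warnings for risky lock TTL / heartbeat combinations."""
    ttl = _ms(env, "WAYMARK_LOCK_TTL_MS", 15000)
    heartbeat = _ms(env, "WAYMARK_LOCK_HEARTBEAT_MS", 5000)
    warnings: list[str] = []
    # Assumption: a heartbeat pushes expiry to now + TTL. After one missed
    # heartbeat, the next lands at 2 * heartbeat and must beat the TTL.
    if 2 * heartbeat >= ttl:
        warnings.append(
            f"heartbeat {heartbeat} ms vs TTL {ttl} ms: one missed heartbeat "
            "can let another host reclaim a lock that is still in use"
        )
    return warnings


def resolve_done_batch_size(env: Mapping[str, str] = os.environ) -> int:
    """Apply the documented fallback for WAYMARK_INSTANCE_DONE_BATCH_SIZE."""
    explicit = env.get("WAYMARK_INSTANCE_DONE_BATCH_SIZE")
    if explicit:
        return int(explicit)
    return int(env.get("WAYMARK_MAX_CONCURRENT_INSTANCES", 500))
```

The defaults (5000 ms heartbeat, 15000 ms TTL) pass this check with room to spare; raising the heartbeat to 7500 ms without also raising the TTL would not.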

Bridge

waymark-bridge is the gRPC server that translates between the Python SDK and the Rust runtime. Most users never run this binary directly - the Python SDK boots a singleton bridge automatically on first workflow invocation. You'd set these only when you want to run the bridge as its own service (e.g., a sidecar container).

Variable | Description | Default
--- | --- | ---
WAYMARK_BRIDGE_GRPC_ADDR | gRPC bind address. | 127.0.0.1:24117
WAYMARK_BRIDGE_IN_MEMORY | Run with no Postgres backend (test / dev only). | false
WAYMARK_DATABASE_URL | PostgreSQL DSN. | Required unless in-memory
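The one rule in this table is that the DSN is required unless the bridge runs in memory. A sketch of validating a sidecar's environment against that rule follows; `bridge_settings` is a hypothetical helper, and the truthy spellings accepted for WAYMARK_BRIDGE_IN_MEMORY are an assumption.

```python
from __future__ import annotations

import os
from typing import Mapping


def bridge_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve the settings a standalone waymark-bridge would start with."""
    # Assumption: these spellings count as "true"; the docs only show false.
    in_memory = env.get("WAYMARK_BRIDGE_IN_MEMORY", "false").strip().lower() in {
        "1", "true", "yes", "on",
    }
    resolved = {
        "WAYMARK_BRIDGE_GRPC_ADDR": env.get("WAYMARK_BRIDGE_GRPC_ADDR", "127.0.0.1:24117"),
        "WAYMARK_BRIDGE_IN_MEMORY": "true" if in_memory else "false",
    }
    if not in_memory:
        # Postgres-backed mode: the DSN is mandatory.
        url = env.get("WAYMARK_DATABASE_URL")
        if not url:
            raise ValueError("WAYMARK_DATABASE_URL is required unless WAYMARK_BRIDGE_IN_MEMORY is true")
        resolved["WAYMARK_DATABASE_URL"] = url
    return resolved
```

Note that the default bind address, 127.0.0.1, only accepts connections from the same network namespace. That works for a sidecar sharing a pod's loopback interface; if the SDK runs in a separate container or host, bind to 0.0.0.0:24117 (or a specific interface) instead.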

Python SDK & bootstrap

These influence how the Python SDK connects to a bridge - useful when you've split processes across containers or hosts.

Variable | Description | Default
--- | --- | ---
WAYMARK_BOOT_COMMAND | Full command to boot the singleton bridge. | Unset
WAYMARK_BOOT_BINARY | Boot binary used when WAYMARK_BOOT_COMMAND is unset. | waymark-boot-singleton
WAYMARK_BRIDGE_GRPC_ADDR | Explicit host:port for the SDK to connect to. | Unset
WAYMARK_BRIDGE_GRPC_HOST | Bridge gRPC host (used by the singleton helper and the SDK). | 127.0.0.1
WAYMARK_BRIDGE_GRPC_PORT | Bridge gRPC base port. | 24117
WAYMARK_BRIDGE_BASE_PORT | Fallback alias for WAYMARK_BRIDGE_GRPC_PORT. | Unset
WAYMARK_SKIP_WAIT_FOR_INSTANCE | Return immediately after queueing instead of awaiting completion. | false
WAYMARK_LOG_LEVEL | Python SDK logger level. | INFO
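Several of these variables overlap, so it helps to spell out one plausible resolution order. The sketch assumes an explicit WAYMARK_BRIDGE_GRPC_ADDR wins, then host plus WAYMARK_BRIDGE_GRPC_PORT, then WAYMARK_BRIDGE_BASE_PORT as the fallback alias, and that WAYMARK_BOOT_COMMAND is split shell-style rather than run through a shell. None of that precedence is confirmed above, and `bridge_target` and `boot_command` are hypothetical names.

```python
from __future__ import annotations

import os
import shlex
from typing import Mapping


def bridge_target(env: Mapping[str, str] = os.environ) -> str:
    """Pick the host:port the SDK dials (precedence order is an assumption)."""
    # 1. An explicit address wins outright.
    explicit = env.get("WAYMARK_BRIDGE_GRPC_ADDR")
    if explicit:
        return explicit
    # 2. Otherwise combine host and port; BASE_PORT is only a fallback alias.
    host = env.get("WAYMARK_BRIDGE_GRPC_HOST", "127.0.0.1")
    port = (
        env.get("WAYMARK_BRIDGE_GRPC_PORT")
        or env.get("WAYMARK_BRIDGE_BASE_PORT")
        or "24117"
    )
    return f"{host}:{int(port)}"


def boot_command(env: Mapping[str, str] = os.environ) -> list[str]:
    """Argv the SDK would use to boot the singleton bridge."""
    command = env.get("WAYMARK_BOOT_COMMAND")
    if command:
        # Assumption: split shell-style into argv, not executed via a shell.
        return shlex.split(command)
    return [env.get("WAYMARK_BOOT_BINARY", "waymark-boot-singleton")]
```

With nothing set, the SDK side resolves to 127.0.0.1:24117, which matches the bridge's default bind address, so a single-host setup needs none of these variables.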

Worker recycling

WAYMARK_MAX_ACTION_LIFECYCLE controls how many actions a Python worker can execute before it's automatically replaced. This is useful when a third-party library leaks memory: set the variable to something like 1000 or 10000, and Waymark will spin up a fresh worker before retiring the old one. In-flight actions on the old worker complete normally before the process exits, so recycling causes no downtime.

By default this is unset, meaning workers run indefinitely.
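The replace-then-drain sequence can be modeled in a few lines. This is a toy model, not Waymark's implementation: it assumes the limit counts actions dispatched to a worker (the real runtime might count completions instead), and `RecyclingPool` and `Worker` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count


@dataclass(eq=False)  # compare workers by identity
class Worker:
    worker_id: int
    assigned: int = 0
    in_flight: int = 0


class RecyclingPool:
    """Toy model of lifecycle-based worker recycling."""

    def __init__(self, max_action_lifecycle: int | None = None):
        self.limit = max_action_lifecycle  # None means never recycle
        self._ids = count()
        self.current = Worker(next(self._ids))
        self.draining: list[Worker] = []
        self.retired: list[int] = []

    def dispatch(self) -> Worker:
        if self.limit is not None and self.current.assigned >= self.limit:
            # Start the replacement first, then stop feeding the old worker.
            self.draining.append(self.current)
            self.current = Worker(next(self._ids))
        worker = self.current
        worker.assigned += 1
        worker.in_flight += 1
        return worker

    def complete(self, worker: Worker) -> None:
        worker.in_flight -= 1
        # A draining worker exits only after its last in-flight action ends.
        if worker in self.draining and worker.in_flight == 0:
            self.draining.remove(worker)
            self.retired.append(worker.worker_id)
```

With a limit of 2, the third dispatch goes to a new worker while the first keeps running its two actions, and the first retires only once both finish. New work always has a live worker to land on, which is why recycling doesn't interrupt processing.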