Why Waymark
Background jobs in webapps are common enough that they deserve to be a primitive of your full-stack toolkit - sitting next to the database, the backend, and the frontend. Without that, you're stuck either making blocking requests to your API or spinning up ephemeral tasks that die during a re-deploy or an unlucky container crash.
After a few years of trying most of the ecosystem, I came away with a short list of things a background-job library should provide:
- Control flow you write as ordinary Python.
- The same execution path locally as in production.
- Reasonable defaults that get you to product-market fit before you have to performance-tune anything.
Nothing on the market quite balances these. Waymark is the attempt.
The pitch
A workflow is the durable control flow. An action is a unit of distributed
work. You write both as plain async def Python:
import asyncio
from waymark import Workflow, action, workflow
@action
async def fetch_users(user_ids: list[str]) -> list[User]:
...
@action
async def send_email(to: str, subject: str) -> EmailResult:
...
@workflow
class WelcomeEmailWorkflow(Workflow):
async def run(self, user_ids: list[str]) -> list[EmailResult]:
users = await fetch_users(user_ids)
active_users = [user for user in users if user.active]
return await asyncio.gather(
*[send_email(to=user.email, subject="Welcome") for user in active_users],
return_exceptions=True,
)
When the @workflow decorator runs, Waymark parses the AST of run(). The
for becomes a filter node. The asyncio.gather becomes a parallel fan-out.
The await fetch_users(...) becomes an action node. The compiled DAG is
stored in Postgres. From that point on, your authored body never executes
again - the Rust runtime walks the DAG, dispatches actions to workers, and
records progress.
Compile-once vs replay
Replay-based engines like Temporal and Vercel Workflows treat your workflow
function as the source of truth. To make recovery work, they re-run your code
from the top on each step, returning cached results for already-completed
activities. The price is determinism: no random(), no datetime.now(), no
side effects in the workflow body. Get any of that wrong and your bug shows
up at recovery time, not at registration time.
Waymark inverts the tradeoff. Your code runs once - at registration - to
produce the DAG. The DAG is what executes. There's no replay, so there's no
determinism constraint. Instead, the constraint becomes "use supported
patterns," and the compiler tells you about violations up front. Want
non-determinism? That's exactly what an @action is for.
The practical consequence: if your workflow compiles, it runs as advertised.
You don't have to remember which stdlib calls are safe inside run().
When to reach for Waymark
- You're already on Python and Postgres (Mountaineer, FastAPI, Django, Flask
- doesn't matter).
- You have async-heavy code that needs to be durable and retryable: third-party API calls, slow database jobs, fan-outs.
- You want local and production behavior to match.
- You want background-job code to slot into your existing unit-test and static-analysis pipeline.
- You're focused on getting to product-market fit, not on the next 10×.
Performance is a priority - Waymark has a Rust core, ~1 connection pool per host, and continuous benchmarks on CI - but it isn't the only priority. Postgres is an excellent backing store for ACID workflow state up to a real scale. Once you're stressing Postgres' write capacity, you're in territory where a more specialized system is worth the operational cost.
When not to
- You have latency-sensitive jobs that need sub-100 ms acknowledgement and dispatch.
- You're coordinating tens of thousands of concurrent actions or more.
- You've already outgrown another task coordinator and you need the next 10× of headroom.
Open-source brokers like RabbitMQ have decades of battle-testing, and hosted products like Temporal throw real engineering at the optimization problem. Both are great choices for those scopes. They also bring real setup, operational, and per-event-billing costs. Waymark is a different bet: keep the operational surface to "Postgres plus a worker process" and aim that at the long tail of applications that never need anything more than that.
Status
Waymark is in early alpha. The runtime spec is changing quickly and we don't
guarantee backward compatibility before 1.0.0. If you hit a workflow that you
think should compile but doesn't (you can visualize the DAG with
.visualize()), please file an issue.