Action Readiness

When a workflow is running, Waymark needs to decide when each node is ready to execute. The answer is one rule, applied uniformly to every node type - actions, barriers, joins, returns:

Every node tracks readiness. A node is only enqueued when all its required predecessors have completed.

That's the entire scheduling model. The corollary - and the reason it scales - is that completions push their effects forward to the immediate downstream nodes; we never rescan the graph to find runnable work.

The core rule

Same shape applied to every node:

  • An action is ready when its predecessors have completed.
  • A join (fan-in) is ready when its required predecessor count is met.
  • A return is ready when the predecessor that produces its value has completed.

There's no special-case scheduler for "actions" vs "barriers" vs "loops" - there's one push-based propagation, and the node type only affects what happens after it becomes ready.
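The single rule can be sketched in a few lines. This is an illustrative model, not Waymark's actual API - the names (Node, required_preds, completed_preds) are made up for the sketch:

```python
# Minimal sketch of the uniform readiness rule: a node is ready exactly
# when all of its required predecessors have completed. The same check
# covers actions, joins, and returns; only required_preds differs.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    required_preds: set[str]                       # must complete first
    completed_preds: set[str] = field(default_factory=set)

    def is_ready(self) -> bool:
        # One rule for every node type.
        return self.required_preds <= self.completed_preds

join = Node("join", required_preds={"a", "b"})
join.completed_preds.add("a")
assert not join.is_ready()       # one predecessor still outstanding
join.completed_preds.add("b")
assert join.is_ready()           # all required predecessors done
```

An action with a single upstream node and a fan-in join with many are the same check against different predecessor sets.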

Vocabulary

A few terms used throughout the runtime:

  • State machine edges - execution order (who can run after whom).
  • Data-flow edges - variable writes from one node to another.
  • Inline nodes - assignments, expressions, branches, returns. These don't need worker dispatch; they advance in the runloop.
  • Frontier nodes - actions, barriers, outputs. Inline traversal stops at these because they need external coordination.
  • Readiness - determined by predecessor completion status.

Push-based scheduling

When a node completes, the runloop:

  1. Marks the node's completion in the execution graph.
  2. Stores the result and updates variables in the workflow scope.
  3. Evaluates guards on outgoing edges.
  4. For each successor, checks whether all required predecessors are complete.
  5. Adds newly-ready successors to the ready queue.

All of this is in-memory; durable state is batched periodically to Postgres. The runloop never does a global scan to ask "what's runnable right now?" - every completion knows what it unblocks, and only those nodes are touched.
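The steps above can be sketched as a small propagation loop. Everything here is hypothetical scaffolding - plain dicts for the adjacency lists and a deque for the ready queue - chosen only to show the push-based shape:

```python
# Illustrative push-based propagation: each completion pushes its effects
# to immediate successors only; there is never a global "what's runnable?"
# scan over the graph.
from collections import deque

successors = {"fetch": ["work"], "work": ["summarize"], "summarize": []}
preds = {"fetch": set(), "work": {"fetch"}, "summarize": {"work"}}

done: set[str] = set()
ready = deque(n for n, p in preds.items() if not p)   # roots are ready
order = []

while ready:
    node = ready.popleft()
    order.append(node)                  # "execute" the node
    done.add(node)                      # mark completion
    for succ in successors[node]:       # push forward, never rescan
        if preds[succ] <= done:         # all required preds complete?
            ready.append(succ)          # enqueue newly-ready successor
```

After the loop, order is ["fetch", "work", "summarize"]: each node entered the queue at the moment its last predecessor completed.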

Frontier nodes

Inline traversal stops at three node kinds, because each requires external coordination:

  • Action: dispatched to a Python worker.
  • Barrier: waits for multiple predecessors before firing - spread aggregators, joins of parallel branches.
  • Output: the workflow's terminal node.

Joins with required_count = 1 collapse to inline (no actual barrier needed). Joins with more predecessors become real barriers.
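That collapse is a compile-time decision, which could look as simple as this (the function name is invented for the sketch):

```python
# Hypothetical lowering choice for joins: a join with a single required
# predecessor needs no coordination, so it stays inline; wider joins
# become real barriers that wait in the execution graph.
def lower_join(required_count: int) -> str:
    if required_count < 1:
        raise ValueError("a join needs at least one predecessor")
    return "inline" if required_count == 1 else "barrier"

assert lower_join(1) == "inline"     # no actual barrier needed
assert lower_join(4) == "barrier"    # real fan-in coordination
```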

A worked example

Take a fan-out with spread:

items = @fetch_items()
results = spread items:item -> @process_item(item=item)
summary = @summarize(items=results)

The completion flow:

  1. @fetch_items() completes; items lands in the workflow scope.
  2. The spread node creates N action instances, one per item, each tagged with a spread_index.
  3. Each @process_item completion stores its result against its spread_index.
  4. The barrier becomes ready once all N results have arrived.
  5. The barrier aggregates results into an ordered list and writes it to scope.
  6. @summarize becomes ready and receives the aggregated results.

At every step, only the immediate downstream nodes are touched. No scan, no quadratic walk.
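Steps 3-5 hinge on the barrier collecting results by spread_index. A minimal sketch of that behavior, with invented names, assuming results can arrive in any order:

```python
# Sketch of a spread barrier: store each @process_item result against its
# spread_index, fire once all N have arrived, and aggregate into a list
# ordered by index rather than by completion time.
class SpreadBarrier:
    def __init__(self, expected: int):
        self.expected = expected
        self.results: dict[int, object] = {}

    def deliver(self, spread_index: int, value) -> bool:
        """Record one result; returns True when the barrier fires."""
        self.results[spread_index] = value
        return len(self.results) == self.expected

    def aggregate(self) -> list:
        # Order by spread_index so output matches input order,
        # regardless of which worker finished first.
        return [self.results[i] for i in range(self.expected)]

b = SpreadBarrier(expected=3)
b.deliver(2, "c")                      # out-of-order arrival is fine
b.deliver(0, "a")
assert b.deliver(1, "b")               # third arrival fires the barrier
assert b.aggregate() == ["a", "b", "c"]
```

Only the final deliver touches downstream state; the first N-1 completions each cost a single dict write.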

Loops and branches

Loops and branches use the same model. They're just nodes and edges: guards on a branch node select the continue or break path, and marked back-edges re-enter the loop body without counting toward readiness.

A loop's IR looks roughly like:

fn main(input: [items], output: [results]):
    results = []
    for item in items:
        processed = @process_item(item=item)
        results = results + [processed]
    return results

One iteration flows like this:

  1. loop_init sets the internal index (inline).
  2. loop_cond evaluates the guard and picks continue vs break.
  3. loop_extract assigns item = items[__loop_i] (inline).
  4. @process_item is a frontier action - dispatched to a worker.
  5. On completion, the result lands as processed and results is updated.
  6. The append assignment runs inline; results is now in scope.
  7. loop_incr advances __loop_i; the back-edge routes to loop_cond.
  8. When the guard fails, the break edge routes to loop_exit.

A few specifics worth knowing:

  • A loop head is a branch node with guarded edges.
  • Loop back-edges are marked and do not count toward readiness.
  • Each iteration updates loop state and resets the status of loop-body nodes so they can become ready again on the next pass.
  • Branch joins become barriers only when multiple paths can converge.

This keeps loops inside the same push-based model - no separate scheduler mode for iteration.
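The eight-step iteration above can be simulated directly. This is a plain-Python stand-in, not the runtime: loop_cond is the guard, the inline steps are ordinary statements, and the action call marks the one point where control would leave the runloop for a worker:

```python
# Simulation of the loop lowering: loop_init / loop_cond / loop_extract /
# loop_incr run inline, the back-edge is the while loop re-entering the
# guard, and process_item stands in for the dispatched frontier action.
def run_loop(items, process_item):
    results = []
    __loop_i = 0                         # loop_init (inline)
    while __loop_i < len(items):         # loop_cond: continue vs break
        item = items[__loop_i]           # loop_extract (inline)
        processed = process_item(item)   # frontier action -> worker
        results = results + [processed]  # append assignment (inline)
        __loop_i += 1                    # loop_incr; back-edge to cond
    return results                       # guard failed: loop_exit

assert run_loop([1, 2, 3], lambda x: x * 10) == [10, 20, 30]
```

In the real push-based model the while loop doesn't block a thread: each worker completion re-enters the runloop, which advances the inline tail (append, incr, cond) until it hits the next frontier action.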

Why this scales

Push-based scheduling costs O(d) per completion, where d is the number of immediate successors the completing node must check. The cost is local to the completion, not proportional to graph size - a workflow with a million completed nodes costs no more per step than a workflow with a hundred. That's the key reason fan-outs of arbitrary width and long-running multi-stage pipelines stay tractable on Postgres.
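A toy measurement makes the O(d) claim concrete. The graph shape and counter below are invented for the sketch; the point is only that a completion's work depends on its out-degree, never on how many nodes exist or have already finished:

```python
# Build a large graph, then complete one node with out-degree 2.
# The work done is 2 successor checks, regardless of the 100,000
# other nodes in the graph.
successors = {f"n{i}": [] for i in range(100_000)}
successors["root"] = ["n0", "n1"]        # d = 2 for this node

touched = 0

def on_complete(node: str) -> None:
    global touched
    for succ in successors[node]:        # only immediate downstream nodes
        touched += 1                     # one check per successor

on_complete("root")
assert touched == 2                      # cost is d, not graph size
```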