DAG Conversion

After the AST compiler produces Waymark IR, a separate conversion pass turns that IR into the graph the runner actually walks. The graph is a DAG for most workflows; loops are the exception, adding a single back-edge. This pass is where control flow and data flow become explicit: every conditional, loop, and try/except becomes a concrete arrangement of nodes and edges.

Mental model

Nodes are connected by two kinds of edges:

  • State machine edges encode "what can run after this completes."
  • Data-flow edges encode "which variable values should be written into a node's inbox."

A node becomes runnable when its state-machine predecessors have all completed and its required inbox values are present. The Action Readiness page covers the readiness rules in depth.
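
The readiness rule can be sketched in a few lines of Python. This is an illustration only, not the runner's code: the Node class, its fields (preds, required_vars, inbox), and is_runnable are hypothetical names, and the sketch ignores refinements such as join nodes with required_count = 1 and loop back-edges.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node record; the field names are illustrative only."""
    name: str
    preds: list = field(default_factory=list)        # state-machine predecessors
    required_vars: set = field(default_factory=set)  # variables the node reads
    inbox: dict = field(default_factory=dict)        # values written via data-flow edges


def is_runnable(node: Node, completed: set) -> bool:
    """True when every predecessor has completed and every required value is in the inbox."""
    preds_done = all(p in completed for p in node.preds)
    inbox_full = node.required_vars.issubset(node.inbox)
    return preds_done and inbox_full


parse = Node("parse", preds=["fetch"], required_vars={"body"})

before = is_runnable(parse, completed={"fetch"})  # predecessor done, inbox still empty
parse.inbox["body"] = "<html>"
after = is_runnable(parse, completed={"fetch"})   # predecessor done, inbox filled
```

Here before is False and after is True: completing the predecessor is not enough until the data-flow write lands.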

Nodes themselves are steps: action calls, assignments, branches, joins, aggregators, and the function boundary nodes that mark where a function begins and ends.

Two-phase conversion

The conversion runs in two phases:

  1. Per-function subgraphs. Each function becomes an isolated subgraph with explicit input and output boundary nodes. Calls inside the function become fn_call nodes that capture their kwargs but don't yet point at the callee's graph.
  2. Expansion and global wiring. Starting from the entry function (main if present, otherwise the first non-internal function), the pass inlines helpers, remaps exception edges across function boundaries, and recomputes the global data-flow edges so values defined inside a helper can reach downstream nodes in the caller.

A final validation pass checks for dangling edges, invalid loop wiring, and stray output edges before the DAG is handed to the runtime.
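
As a rough idea of what one of these checks involves, here is a sketch of a dangling-edge check. The function name and the plain (src, dst) tuple representation are assumptions made for the example; the actual validator and its loop and output-edge checks are not shown in this page.

```python
def find_dangling_edges(node_ids, edges):
    """Return every edge whose source or destination is not a known node."""
    known = set(node_ids)
    return [(src, dst) for src, dst in edges if src not in known or dst not in known]


nodes = ["input", "fetch", "output"]
edges = [("input", "fetch"), ("fetch", "output"), ("fetch", "missing")]

dangling = find_dangling_edges(nodes, edges)
```

An empty result means every edge points at a node that exists; in this example the edge to "missing" is reported.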

Node types

Node                              What it is
input / output                    Function boundaries.
action_call                       Delegated work dispatched to Python workers.
assignment / expression / return  Inline work executed in the runner itself.
branch                            Decision point for if / elif / else and loop conditions.
join                              Merge point. Frequently required_count = 1, which collapses to inline.
parallel                          Entry node for a parallel block.
aggregator                        Barrier that waits for spread / parallel results.
fn_call                           Placeholder before expansion (external calls may stay as fn_call).

Edge types

State machine edges carry execution-order metadata:

  • guard_expr for branch and loop conditions.
  • condition labels like success or else.
  • exception_types for try/except routing.
  • is_loop_back so back-edges don't count toward readiness.

Data-flow edges are per-variable. (src, dst, var) means "write var from src into dst's inbox." The conversion avoids stale data by only wiring from the most recent definition along the execution order. Join nodes don't define values; they only mark where paths converge.
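
The "most recent definition" rule is easiest to see on straight-line code. The sketch below is a simplified illustration under stated assumptions: wire_data_flow and the (node_id, defines, uses) step format are hypothetical, and branches and loops are not handled.

```python
def wire_data_flow(steps):
    """Build (src, dst, var) edges for straight-line code.

    steps is a list of (node_id, defines, uses) in execution order.
    Each use is wired from the latest node that defined the variable.
    """
    last_def = {}  # variable name -> node that most recently defined it
    edges = []
    for node_id, defines, uses in steps:
        for var in sorted(uses):
            if var in last_def:
                edges.append((last_def[var], node_id, var))
        # Record definitions after wiring uses, so x = f(x) reads the old x.
        for var in defines:
            last_def[var] = node_id
    return edges


steps = [
    ("a1", {"x"}, set()),   # x = ...
    ("a2", {"x"}, {"x"}),   # x = f(x), reads the value from a1
    ("a3", set(), {"x"}),   # g(x), must read the value from a2, not a1
]
flow = wire_data_flow(steps)
```

Note that a3 receives x only from a2. An edge from a1 to a3 would deliver a stale value, which is exactly what the rule prevents.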

How each construct lowers

Straight-line code. Assignments and expressions become inline nodes. Action calls become action_call nodes. State-machine edges preserve order; data-flow edges carry only the variables used downstream.

Function boundaries and returns. Every function has explicit input and output nodes. All return statements connect to the output boundary, so early returns terminate the function correctly. During expansion, helper functions are inlined and their nodes are prefixed; the helper's input nodes are stripped, with the caller supplying inputs via data-flow.
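
A minimal sketch of the inlining step, assuming a helper subgraph is a dict of node id to node type plus a list of (src, dst) edges. The inline_helper name, the double-underscore prefix format, and the call id are all hypothetical; the real pass also rewires data-flow and exception edges, which this sketch leaves out.

```python
def inline_helper(call_id, helper_nodes, helper_edges):
    """Copy a helper subgraph into the caller: prefix node ids, strip the input node."""
    keep = {nid for nid, kind in helper_nodes.items() if kind != "input"}
    rename = {nid: f"{call_id}__{nid}" for nid in keep}  # assumed prefix format
    nodes = {rename[nid]: helper_nodes[nid] for nid in keep}
    # Edges that touched the stripped input node are dropped; the caller
    # supplies those values through data-flow edges instead.
    edges = [(rename[s], rename[d]) for s, d in helper_edges if s in keep and d in keep]
    return nodes, edges


helper_nodes = {"input": "input", "double": "action_call", "output": "output"}
helper_edges = [("input", "double"), ("double", "output")]

inlined_nodes, inlined_edges = inline_helper("call_1", helper_nodes, helper_edges)
```

Prefixing keeps node ids unique when the same helper is inlined at more than one call site.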

Conditionals. An if / elif / else produces a branch node that fans out to guarded edges plus an else edge for the default arm. When at least one branch can continue past the conditional, a join node with required_count = 1 re-merges the paths so the next step sees a single point of arrival.
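
The shape described above might look like the following. This is a sketch, not the conversion's real data model: lower_if, the dict-based edge records, and the node ids are invented for illustration, and every arm is assumed to continue past the conditional.

```python
def lower_if(branch_id, arms, join_id):
    """Lower an if / elif / else into branch edges plus a join.

    arms is a list of (guard_expr, body_node_id); a guard of None marks the else arm.
    """
    edges = []
    for guard, body in arms:
        if guard is None:
            edges.append({"src": branch_id, "dst": body, "condition": "else"})
        else:
            edges.append({"src": branch_id, "dst": body, "guard_expr": guard})
        # Every arm re-merges at the join.
        edges.append({"src": body, "dst": join_id})
    # Only one arm ever runs, so the join waits for a single arrival.
    join = {"id": join_id, "type": "join", "required_count": 1}
    return edges, join


if_edges, if_join = lower_if(
    "branch_1",
    [("score > 90", "grade_a"), (None, "grade_other")],
    "join_1",
)
```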

Try / except. Try bodies are flattened. Every node inside the try body can emit exception edges to the handler - those edges carry the matching exception types as metadata. Success edges flow to a join node. If the handler binds an exception variable (except E as err:), the conversion inserts an assignment from __waymark_exception__ before the handler body so err is in scope.
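
To make the wiring concrete, here is a sketch of a single try block with one handler. The lower_try function, the dict edge records, the _bind suffix, and the handler-to-join edge are assumptions for the example; only __waymark_exception__, exception_types, and the success label come from the description above.

```python
def lower_try(body_nodes, handler_id, exception_types, join_id, bound_var=None):
    """Lower a flattened try body with one except handler."""
    nodes, edges = [], []
    entry = handler_id
    if bound_var is not None:
        # except E as err: bind err from the runtime's exception slot first.
        entry = f"{handler_id}_bind"
        nodes.append({"id": entry, "type": "assignment",
                      "target": bound_var, "value": "__waymark_exception__"})
        edges.append({"src": entry, "dst": handler_id})
    # Any node in the try body may fail, so each one gets an exception edge.
    for node_id in body_nodes:
        edges.append({"src": node_id, "dst": entry,
                      "exception_types": list(exception_types)})
    # The success path leaves the last body node; the handler is assumed to re-merge too.
    edges.append({"src": body_nodes[-1], "dst": join_id, "condition": "success"})
    edges.append({"src": handler_id, "dst": join_id})
    return nodes, edges


try_nodes, try_edges = lower_try(
    ["fetch", "parse"], "handler", ["ValueError"], "join_1", bound_var="err"
)
exception_edges = [e for e in try_edges if "exception_types" in e]
```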

For / while loops. Loops expand into a small state machine:

  • for loops produce loop_init, loop_cond (branch), loop_extract, the body, loop_incr, and a loop_exit join.
  • while loops produce loop_cond, the body, loop_continue for continue wiring, and loop_exit.

Back-edges from the body to loop_cond are marked with is_loop_back so the readiness check ignores them.
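
A sketch of the for-loop state machine and of why the back-edge flag matters. The exact wiring between the loop nodes is an assumption made for illustration (in particular, routing the back-edge through loop_incr), as are lower_for, readiness_preds, and the "has_next" guard placeholder.

```python
def lower_for(prefix, body_id):
    """Return (src, dst, metadata) edges for one for loop, using an assumed layout."""
    def n(suffix):
        return f"{prefix}_{suffix}"

    return [
        (n("loop_init"), n("loop_cond"), {}),
        (n("loop_cond"), n("loop_extract"), {"guard_expr": "has_next"}),
        (n("loop_extract"), body_id, {}),
        (body_id, n("loop_incr"), {}),
        (n("loop_incr"), n("loop_cond"), {"is_loop_back": True}),
        (n("loop_cond"), n("loop_exit"), {"condition": "else"}),
    ]


def readiness_preds(edges, node_id):
    """Predecessors that count toward readiness; back-edges are skipped."""
    return [src for src, dst, meta in edges
            if dst == node_id and not meta.get("is_loop_back")]


loop_edges = lower_for("L", "body")
cond_preds = readiness_preds(loop_edges, "L_loop_cond")
```

Without the flag, loop_cond would wait on loop_incr before its first run, and the loop could never start.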

Spread and parallel. A spread becomes a spread action node plus an aggregator. Each action result is written with a spread_index, and the aggregator reads them and emits an ordered list. A parallel block creates a parallel entry node, one node per call, and an aggregator that waits for all results.
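
The aggregator's job can be sketched as a barrier plus a sort. The aggregate function and the (spread_index, value) tuples are hypothetical stand-ins for whatever the runtime actually stores.

```python
def aggregate(results, expected):
    """Return values ordered by spread_index, or None while results are still missing."""
    if len(results) < expected:
        return None  # barrier: not every spread result has arrived yet
    return [value for _, value in sorted(results, key=lambda r: r[0])]


arrived = [(2, "c"), (0, "a")]            # results land in completion order
partial = aggregate(arrived, expected=3)  # still waiting on index 1

arrived.append((1, "b"))
ordered = aggregate(arrived, expected=3)
```

Sorting by spread_index means the output list follows the input order even when workers finish out of order.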

Visualizing the graph

Every workflow has a .visualize() method that renders the graph as HTML. From the repo, the dag-visualize binary does the same thing from a workflow source file:

cargo run --bin dag-visualize -- path/to/workflow.py -o dag.html

In the rendered output, solid lines are state-machine edges and dotted lines are data-flow edges. It's worth running on a workflow you're developing - the difference between "what I think the graph looks like" and "what the conversion actually produced" is often the first hint when behavior surprises you.