DAG Conversion
After the AST compiler produces Waymark IR, a separate conversion pass
turns that IR into the graph the runner actually walks. It's a DAG for
most workflows, with a single back-edge for loops. This pass is where
control flow and data flow become explicit - every conditional, loop,
and try/except becomes a concrete arrangement of nodes and edges.
Mental model
Nodes are connected by two kinds of edges:
- State machine edges encode "what can run after this completes."
- Data-flow edges encode "which variable values should be written into a node's inbox."
A node becomes runnable when its state-machine predecessors have all completed and its required inbox values are present. The Action Readiness page covers the readiness rules in depth.
Nodes themselves are steps: action calls, assignments, branches, joins, aggregators, and the function boundary nodes that mark where a function begins and ends.
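The readiness rule above can be sketched in a few lines of Python. This is an illustration only, not Waymark's implementation: StateEdge, is_runnable, and the dictionary-based inbox are hypothetical names, and join nodes with their own required_count are left out. The only field borrowed from this page is is_loop_back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateEdge:
    # Hypothetical edge record; only is_loop_back comes from the docs.
    src: str
    dst: str
    is_loop_back: bool = False


def is_runnable(node, state_edges, completed, inbox, required_vars):
    """True when every non-back-edge predecessor has completed and
    every required variable is present in the node's inbox."""
    preds = {e.src for e in state_edges if e.dst == node and not e.is_loop_back}
    has_preds = preds <= completed
    has_inputs = all(var in inbox.get(node, {}) for var in required_vars)
    return has_preds and has_inputs


edges = [StateEdge("fetch", "summarize")]
inbox = {"summarize": {"doc": "hello"}}

# Predecessor not finished yet, so the node must wait.
print(is_runnable("summarize", edges, set(), inbox, ["doc"]))      # False
# Predecessor finished and the inbox holds "doc".
print(is_runnable("summarize", edges, {"fetch"}, inbox, ["doc"]))  # True
```

Keeping the two checks separate mirrors the two kinds of edges: one answers "may this run yet?", the other "does it have what it needs?".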
Two-phase conversion
The conversion is two passes:
- Per-function subgraphs. Each function becomes an isolated subgraph with explicit input and output boundary nodes. Calls inside the function become fn_call nodes that capture their kwargs but don't yet point at the callee's graph.
- Expansion and global wiring. Starting from the entry function (main if present, otherwise the first non-internal function), the pass inlines helpers, remaps exception edges across function boundaries, and recomputes the global data-flow edges so values defined inside a helper can reach downstream nodes in the caller.
After both passes, a final validation step checks that there are no dangling edges, no invalid loop wiring, and no stray output edges before the DAG is handed to the runtime.
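As a rough idea of what one such check could look like, here is a minimal Python sketch of the dangling-edge test. The function name and the (src, dst) tuple representation are assumptions made for illustration; the loop-wiring and output-edge checks are not shown.

```python
def find_dangling_edges(node_ids, edges):
    """Return the edges whose source or target is not a known node."""
    known = set(node_ids)
    return [(src, dst) for src, dst in edges if src not in known or dst not in known]


nodes = ["main_input", "action_1", "main_output"]
edges = [
    ("main_input", "action_1"),
    ("action_1", "main_output"),
    ("action_1", "ghost"),  # points at a node that was never created
]

print(find_dangling_edges(nodes, edges))  # [('action_1', 'ghost')]
```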
Node types
| Node | What it is |
|---|---|
| input / output | Function boundaries. |
| action_call | Delegated work dispatched to Python workers. |
| assignment / expression / return | Inline work executed in the runner itself. |
| branch | Decision point for if / elif / else and loop conditions. |
| join | Merge point. Frequently required_count = 1, which collapses to inline. |
| parallel | Entry node for a parallel block. |
| aggregator | Barrier that waits for spread / parallel results. |
| fn_call | Placeholder before expansion (external calls may stay as fn_call). |
Edge types
State machine edges carry execution-order metadata:
- guard_expr for branch and loop conditions.
- condition labels like success or else.
- exception_types for try/except routing.
- is_loop_back so back-edges don't count toward readiness.
Data-flow edges are per-variable. (src, dst, var) means "write
var from src into dst's inbox." The conversion avoids stale data
by only wiring from the most recent definition along the execution
order. Join nodes don't define values; they only mark where paths
converge.
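The "most recent definition" rule is easiest to see on straight-line code. The sketch below is a simplified Python illustration, assuming a flat list of steps in execution order; wire_data_flow and the (node_id, defines, uses) tuples are hypothetical, and branching control flow is not handled.

```python
def wire_data_flow(steps):
    """steps: list of (node_id, defines, uses) in execution order.
    Returns (src, dst, var) edges from the latest definition of each var."""
    last_def = {}
    edges = []
    for node_id, defines, uses in steps:
        # Reads are wired first, so a node that redefines a variable
        # still receives the previous value.
        for var in uses:
            if var in last_def:
                edges.append((last_def[var], node_id, var))
        for var in defines:
            last_def[var] = node_id
    return edges


steps = [
    ("assign_1", ["x"], []),
    ("assign_2", ["x"], ["x"]),  # redefines x after reading the old value
    ("action_1", ["y"], ["x"]),  # must read x from assign_2, not assign_1
]
print(wire_data_flow(steps))
# [('assign_1', 'assign_2', 'x'), ('assign_2', 'action_1', 'x')]
```

There is no edge from assign_1 to action_1, which is exactly the stale value the conversion is trying to keep out of the inbox.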
How each construct lowers
Straight-line code. Assignments and expressions become inline
nodes. Action calls become action_call nodes. State-machine edges
preserve order; data-flow edges carry only the variables used
downstream.
Function boundaries and returns. Every function has explicit
input and output nodes. All return statements connect to the
output boundary, so early returns terminate the function correctly.
During expansion, helper functions are inlined and their nodes are
prefixed; the helper's input nodes are stripped, with the caller
supplying inputs via data-flow.
Conditionals. An if / elif / else produces a branch node
that fans out to guarded edges plus an else edge for the default
arm. When at least one branch can continue past the conditional, a
join node with required_count = 1 re-merges the paths so the next
step sees a single point of arrival.
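To make the shape concrete, here is a small Python sketch that builds the state-machine edges for a three-arm conditional. It is an approximation under assumptions: lower_if and the dictionary layout are hypothetical, each arm is reduced to a single node, and every arm is assumed to continue past the conditional.

```python
def lower_if(branch_id, guarded_arms, else_arm, join_id):
    """Build edges for an if / elif / else.

    guarded_arms: list of (guard_expr, body_node_id)
    else_arm: body_node_id for the default arm
    """
    edges = []
    for guard, body in guarded_arms:
        edges.append({"src": branch_id, "dst": body, "guard_expr": guard})
        edges.append({"src": body, "dst": join_id})
    edges.append({"src": branch_id, "dst": else_arm, "condition": "else"})
    edges.append({"src": else_arm, "dst": join_id})
    # Only one arm ever runs, so the join needs a single arrival.
    join = {"id": join_id, "type": "join", "required_count": 1}
    return edges, join


edges, join = lower_if(
    "branch_1",
    [("x > 10", "action_big"), ("x > 0", "action_small")],
    "action_zero",
    "join_1",
)
print(len(edges))              # 6
print(join["required_count"])  # 1
```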
Try / except. Try bodies are flattened. Every node inside the try
body can emit exception edges to the handler - those edges carry the
matching exception types as metadata. Success edges flow to a join
node. If the handler binds an exception variable (except E as err:),
the conversion inserts an assignment from __waymark_exception__
before the handler body so err is in scope.
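A rough Python sketch of that wiring, for a try body with a single handler, might look like the following. Everything except exception_types, the success label, and __waymark_exception__ is an assumption: lower_try, the "_bind" suffix, and the dictionary layout are hypothetical, and the handler is assumed to continue to the same join.

```python
def lower_try(body_nodes, handler_id, exception_types, join_id, bound_var=None):
    """Return (extra_nodes, edges) for a flattened try body with one handler."""
    extra_nodes, edges = [], []
    handler_entry = handler_id
    if bound_var is not None:
        # `except E as err:` gets an assignment that runs before the handler body.
        handler_entry = f"{handler_id}_bind"
        extra_nodes.append({
            "id": handler_entry,
            "type": "assignment",
            "target": bound_var,
            "value": "__waymark_exception__",
        })
        edges.append({"src": handler_entry, "dst": handler_id})
    for node in body_nodes:
        # Every node in the try body can route to the handler.
        edges.append({"src": node, "dst": handler_entry,
                      "exception_types": list(exception_types)})
    # Success path: body nodes run in order, then fall through to the join.
    for a, b in zip(body_nodes, body_nodes[1:]):
        edges.append({"src": a, "dst": b, "condition": "success"})
    edges.append({"src": body_nodes[-1], "dst": join_id, "condition": "success"})
    edges.append({"src": handler_id, "dst": join_id})
    return extra_nodes, edges


nodes, edges = lower_try(
    ["action_1", "action_2"], "handler_1", ["ValueError"], "join_1", bound_var="err"
)
exc_edges = [e for e in edges if "exception_types" in e]
print(len(exc_edges))       # 2
print(exc_edges[0]["dst"])  # handler_1_bind
```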
For / while loops. Loops expand into a small state machine:
- for loops produce loop_init, loop_cond (branch), loop_extract, the body, loop_incr, and a loop_exit join.
- while loops produce loop_cond, the body, loop_continue for continue wiring, and loop_exit.
Back-edges from the body to loop_cond are marked with is_loop_back
so the readiness check ignores them.
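The for-loop case can be sketched as a chain of edges plus one back-edge. This Python snippet is illustrative only: lower_for_loop and the node-naming scheme are hypothetical, guard expressions on the loop_cond edges are omitted, and the back-edge is assumed to leave from loop_incr.

```python
def lower_for_loop(loop_id, body_nodes):
    """Return state-machine edges for a for loop over a list of body node ids."""
    init = f"{loop_id}_init"
    cond = f"{loop_id}_cond"
    extract = f"{loop_id}_extract"
    incr = f"{loop_id}_incr"
    exit_ = f"{loop_id}_exit"

    # Forward path through one iteration.
    chain = [init, cond, extract, *body_nodes, incr]
    edges = [{"src": a, "dst": b, "is_loop_back": False}
             for a, b in zip(chain, chain[1:])]
    # Back-edge to the condition, flagged so readiness ignores it.
    edges.append({"src": incr, "dst": cond, "is_loop_back": True})
    # Leaving the loop once the condition fails.
    edges.append({"src": cond, "dst": exit_, "is_loop_back": False})
    return edges


edges = lower_for_loop("loop_1", ["action_1"])
cond = "loop_1_cond"
ready_preds = [e["src"] for e in edges if e["dst"] == cond and not e["is_loop_back"]]
print(ready_preds)  # ['loop_1_init']
```

Without the flag, loop_cond would wait on loop_incr before its first run, and the loop could never start.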
Spread and parallel. A spread becomes a spread action node plus
an aggregator. Each action result is written with a spread_index,
and the aggregator reads them and emits an ordered list. A parallel
block creates a parallel entry node, one node per call, and an
aggregator that waits for all results.
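The aggregator's barrier-and-order behavior can be illustrated with a short Python sketch. The aggregate function and the (spread_index, value) tuples are hypothetical stand-ins for however the runtime actually stores results.

```python
def aggregate(results, expected_count):
    """results: list of (spread_index, value) in arrival order.
    Returns the ordered list once all results are in, else None."""
    if len(results) < expected_count:
        return None  # barrier: still waiting on some results
    return [value for _, value in sorted(results, key=lambda r: r[0])]


arrived = [(2, "c"), (0, "a")]
print(aggregate(arrived, 3))  # None

arrived.append((1, "b"))
print(aggregate(arrived, 3))  # ['a', 'b', 'c']
```

Results can arrive in any order; sorting by spread_index restores the order of the original collection.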
Visualizing the graph
Every workflow has a .visualize() method that renders the graph as
HTML. From the repo, the dag-visualize binary does the same thing
from a workflow source file:
cargo run --bin dag-visualize -- path/to/workflow.py -o dag.html
In the rendered output, solid lines are state-machine edges and dotted lines are data-flow edges. It's worth running on a workflow you're developing - the difference between "what I think the graph looks like" and "what the conversion actually produced" is often the first hint when behavior surprises you.