Skip to content

DAG Validation

Intent Kit provides comprehensive DAG validation to ensure correctness, completeness, and optimal performance of intent classification workflows.

Overview

DAG validation checks for:

  • Structural Integrity - All nodes and edges are properly connected
  • Cyclic Dependencies - No cycles in the graph structure
  • Reachability - All nodes are accessible from entrypoints
  • Label Consistency - Edge labels match node capabilities
  • Configuration Validity - Node configurations are correct

Validation Functions

validate_dag_structure

The primary validation function that performs comprehensive DAG analysis:

from intent_kit.core.validation import validate_dag_structure

# Basic validation
issues = validate_dag_structure(dag)

if issues:
    print("Validation issues found:")
    for issue in issues:
        print(f"  - {issue}")
else:
    print("DAG structure is valid")

Parameters: - dag (IntentDAG): The DAG to validate - producer_labels (Dict[str, Set[str]], optional): Dictionary mapping node_id to set of labels it can produce

Returns: - List[str]: List of validation issues (empty if all valid)

Raises: - CycleError: If a cycle is detected in the DAG - ValueError: If basic structure is invalid

Validation Checks

1. ID Consistency

Ensures all referenced node IDs exist in the DAG:

# Valid DAG
dag = DAGBuilder()
dag.add_node("classifier", "classifier", ...)
dag.add_node("extractor", "extractor", ...)
dag.add_edge("classifier", "extractor", "success")  # Both nodes exist

# Invalid DAG - missing node
dag.add_edge("classifier", "missing_node", "error")  # Will fail validation

Common Issues: - Edge source node doesn't exist - Edge destination node doesn't exist - Entrypoint node doesn't exist

2. Entrypoint Validation

Ensures DAG has valid entrypoints:

# Valid DAG with entrypoints
dag = DAGBuilder()
dag.add_node("classifier", "classifier", ...)
dag.set_entrypoints(["classifier"])

# Invalid DAG - no entrypoints
dag = DAGBuilder()
dag.add_node("classifier", "classifier", ...)
# Missing set_entrypoints() call - will fail validation

Requirements: - At least one entrypoint must be defined - All entrypoints must exist in the DAG - Entrypoints must be reachable

3. Cycle Detection

Detects cycles in the DAG structure using Kahn's algorithm:

# Valid DAG - no cycles
dag = DAGBuilder()
dag.add_node("A", "classifier", ...)
dag.add_node("B", "extractor", ...)
dag.add_node("C", "action", ...)
dag.add_edge("A", "B")
dag.add_edge("B", "C")

# Invalid DAG - cycle detected
dag.add_edge("C", "A")  # Creates cycle A -> B -> C -> A

Cycle Detection Algorithm: 1. Calculate in-degrees for all nodes 2. Add nodes with zero in-degree to queue 3. Process queue, reducing in-degrees of neighbors 4. If all nodes are processed, no cycles exist 5. If nodes remain unprocessed, cycles exist

4. Reachability Analysis

Ensures all nodes are reachable from entrypoints:

# Valid DAG - all nodes reachable
dag = DAGBuilder()
dag.add_node("classifier", "classifier", ...)
dag.add_node("extractor", "extractor", ...)
dag.add_node("action", "action", ...)
dag.add_edge("classifier", "extractor")
dag.add_edge("extractor", "action")
dag.set_entrypoints(["classifier"])

# Invalid DAG - unreachable node
dag.add_node("orphan", "action", ...)  # No edges to/from this node

Reachability Algorithm: 1. Start from entrypoints 2. Traverse edges using BFS 3. Mark visited nodes 4. Report unvisited nodes as unreachable

5. Label Validation

Validates edge labels against node capabilities (when producer_labels provided):

from intent_kit.core.validation import validate_dag_structure

# Define node capabilities
producer_labels = {
    "classifier": {"greet", "weather", "booking"},
    "extractor": {"success", "error"},
    "action": {"success", "error"}
}

# Validate with label constraints
issues = validate_dag_structure(dag, producer_labels)

# Valid edges
dag.add_edge("classifier", "extractor", "greet")  # "greet" in classifier labels
dag.add_edge("extractor", "action", "success")    # "success" in extractor labels

# Invalid edge
dag.add_edge("classifier", "extractor", "invalid_label")  # Not in classifier labels

Error Handling

Common Exceptions

CycleError

Raised when a cycle is detected in the DAG:

from intent_kit.core.exceptions import CycleError

try:
    issues = validate_dag_structure(dag)
except CycleError as e:
    print(f"Cycle detected: {e}")
    # Handle cycle - typically requires DAG redesign

ValueError

Raised for basic structural issues:

try:
    issues = validate_dag_structure(dag)
except ValueError as e:
    print(f"Structural error: {e}")
    # Handle structural issues

Validation Issue Types

# Common validation issues
issues = [
    "Unreachable nodes: orphan_node",
    "Missing node: referenced_node",
    "Invalid entrypoint: non_existent_entrypoint",
    "Cycle detected: A -> B -> C -> A",
    "Invalid edge label: 'invalid_label' not in producer labels"
]

Advanced Validation

Custom Validation Rules

Extend validation with custom rules:

def custom_validation(dag):
    issues = []

    # Check for required node types
    has_classifier = any(node.type == "classifier" for node in dag.nodes.values())
    if not has_classifier:
        issues.append("DAG must contain at least one classifier node")

    # Check for proper error handling
    has_clarification = any(node.type == "clarification" for node in dag.nodes.values())
    if not has_clarification:
        issues.append("Consider adding clarification nodes for error handling")

    return issues

# Combine with built-in validation
builtin_issues = validate_dag_structure(dag)
custom_issues = custom_validation(dag)
all_issues = builtin_issues + custom_issues

Performance Validation

Validate DAG performance characteristics:

def performance_validation(dag):
    issues = []

    # Check for excessive fanout
    for node_id, node in dag.nodes.items():
        outgoing_edges = len(dag.adj.get(node_id, {}))
        if outgoing_edges > 10:
            issues.append(f"Node {node_id} has high fanout ({outgoing_edges} edges)")

    # Check for deep chains
    max_depth = calculate_max_depth(dag)
    if max_depth > 20:
        issues.append(f"DAG has deep execution chain ({max_depth} levels)")

    return issues

def calculate_max_depth(dag):
    """Calculate maximum depth from entrypoints to any node."""
    depths = {}

    def dfs(node_id, depth):
        if node_id in depths:
            depths[node_id] = max(depths[node_id], depth)
        else:
            depths[node_id] = depth

        for labels in dag.adj.get(node_id, {}).values():
            for next_node in labels:
                dfs(next_node, depth + 1)

    for entrypoint in dag.entrypoints:
        dfs(entrypoint, 0)

    return max(depths.values()) if depths else 0

Integration with DAG Building

Automatic Validation

DAG validation is automatically performed during build:

from intent_kit import DAGBuilder

builder = DAGBuilder()

# Add nodes and edges
builder.add_node("classifier", "classifier", ...)
builder.add_node("extractor", "extractor", ...)
builder.add_edge("classifier", "extractor", "success")

# Build with validation (default)
dag = builder.build()  # Validates automatically

# Build without validation (not recommended)
dag = builder.build(validate_structure=False)

Validation During Development

Use validation during DAG development:

# Validate after each major change
builder = DAGBuilder()
builder.add_node("classifier", "classifier", ...)

# Check intermediate state
try:
    dag = builder.build()
    print("DAG is valid so far")
except Exception as e:
    print(f"Validation failed: {e}")

# Continue building
builder.add_node("extractor", "extractor", ...)
builder.add_edge("classifier", "extractor", "success")

# Final validation
dag = builder.build()
print("DAG is complete and valid")

Best Practices

1. Start with Simple DAGs

# Start simple and add complexity
builder = DAGBuilder()

# Phase 1: Basic classifier -> action
builder.add_node("classifier", "classifier", ...)
builder.add_node("action", "action", ...)
builder.add_edge("classifier", "action", "success")
dag = builder.build()  # Validate

# Phase 2: Add extractor
builder.add_node("extractor", "extractor", ...)
builder.add_edge("classifier", "extractor", "success")
builder.add_edge("extractor", "action", "success")
dag = builder.build()  # Validate

# Phase 3: Add error handling
builder.add_node("clarification", "clarification", ...)
builder.add_edge("classifier", "clarification", "error")
builder.add_edge("extractor", "clarification", "error")
dag = builder.build()  # Validate

2. Use Descriptive Node Names

# Good - descriptive names
builder.add_node("intent_classifier", "classifier", ...)
builder.add_node("extract_user_name", "extractor", ...)
builder.add_node("send_greeting", "action", ...)

# Avoid - unclear names
builder.add_node("node1", "classifier", ...)
builder.add_node("node2", "extractor", ...)
builder.add_node("node3", "action", ...)

3. Validate Early and Often

# Validate at each step
builder = DAGBuilder()

# Step 1: Add classifier
builder.add_node("classifier", "classifier", ...)
try:
    dag = builder.build()
    print("✓ Classifier added successfully")
except Exception as e:
    print(f"✗ Classifier validation failed: {e}")
    return

# Step 2: Add extractor
builder.add_node("extractor", "extractor", ...)
builder.add_edge("classifier", "extractor", "success")
try:
    dag = builder.build()
    print("✓ Extractor added successfully")
except Exception as e:
    print(f"✗ Extractor validation failed: {e}")
    return

4. Handle Validation Errors Gracefully

def build_dag_with_validation():
    builder = DAGBuilder()

    try:
        # Build DAG
        builder.add_node("classifier", "classifier", ...)
        builder.add_node("extractor", "extractor", ...)
        builder.add_edge("classifier", "extractor", "success")

        # Validate and build
        dag = builder.build()
        print("DAG built successfully")
        return dag

    except CycleError as e:
        print(f"Cycle detected: {e}")
        print("Please review your DAG structure")
        return None

    except ValueError as e:
        print(f"Structural error: {e}")
        print("Please check node IDs and connections")
        return None

    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

5. Use Label Constraints

# Define clear label constraints
producer_labels = {
    "intent_classifier": {"greet", "weather", "booking", "help"},
    "user_extractor": {"success", "missing_name", "invalid_format"},
    "weather_action": {"success", "api_error", "location_not_found"},
    "greeting_action": {"success", "error"}
}

# Validate with constraints
issues = validate_dag_structure(dag, producer_labels)
if issues:
    print("Label validation issues:")
    for issue in issues:
        print(f"  - {issue}")

Performance Considerations

1. Validation Overhead

# Validation adds overhead - use judiciously
import time

# Time validation
start = time.time()
issues = validate_dag_structure(dag)
validation_time = time.time() - start

print(f"Validation took {validation_time:.3f} seconds")

# For large DAGs, consider validation levels
if dag_size < 100:
    # Full validation for small DAGs
    issues = validate_dag_structure(dag)
else:
    # Basic validation for large DAGs
    issues = validate_dag_structure(dag, producer_labels=None)

2. Caching Validation Results

# Cache validation results for unchanged DAGs
class CachedDAG:
    def __init__(self, dag):
        self.dag = dag
        self._validation_cache = None
        self._dag_hash = None

    def validate(self):
        current_hash = hash(str(self.dag.nodes) + str(self.dag.adj))

        if self._dag_hash == current_hash and self._validation_cache is not None:
            return self._validation_cache

        issues = validate_dag_structure(self.dag)
        self._validation_cache = issues
        self._dag_hash = current_hash
        return issues

DAG validation ensures your intent classification workflows are robust, efficient, and maintainable.