# LLM Integration
Intent Kit supports multiple Large Language Model (LLM) providers, allowing you to choose the best AI service for your needs. This guide covers configuration, best practices, and provider-specific features.
## Supported Providers

### OpenAI
OpenAI provides access to GPT models including GPT-4o and GPT-5-2025-08-07.
#### Configuration

```python
llm_config = {
    "provider": "openai",
    "model": "gpt-5-2025-08-07",  # or "gpt-4o", "gpt-4o-mini"
    "api_key": "your-openai-api-key",
    "temperature": 0.1,
    "max_tokens": 1000
}
```
#### Environment Variable
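The key can also come from the environment rather than the config dict (this sketch assumes intent-kit follows the conventional variable name):

```bash
# Conventional name; confirm against your intent-kit version
export OPENAI_API_KEY="your-openai-api-key"
```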
#### Features
- Fast response times - Optimized for real-time applications
- Cost-effective - Competitive pricing for most use cases
- Reliable - High availability and uptime
- Function calling - Native support for structured outputs
#### Best Practices

- Use `gpt-4o` for classification and extraction tasks
- Use `gpt-5-2025-08-07` for complex reasoning tasks
- Set `temperature` to 0.1-0.3 for consistent results
- Monitor token usage to control costs
### Anthropic
Anthropic provides access to Claude models with strong reasoning capabilities.
#### Configuration

```python
llm_config = {
    "provider": "anthropic",
    "model": "claude-3-7-sonnet-20250219",  # or "claude-3-5-haiku-20241022", "claude-opus-4-20250514"
    "api_key": "your-anthropic-api-key",
    "temperature": 0.1,
    "max_tokens": 1000
}
```
#### Environment Variable
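As with OpenAI, the key can be supplied via the conventional environment variable (an assumption; verify against your setup):

```bash
export ANTHROPIC_API_KEY="your-anthropic-api-key"
```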
#### Features
- Strong reasoning - Excellent for complex decision-making
- Safety-focused - Built with safety and alignment in mind
- Long context - Support for large conversation histories
- Structured outputs - Native support for JSON and other formats
#### Best Practices

- Use `claude-3-7-sonnet-20250219` for most tasks (good balance of speed and capability)
- Use `claude-opus-4-20250514` for complex reasoning tasks
- Use `claude-3-5-haiku-20241022` for simple, fast tasks
- Leverage long context for multi-turn conversations
### Google
Google provides access to Gemini models with strong multimodal capabilities.
#### Configuration

```python
llm_config = {
    "provider": "google",
    "model": "gemini-2.5-flash-lite",  # or "gemini-2.5-flash", "gemini-2.5-pro"
    "api_key": "your-google-api-key",
    "temperature": 0.1,
    "max_tokens": 1000
}
```
#### Environment Variable
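Assuming the conventional variable name:

```bash
export GOOGLE_API_KEY="your-google-api-key"
```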
#### Features
- Multimodal - Support for text, images, and other media
- Cost-effective - Competitive pricing
- Fast inference - Optimized for real-time applications
- Google ecosystem - Integration with Google Cloud services
#### Best Practices

- Use `gemini-2.5-flash-lite` for text-based tasks
- Use `gemini-2.5-pro` for complex reasoning tasks
- Leverage Google Cloud integration for enterprise features
- Monitor usage through Google Cloud Console
### Ollama
Ollama allows you to run open-source models locally on your machine.
#### Configuration

```python
llm_config = {
    "provider": "ollama",
    "model": "llama2",  # or "mistral", "codellama", "llama2:13b"
    "base_url": "http://localhost:11434",  # Default Ollama URL
    "temperature": 0.1
}
```
#### Installation
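A typical local setup looks like the following (the install script shown is the official Linux one; see ollama.com for macOS and Windows installers):

```bash
# Install Ollama (Linux; see ollama.com for other platforms)
curl -fsSL https://ollama.com/install.sh | sh

# Download the model referenced in the config above
ollama pull llama2

# The local server listens on http://localhost:11434 by default
```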
#### Features
- Local deployment - No API keys or external dependencies
- Privacy - Data stays on your machine
- Customizable - Fine-tune models for your specific needs
- Cost-effective - No per-token charges
#### Best Practices
- Use appropriate model sizes for your hardware
- Consider using quantized models for better performance (see the example after this list)
- Monitor memory usage with large models
- Use GPU acceleration when available
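For example, choosing a model size to match your hardware might look like this (tag names vary by model; check the Ollama library for available sizes and quantizations):

```bash
# Pull a smaller variant suited to limited hardware
ollama pull llama2:7b

# Pull a larger variant if you have the memory for it
ollama pull llama2:13b

# List installed models and their sizes
ollama list
```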
### OpenRouter
OpenRouter provides access to multiple AI providers through a unified API.
#### Configuration

```python
llm_config = {
    "provider": "openrouter",
    "model": "google/gemma-2-9b-it",  # or "mistralai/mistral-7b-instruct"
    "api_key": "your-openrouter-api-key",
    "base_url": "https://openrouter.ai/api/v1",
    "temperature": 0.1
}
```
#### Environment Variable
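Assuming the conventional variable name:

```bash
export OPENROUTER_API_KEY="your-openrouter-api-key"
```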
#### Features
- Provider agnostic - Access multiple AI providers
- Cost comparison - Compare pricing across providers
- Unified API - Single interface for multiple providers
- Model marketplace - Access to many different models
## Configuration Options

### Common Parameters
All providers support these common configuration options:
```python
llm_config = {
    "provider": "openai",       # Required: Provider name
    "model": "gpt-4o",          # Required: Model name (must match the provider)
    "api_key": "your-api-key",  # Required: API key
    "temperature": 0.1,         # Optional: Sampling temperature (0.0-2.0)
    "max_tokens": 1000,         # Optional: Maximum tokens to generate
    "timeout": 30,              # Optional: Request timeout in seconds
    "retries": 3,               # Optional: Number of retry attempts
}
```
### Provider-Specific Options

#### OpenAI
```python
llm_config = {
    "provider": "openai",
    "model": "gpt-4o",
    "api_key": "your-api-key",
    "temperature": 0.1,
    "max_tokens": 1000,
    "top_p": 1.0,              # Nucleus sampling
    "frequency_penalty": 0.0,  # Frequency penalty
    "presence_penalty": 0.0,   # Presence penalty
}
```
#### Anthropic

```python
llm_config = {
    "provider": "anthropic",
    "model": "claude-3-7-sonnet-20250219",
    "api_key": "your-api-key",
    "temperature": 0.1,
    "max_tokens": 1000,
    "top_p": 1.0,  # Top-p sampling
    "top_k": 40,   # Top-k sampling
}
```
#### Google

```python
llm_config = {
    "provider": "google",
    "model": "gemini-2.5-flash-lite",
    "api_key": "your-api-key",
    "temperature": 0.1,
    "max_tokens": 1000,
    "top_p": 1.0,  # Top-p sampling
    "top_k": 40,   # Top-k sampling
}
```
## Usage Examples

### Basic Configuration
```python
from intent_kit import DAGBuilder

# Create builder with default LLM config
builder = DAGBuilder()
builder.with_default_llm_config({
    "provider": "openrouter",
    "model": "google/gemma-2-9b-it",
    "api_key": "your-api-key",
    "temperature": 0.1
})

# Add nodes (they'll use the default config)
builder.add_node("classifier", "classifier",
                 output_labels=["greet", "calculate"],
                 description="Main intent classifier")
```
### Per-Node Configuration

```python
# Override LLM config for specific nodes
builder.add_node("classifier", "classifier",
                 output_labels=["greet", "calculate"],
                 description="Main intent classifier",
                 llm_config={
                     "provider": "openrouter",
                     "model": "google/gemma-2-9b-it",
                     "api_key": "your-openrouter-api-key",
                     "temperature": 0.1
                 })
```
### JSON Configuration

```python
dag_config = {
    "default_llm_config": {
        "provider": "openrouter",
        "model": "google/gemma-2-9b-it",
        "api_key": "your-api-key",
        "temperature": 0.1
    },
    "nodes": {
        "classifier": {
            "type": "classifier",
            "output_labels": ["greet", "calculate"],
            "description": "Main intent classifier"
        },
        "extractor": {
            "type": "extractor",
            "param_schema": {"name": str},
            "description": "Extract name from greeting",
            "llm_config": {
                "provider": "openrouter",
                "model": "google/gemma-2-9b-it",
                "api_key": "your-openrouter-api-key"
            }
        }
    }
}
```
## Best Practices

### Model Selection
- Classification Tasks: Use faster, cheaper models (`gpt-4o`, `claude-3-5-haiku-20241022`)
- Extraction Tasks: Use models with good instruction following (`gpt-4o`, `claude-3-7-sonnet-20250219`)
- Complex Reasoning: Use more capable models (`gpt-5-2025-08-07`, `claude-opus-4-20250514`)
- Privacy-Sensitive: Use local models (Ollama)
### Temperature Settings
- 0.0-0.2: Consistent, deterministic outputs (recommended for classification)
- 0.2-0.5: Balanced creativity and consistency
- 0.5-1.0: More creative, varied outputs
- 1.0+: Highly creative, less predictable
### Cost Optimization
- Use appropriate models - Don't use gpt-5-2025-08-07 for simple tasks
- Set reasonable limits - Use `max_tokens` to control costs
- Cache results - Implement caching for repeated requests (see the sketch after this list)
- Monitor usage - Track token consumption and costs
- Use local models - Consider Ollama for development and testing
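A minimal caching sketch: the `classify` wrapper below is illustrative, not part of intent-kit, and assumes `dag` was built earlier with `DAGBuilder`. Caching is only safe when outputs are deterministic enough (low temperature):

```python
from functools import lru_cache

# Hypothetical wrapper: identical inputs skip the LLM call entirely
@lru_cache(maxsize=1024)
def classify(user_input: str):
    result, _context = run_dag(dag, user_input)  # dag built earlier
    return result

classify("Hello Alice")  # hits the LLM
classify("Hello Alice")  # served from cache, zero tokens
```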
### Error Handling
```python
from intent_kit.core.exceptions import LLMError

# dag is assumed to have been built earlier (e.g., via DAGBuilder)
try:
    result, context = run_dag(dag, "Hello Alice")
except LLMError as e:
    print(f"LLM error: {e}")
    # Handle rate limits, API errors, etc.
```
### Rate Limiting

```python
llm_config = {
    "provider": "openai",
    "model": "gpt-4o",
    "api_key": "your-api-key",
    "retries": 3,
    "retry_delay": 1.0,  # Seconds between retries
    "timeout": 30
}
```
## Troubleshooting

### Common Issues

#### API Key Errors
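If requests fail with authentication errors, first confirm the key is actually visible to your process and belongs to the provider you configured:

```bash
# Should print a non-empty value for the provider you're using
echo $OPENAI_API_KEY
```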
#### Rate Limiting
```python
# Implement exponential backoff
llm_config = {
    "retries": 5,
    "retry_delay": 2.0,
    "backoff_factor": 2.0
}
```
#### Model Not Found

```python
# Check model names
# OpenAI: "gpt-4o", "gpt-5-2025-08-07"
# Anthropic: "claude-3-7-sonnet-20250219", "claude-3-5-haiku-20241022"
# Google: "gemini-2.5-flash-lite", "gemini-2.5-pro"
# Ollama: "llama2", "mistral", "codellama"
```
#### Timeout Issues

```python
# Increase timeout for complex tasks
llm_config = {
    "timeout": 60,  # 60 seconds
    "max_tokens": 2000
}
```
## Migration Guide

### Switching Providers
To switch from one provider to another:
1. Update configuration - change the `provider`, `model`, and `api_key` fields in your `llm_config` (a sketch follows below)
2. Update environment variables - set the new provider's API key and remove the old one
3. Test thoroughly - Different providers may have slightly different outputs
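For example, moving a config from OpenAI to Anthropic is mostly a matter of swapping three fields (model names as listed earlier in this guide):

```python
# Before: OpenAI
llm_config = {
    "provider": "openai",
    "model": "gpt-4o",
    "api_key": "your-openai-api-key",
    "temperature": 0.1,
}

# After: Anthropic -- same shape, different provider/model/key
llm_config = {
    "provider": "anthropic",
    "model": "claude-3-7-sonnet-20250219",
    "api_key": "your-anthropic-api-key",
    "temperature": 0.1,
}
```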
### Model Upgrades
When upgrading to newer models:
- Test compatibility - Ensure your prompts work with the new model
- Adjust parameters - New models may need different temperature settings
- Monitor performance - Track accuracy and response times
- Update costs - Newer models may have different pricing
## Security Considerations

### API Key Management
- Use environment variables - Never hardcode API keys (see the sketch after this list)
- Rotate keys regularly - Change API keys periodically
- Use least privilege - Only grant necessary permissions
- Monitor usage - Track API key usage for anomalies
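A minimal sketch of loading the key from the environment instead of hardcoding it:

```python
import os

llm_config = {
    "provider": "openai",
    "model": "gpt-4o",
    # Raises KeyError at startup if the variable is missing,
    # which is better than shipping a hardcoded key
    "api_key": os.environ["OPENAI_API_KEY"],
    "temperature": 0.1,
}
```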
### Data Privacy
- Review data handling - Understand what data is sent to providers
- Use local models - Consider Ollama for sensitive data
- Implement data retention - Clear sensitive data after processing
- Audit logs - Keep logs of all LLM interactions
## Performance Monitoring

### Metrics to Track
- Response time - Time to get LLM response (see the tracking sketch after this list)
- Token usage - Number of tokens consumed
- Cost per request - Monetary cost of each request
- Success rate - Percentage of successful requests
- Error rate - Percentage of failed requests
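A minimal sketch of tracking response time and success rate around `run_dag` (token usage and cost depend on what your provider reports, so they are left out here; the wrapper itself is illustrative, not part of intent-kit):

```python
import time

stats = {"calls": 0, "errors": 0, "total_seconds": 0.0}

def timed_run(dag, user_input):
    """Run the DAG while tallying latency and failures."""
    start = time.perf_counter()
    stats["calls"] += 1
    try:
        return run_dag(dag, user_input)
    except Exception:
        stats["errors"] += 1
        raise
    finally:
        stats["total_seconds"] += time.perf_counter() - start

# After some traffic:
# success rate = (stats["calls"] - stats["errors"]) / stats["calls"]
# avg latency  = stats["total_seconds"] / stats["calls"]
```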