
AI Services

Intent Kit provides a comprehensive AI services layer that supports multiple LLM providers with unified interfaces, cost tracking, and performance monitoring.

Overview

The AI services layer includes:

  • Multiple LLM Providers - OpenAI, Anthropic, Google, Ollama, OpenRouter
  • Unified Interface - Consistent API across all providers (see the sketch below)
  • Cost Tracking - Real-time token usage and cost calculation
  • Performance Monitoring - Response times and metrics
  • Factory Pattern - Easy provider switching and configuration
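
Because every provider sits behind the same client interface, switching providers usually means changing only the configuration dictionary. A minimal sketch of that idea, using the LLMService interface shown under Basic Usage below (the prompt and API keys are placeholders):

from intent_kit.services.ai.llm_service import LLMService

llm_service = LLMService()

# Route the same prompt to two providers by swapping the config only
for config in (
    {"provider": "openai", "model": "gpt-4o", "api_key": "your-openai-key"},
    {"provider": "anthropic", "model": "claude-3-5-haiku-20241022", "api_key": "your-anthropic-key"},
):
    client = llm_service.get_client(config)
    response = client.generate("Summarize Intent Kit in one sentence.")
    print(config["provider"], "->", response.content)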

Supported Providers

OpenAI

  • Models:
      • GPT-5-2025-08-07 (Latest)
      • GPT-4
      • GPT-4-turbo
      • GPT-4o
      • GPT-4o-mini
      • GPT-3.5-turbo
  • Features: Function calling, streaming, fine-tuning
  • Cost: Pay-per-token pricing

Anthropic

  • Models:
      • Claude Opus 4 (claude-opus-4-20250514)
      • Claude 3.7 Sonnet (claude-3-7-sonnet-20250219)
      • Claude 3.5 Haiku (claude-3-5-haiku-20241022)
  • Features: Constitutional AI, tool use, streaming
  • Cost: Pay-per-token pricing

Google

  • Models:
      • Gemini 2.5 Flash Lite (gemini-2.5-flash-lite)
      • Gemini 2.5 Flash (gemini-2.5-flash)
      • Gemini 2.5 Pro (gemini-2.5-pro)
  • Features: Multimodal, code generation, reasoning
  • Cost: Pay-per-token pricing

Ollama

  • Models: Local models (Llama, Mistral, CodeLlama, etc.)
  • Features: Local deployment, custom models, privacy
  • Cost: Free (local compute)

OpenRouter

  • Models:
      • Google Gemma 2 9B IT (google/gemma-2-9b-it)
      • Meta Llama 3.2 3B Instruct (meta-llama/llama-3.2-3b-instruct)
      • Moonshot Kimi K2 (moonshotai/kimi-k2)
      • Mistral Devstral Small (mistralai/devstral-small)
      • Qwen 3 32B (qwen/qwen3-32b)
      • Z-AI GLM 4.5 (z-ai/glm-4.5)
      • Qwen 3 30B A3B Instruct (qwen/qwen3-30b-a3b-instruct-2507)
      • Mistral 7B Instruct (mistralai/mistral-7b-instruct)
      • Mistral Ministral 8B (mistralai/ministral-8b)
      • Mistral Nemo 20B (mistralai/mistral-nemo-20b)
      • Liquid LFM 40B (liquid/lfm-40b)
      • Plus access to 100+ additional models from various providers
  • Features: Unified API, model comparison, cost optimization
  • Cost: Pay-per-token with provider-specific pricing

Basic Usage

Using the Factory Pattern

from intent_kit.services.ai.llm_factory import LLMFactory
from intent_kit.services.ai.llm_service import LLMService

# Create LLM service with factory
llm_service = LLMService()

# Configure OpenAI
openai_config = {
    "provider": "openai",
    "model": "gpt-4",
    "api_key": "your-openai-key"
}

# Get client
client = llm_service.get_client(openai_config)

# Generate response
response = client.generate("Hello, how are you?")
print(response.content)

Environment Variable Configuration

# OpenAI
export OPENAI_API_KEY="your-key"
export OPENAI_MODEL="gpt-4o"  # or "gpt-5-2025-08-07" for latest

# Anthropic
export ANTHROPIC_API_KEY="your-key"
export ANTHROPIC_MODEL="claude-3-7-sonnet-20250219"  # or "claude-opus-4-20250514" for latest

# Google
export GOOGLE_API_KEY="your-key"
export GOOGLE_MODEL="gemini-2.5-flash-lite"  # or "gemini-2.5-pro" for latest

# Ollama
export OLLAMA_BASE_URL="http://localhost:11434"
export OLLAMA_MODEL="llama2"

# OpenRouter
export OPENROUTER_API_KEY="your-key"
export OPENROUTER_MODEL="mistralai/mistral-7b-instruct"  # or any supported model

Provider-Specific Configuration

OpenAI Configuration

openai_config = {
    "provider": "openai",
    "model": "gpt-4o",  # or "gpt-5-2025-08-07" for latest
    "api_key": "your-key",
    "temperature": 0.7,
    "max_tokens": 1000,
    "top_p": 0.9,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "stream": False
}

Anthropic Configuration

anthropic_config = {
    "provider": "anthropic",
    "model": "claude-3-7-sonnet-20250219",  # or "claude-opus-4-20250514" for latest
    "api_key": "your-key",
    "max_tokens": 1000,
    "temperature": 0.7,
    "top_p": 0.9,
    "system": "You are a helpful assistant."
}

Google Configuration

google_config = {
    "provider": "google",
    "model": "gemini-2.5-flash-lite",  # or "gemini-2.5-pro" for latest
    "api_key": "your-key",
    "temperature": 0.7,
    "max_output_tokens": 1000,
    "top_p": 0.9,
    "top_k": 40
}

Ollama Configuration

ollama_config = {
    "provider": "ollama",
    "model": "llama2",
    "base_url": "http://localhost:11434",
    "temperature": 0.7,
    "top_p": 0.9,
    "num_predict": 1000
}

OpenRouter Configuration

openrouter_config = {
    "provider": "openrouter",
    "model": "mistralai/mistral-7b-instruct",  # or any supported model
    "api_key": "your-key",
    "temperature": 0.7,
    "max_tokens": 1000,
    "top_p": 0.9
}

Advanced Features

Streaming Responses

# Configure for streaming
config = {
    "provider": "openai",
    "model": "gpt-4",
    "stream": True
}

client = llm_service.get_client(config)

# Stream response
for chunk in client.generate_stream("Tell me a story"):
    print(chunk.content, end="", flush=True)
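
If you also need the complete text after streaming (for logging or caching), you can accumulate the chunks as they arrive. A short sketch, assuming the same generate_stream and chunk.content interface used above:

chunks = []
for chunk in client.generate_stream("Tell me a story"):
    print(chunk.content, end="", flush=True)  # stream to the user as it arrives
    chunks.append(chunk.content)              # keep a copy for later use

full_text = "".join(chunks)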

Function Calling

# Define functions
functions = [
    {
        "name": "get_weather",
        "description": "Get weather information",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]

# Configure with functions
config = {
    "provider": "openai",
    "model": "gpt-4",
    "functions": functions,
    "function_call": "auto"
}

client = llm_service.get_client(config)
response = client.generate("What's the weather in New York?")

Structured Output

# Configure for structured output
config = {
    "provider": "anthropic",
    "model": "claude-3-sonnet-20240229",
    "response_format": {
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
                "email": {"type": "string"}
            }
        }
    }
}

client = llm_service.get_client(config)
response = client.generate("Extract user information from: John is 25 years old, email: john@example.com")

Cost Tracking

Real-Time Cost Calculation

from intent_kit.services.ai.pricing_service import PricingService

# Initialize pricing service
pricing_service = PricingService()

# Track costs
config = {
    "provider": "openai",
    "model": "gpt-4"
}

client = llm_service.get_client(config)
response = client.generate("Hello world")

# Get cost information
print(f"Input tokens: {response.input_tokens}")
print(f"Output tokens: {response.output_tokens}")
print(f"Total cost: ${response.cost:.4f}")

Cost Optimization

# Compare costs across providers
providers = [
    {"provider": "openai", "model": "gpt-4o"},
    {"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"},
    {"provider": "google", "model": "gemini-2.5-flash-lite"},
    {"provider": "openrouter", "model": "mistralai/mistral-7b-instruct"}
]

for provider_config in providers:
    client = llm_service.get_client(provider_config)
    response = client.generate("Hello world")
    print(f"{provider_config['provider']}: ${response.cost:.4f}")

Performance Monitoring

Response Time Tracking

import time

# Track performance
start_time = time.time()
response = client.generate("Complex query")
end_time = time.time()

print(f"Response time: {end_time - start_time:.2f} seconds")
print(f"Tokens per second: {response.output_tokens / (end_time - start_time):.2f}")

Batch Processing

# Process multiple requests efficiently
queries = [
    "What is AI?",
    "Explain machine learning",
    "Describe neural networks"
]

responses = []
for query in queries:
    response = client.generate(query)
    responses.append(response)

# Aggregate metrics
total_cost = sum(r.cost for r in responses)
total_tokens = sum(r.input_tokens + r.output_tokens for r in responses)
print(f"Total cost: ${total_cost:.4f}")
print(f"Total tokens: {total_tokens}")

Error Handling

Provider-Specific Errors

from intent_kit.services.ai.base_client import LLMError

try:
    response = client.generate("Hello world")
except LLMError as e:
    print(f"LLM Error: {e.message}")
    print(f"Provider: {e.provider}")
    print(f"Model: {e.model}")
except Exception as e:
    print(f"Unexpected error: {e}")

Retry Logic

import time
from functools import wraps

from intent_kit.services.ai.base_client import LLMError

def retry_on_failure(max_retries=3, delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except LLMError as e:
                    if attempt == max_retries - 1:
                        raise
                    time.sleep(delay * (2 ** attempt))  # Exponential backoff
            return None
        return wrapper
    return decorator

@retry_on_failure(max_retries=3)
def generate_with_retry(client, prompt):
    return client.generate(prompt)
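
The decorated helper is then called like a normal generation call, reusing the client from the earlier sections:

response = generate_with_retry(client, "Hello world")
print(response.content)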

Best Practices

1. Model Selection

# Choose appropriate model for task
task_models = {
    "conversation": "gpt-4o",
    "code_generation": "claude-3-7-sonnet-20250219",
    "reasoning": "gemini-2.5-pro",
    "local_development": "llama2",
    "cost_optimized": "mistralai/mistral-7b-instruct"
}
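
One way to put this mapping to work is a small helper that resolves a task type to a full configuration and client. The task_providers mapping and get_client_for_task name below are illustrative, not part of the Intent Kit API:

task_providers = {
    "conversation": "openai",
    "code_generation": "anthropic",
    "reasoning": "google",
    "local_development": "ollama",
    "cost_optimized": "openrouter",
}

def get_client_for_task(llm_service, task_type, api_key=None):
    """Build a client for the model associated with a task type."""
    config = {
        "provider": task_providers[task_type],
        "model": task_models[task_type],
    }
    if api_key:
        config["api_key"] = api_key
    return llm_service.get_client(config)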

2. Cost Management

# Set budget limits
def generate_within_budget(client, prompt, max_cost=0.01):
    response = client.generate(prompt)
    if response.cost > max_cost:
        raise ValueError(f"Cost ${response.cost:.4f} exceeds budget ${max_cost:.4f}")
    return response

3. Caching

import hashlib
import json

class ResponseCache:
    def __init__(self):
        self.cache = {}

    def get_cache_key(self, config, prompt):
        data = json.dumps({"config": config, "prompt": prompt}, sort_keys=True)
        return hashlib.md5(data.encode()).hexdigest()

    def get(self, config, prompt):
        key = self.get_cache_key(config, prompt)
        return self.cache.get(key)

    def set(self, config, prompt, response):
        key = self.get_cache_key(config, prompt)
        self.cache[key] = response

# Use caching: wrap generation so repeated prompts hit the cache
cache = ResponseCache()

def generate_cached(client, config, prompt):
    cached_response = cache.get(config, prompt)
    if cached_response:
        return cached_response

    response = client.generate(prompt)
    cache.set(config, prompt, response)
    return response

4. Environment Management

# Use environment-specific configurations
import os

def get_llm_config():
    env = os.getenv("ENVIRONMENT", "development")

    if env == "production":
        return {
            "provider": "openai",
            "model": "gpt-4o",
            "temperature": 0.1  # More deterministic
        }
    elif env == "development":
        return {
            "provider": "ollama",
            "model": "llama2",
            "temperature": 0.7  # More creative
        }
    else:
        return {
            "provider": "anthropic",
            "model": "claude-3-5-haiku-20241022",
            "temperature": 0.5
        }
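
The environment-specific configuration plugs into the same client factory used throughout this page:

config = get_llm_config()
client = llm_service.get_client(config)
response = client.generate("Hello from the current environment")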

Integration with DAGs

Using AI Services in DAGs

from intent_kit import DAGBuilder
from intent_kit.services.ai.llm_service import LLMService

# Initialize LLM service
llm_service = LLMService()

# Create DAG with AI services
builder = DAGBuilder()

# Set default LLM configuration
builder.with_default_llm_config({
    "provider": "openai",
    "model": "gpt-4o",
    "temperature": 0.7
})

# Add nodes that use AI services
builder.add_node("classifier", "classifier",
                 output_labels=["greet", "weather"],
                 description="Classify user intent")

builder.add_node("extractor", "extractor",
                 param_schema={"location": str, "date": str},
                 description="Extract parameters")

# Build and execute
dag = builder.build()
context = DefaultContext()
context.set("llm_service", llm_service)

result = dag.execute("What's the weather in New York tomorrow?", context)

Context-Aware AI Configuration

# Use different models for different tasks
def get_task_specific_config(task_type):
    configs = {
        "classification": {
            "provider": "anthropic",
            "model": "claude-3-5-haiku-20241022",
            "temperature": 0.1
        },
        "extraction": {
            "provider": "openai",
            "model": "gpt-4o",
            "temperature": 0.0
        },
        "conversation": {
            "provider": "google",
            "model": "gemini-2.5-flash-lite",
            "temperature": 0.7
        },
        "cost_optimized": {
            "provider": "openrouter",
            "model": "mistralai/mistral-7b-instruct",
            "temperature": 0.7
        }
    }
    return configs.get(task_type, configs["conversation"])
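
A task-specific configuration is used like any other configuration. A short sketch, reusing the LLMService client interface from the earlier sections:

# Use a low-temperature model for parameter extraction
extraction_config = get_task_specific_config("extraction")
client = llm_service.get_client(extraction_config)
response = client.generate("Extract the location from: weather in Paris tomorrow")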