# LLM Integration
Intent Kit supports multiple Large Language Model (LLM) providers, allowing you to choose the best AI service for your needs. This guide covers configuration, best practices, and provider-specific features.
## Supported Providers

### OpenAI
OpenAI provides access to GPT models including GPT-4o and GPT-5-2025-08-07.
#### Configuration

```python
llm_config = {
    "provider": "openai",
    "model": "gpt-5-2025-08-07",  # or "gpt-4o", "gpt-4o-mini"
    "api_key": "your-openai-api-key",
    "temperature": 0.1,
    "max_tokens": 1000
}
```
#### Environment Variable
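The key can also come from the environment rather than the config dict (this sketch assumes intent-kit follows the conventional variable name):

```bash
# Conventional name; confirm against your intent-kit version
export OPENAI_API_KEY="your-openai-api-key"
```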
#### Features
- Fast response times - Optimized for real-time applications
- Cost-effective - Competitive pricing for most use cases
- Reliable - High availability and uptime
- Function calling - Native support for structured outputs
#### Best Practices

- Use `gpt-4o` for classification and extraction tasks
- Use `gpt-5-2025-08-07` for complex reasoning tasks
- Set `temperature` to 0.1-0.3 for consistent results
- Monitor token usage to control costs
### Anthropic
Anthropic provides access to Claude models with strong reasoning capabilities.
#### Configuration

```python
llm_config = {
    "provider": "anthropic",
    "model": "claude-3-7-sonnet-20250219",  # or "claude-3-5-haiku-20241022", "claude-opus-4-20250514"
    "api_key": "your-anthropic-api-key",
    "temperature": 0.1,
    "max_tokens": 1000
}
```
#### Environment Variable
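As with OpenAI, the key can be supplied via the conventional environment variable (an assumption; verify against your setup):

```bash
export ANTHROPIC_API_KEY="your-anthropic-api-key"
```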
#### Features
- Strong reasoning - Excellent for complex decision-making
- Safety-focused - Built with safety and alignment in mind
- Long context - Support for large conversation histories
- Structured outputs - Native support for JSON and other formats
#### Best Practices

- Use `claude-3-7-sonnet-20250219` for most tasks (good balance of speed and capability)
- Use `claude-opus-4-20250514` for complex reasoning tasks
- Use `claude-3-5-haiku-20241022` for simple, fast tasks
- Leverage long context for multi-turn conversations
### Google
Google provides access to Gemini models with strong multimodal capabilities.
#### Configuration

```python
llm_config = {
    "provider": "google",
    "model": "gemini-2.5-flash-lite",  # or "gemini-2.5-flash", "gemini-2.5-pro"
    "api_key": "your-google-api-key",
    "temperature": 0.1,
    "max_tokens": 1000
}
```
#### Environment Variable
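Assuming the conventional variable name:

```bash
export GOOGLE_API_KEY="your-google-api-key"
```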
#### Features
- Multimodal - Support for text, images, and other media
- Cost-effective - Competitive pricing
- Fast inference - Optimized for real-time applications
- Google ecosystem - Integration with Google Cloud services
#### Best Practices

- Use `gemini-2.5-flash-lite` for text-based tasks
- Use `gemini-2.5-pro` for complex reasoning tasks
- Leverage Google Cloud integration for enterprise features
- Monitor usage through Google Cloud Console
### Ollama
Ollama allows you to run open-source models locally on your machine.
#### Configuration

```python
llm_config = {
    "provider": "ollama",
    "model": "llama2",  # or "mistral", "codellama", "llama2:13b"
    "base_url": "http://localhost:11434",  # Default Ollama URL
    "temperature": 0.1
}
```
#### Installation
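A typical local setup looks like the following (the install script shown is the official Linux one; see ollama.com for macOS and Windows installers):

```bash
# Install Ollama (Linux; see ollama.com for other platforms)
curl -fsSL https://ollama.com/install.sh | sh

# Download the model referenced in the config above
ollama pull llama2

# The local server listens on http://localhost:11434 by default
```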
#### Features
- Local deployment - No API keys or external dependencies
- Privacy - Data stays on your machine
- Customizable - Fine-tune models for your specific needs
- Cost-effective - No per-token charges
#### Best Practices
- Use appropriate model sizes for your hardware
- Consider using quantized models for better performance (see the example after this list)
- Monitor memory usage with large models
- Use GPU acceleration when available
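For example, choosing a model size to match your hardware might look like this (tag names vary by model; check the Ollama library for available sizes and quantizations):

```bash
# Pull a smaller variant suited to limited hardware
ollama pull llama2:7b

# Pull a larger variant if you have the memory for it
ollama pull llama2:13b

# List installed models and their sizes
ollama list
```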
### OpenRouter
OpenRouter provides access to multiple AI providers through a unified API.
#### Configuration

```python
llm_config = {
    "provider": "openrouter",
    "model": "google/gemma-2-9b-it",  # or "mistralai/mistral-7b-instruct"
    "api_key": "your-openrouter-api-key",
    "base_url": "https://openrouter.ai/api/v1",
    "temperature": 0.1
}
```
#### Environment Variable
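Assuming the conventional variable name:

```bash
export OPENROUTER_API_KEY="your-openrouter-api-key"
```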
#### Features
- Provider agnostic - Access multiple AI providers
- Cost comparison - Compare pricing across providers
- Unified API - Single interface for multiple providers
- Model marketplace - Access to many different models
## Configuration Options

### Common Parameters
All providers support these common configuration options:
```python
llm_config = {
    "provider": "openai",       # Required: Provider name
    "model": "gpt-4o",          # Required: Model name (must match the provider)
    "api_key": "your-api-key",  # Required: API key
    "temperature": 0.1,         # Optional: Sampling temperature (0.0-2.0)
    "max_tokens": 1000,         # Optional: Maximum tokens to generate
    "timeout": 30,              # Optional: Request timeout in seconds
    "retries": 3,               # Optional: Number of retry attempts
}
```
### Provider-Specific Options

#### OpenAI
```python
llm_config = {
    "provider": "openai",
    "model": "gpt-4o",
    "api_key": "your-api-key",
    "temperature": 0.1,
    "max_tokens": 1000,
    "top_p": 1.0,              # Nucleus sampling
    "frequency_penalty": 0.0,  # Frequency penalty
    "presence_penalty": 0.0,   # Presence penalty
}
```
#### Anthropic

```python
llm_config = {
    "provider": "anthropic",
    "model": "claude-3-7-sonnet-20250219",
    "api_key": "your-api-key",
    "temperature": 0.1,
    "max_tokens": 1000,
    "top_p": 1.0,  # Top-p sampling
    "top_k": 40,   # Top-k sampling
}
```
#### Google

```python
llm_config = {
    "provider": "google",
    "model": "gemini-2.5-flash-lite",
    "api_key": "your-api-key",
    "temperature": 0.1,
    "max_tokens": 1000,
    "top_p": 1.0,  # Top-p sampling
    "top_k": 40,   # Top-k sampling
}
```
## Usage Examples

### Basic Configuration
```python
from intent_kit import DAGBuilder

# Create builder with default LLM config
builder = DAGBuilder()
builder.with_default_llm_config({
    "provider": "openrouter",
    "model": "google/gemma-2-9b-it",
    "api_key": "your-api-key",
    "temperature": 0.1
})

# Add nodes (they'll use the default config)
builder.add_node("classifier", "classifier",
                 output_labels=["greet", "calculate"],
                 description="Main intent classifier")
```
### Per-Node Configuration

```python
# Override LLM config for specific nodes
builder.add_node("classifier", "classifier",
                 output_labels=["greet", "calculate"],
                 description="Main intent classifier",
                 llm_config={
                     "provider": "openrouter",
                     "model": "google/gemma-2-9b-it",
                     "api_key": "your-openrouter-api-key",
                     "temperature": 0.1
                 })
```
### JSON Configuration

```python
dag_config = {
    "default_llm_config": {
        "provider": "openrouter",
        "model": "google/gemma-2-9b-it",
        "api_key": "your-api-key",
        "temperature": 0.1
    },
    "nodes": {
        "classifier": {
            "type": "classifier",
            "output_labels": ["greet", "calculate"],
            "description": "Main intent classifier"
        },
        "extractor": {
            "type": "extractor",
            "param_schema": {"name": str},
            "description": "Extract name from greeting",
            "llm_config": {
                "provider": "openrouter",
                "model": "google/gemma-2-9b-it",
                "api_key": "your-openrouter-api-key"
            }
        }
    }
}
```
## Best Practices

### Model Selection
- Classification Tasks: Use faster, cheaper models (`gpt-4o`, `claude-3-5-haiku-20241022`)
- Extraction Tasks: Use models with good instruction following (`gpt-4o`, `claude-3-7-sonnet-20250219`)
- Complex Reasoning: Use more capable models (`gpt-5-2025-08-07`, `claude-opus-4-20250514`)
- Privacy-Sensitive: Use local models (Ollama)
### Temperature Settings
- 0.0-0.2: Consistent, deterministic outputs (recommended for classification)
- 0.2-0.5: Balanced creativity and consistency
- 0.5-1.0: More creative, varied outputs
- 1.0+: Highly creative, less predictable
### Cost Optimization
- Use appropriate models - Don't use gpt-5-2025-08-07 for simple tasks
- Set reasonable limits - Use `max_tokens` to control costs
- Cache results - Implement caching for repeated requests (see the sketch after this list)
- Monitor usage - Track token consumption and costs
- Use local models - Consider Ollama for development and testing
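A minimal caching sketch: the `classify` wrapper below is illustrative, not part of intent-kit, and assumes `dag` was built earlier with `DAGBuilder`. Caching is only safe when outputs are deterministic enough (low temperature):

```python
from functools import lru_cache

# Hypothetical wrapper: identical inputs skip the LLM call entirely
@lru_cache(maxsize=1024)
def classify(user_input: str):
    result, _context = run_dag(dag, user_input)  # dag built earlier
    return result

classify("Hello Alice")  # hits the LLM
classify("Hello Alice")  # served from cache, zero tokens
```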
### Error Handling
```python
from intent_kit.core.exceptions import LLMError

# dag is assumed to have been built earlier (e.g., via DAGBuilder)
try:
    result, context = run_dag(dag, "Hello Alice")
except LLMError as e:
    print(f"LLM error: {e}")
    # Handle rate limits, API errors, etc.
```
### Rate Limiting

```python
llm_config = {
    "provider": "openai",
    "model": "gpt-4o",
    "api_key": "your-api-key",
    "retries": 3,
    "retry_delay": 1.0,  # Seconds between retries
    "timeout": 30
}
```
## Troubleshooting

### Common Issues

#### API Key Errors
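If requests fail with authentication errors, first confirm the key is actually visible to your process and belongs to the provider you configured:

```bash
# Should print a non-empty value for the provider you're using
echo $OPENAI_API_KEY
```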
#### Rate Limiting
```python
# Implement exponential backoff
llm_config = {
    "retries": 5,
    "retry_delay": 2.0,
    "backoff_factor": 2.0
}
```
#### Model Not Found

```python
# Check model names
# OpenAI: "gpt-4o", "gpt-5-2025-08-07"
# Anthropic: "claude-3-7-sonnet-20250219", "claude-3-5-haiku-20241022"
# Google: "gemini-2.5-flash-lite", "gemini-2.5-pro"
# Ollama: "llama2", "mistral", "codellama"
```
#### Timeout Issues

```python
# Increase timeout for complex tasks
llm_config = {
    "timeout": 60,  # 60 seconds
    "max_tokens": 2000
}
```
## Migration Guide

### Switching Providers
To switch from one provider to another:
1. Update configuration - change the `provider`, `model`, and `api_key` fields in your `llm_config` (a sketch follows below)
2. Update environment variables - set the new provider's API key and remove the old one
3. Test thoroughly - Different providers may have slightly different outputs
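For example, moving a config from OpenAI to Anthropic is mostly a matter of swapping three fields (model names as listed earlier in this guide):

```python
# Before: OpenAI
llm_config = {
    "provider": "openai",
    "model": "gpt-4o",
    "api_key": "your-openai-api-key",
    "temperature": 0.1,
}

# After: Anthropic -- same shape, different provider/model/key
llm_config = {
    "provider": "anthropic",
    "model": "claude-3-7-sonnet-20250219",
    "api_key": "your-anthropic-api-key",
    "temperature": 0.1,
}
```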
### Model Upgrades
When upgrading to newer models:
- Test compatibility - Ensure your prompts work with the new model
- Adjust parameters - New models may need different temperature settings
- Monitor performance - Track accuracy and response times
- Update costs - Newer models may have different pricing
## Security Considerations

### API Key Management
- Use environment variables - Never hardcode API keys (see the sketch after this list)
- Rotate keys regularly - Change API keys periodically
- Use least privilege - Only grant necessary permissions
- Monitor usage - Track API key usage for anomalies
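A minimal sketch of loading the key from the environment instead of hardcoding it:

```python
import os

llm_config = {
    "provider": "openai",
    "model": "gpt-4o",
    # Raises KeyError at startup if the variable is missing,
    # which is better than shipping a hardcoded key
    "api_key": os.environ["OPENAI_API_KEY"],
    "temperature": 0.1,
}
```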
### Data Privacy
- Review data handling - Understand what data is sent to providers
- Use local models - Consider Ollama for sensitive data
- Implement data retention - Clear sensitive data after processing
- Audit logs - Keep logs of all LLM interactions
## Performance Monitoring

### Metrics to Track
- Response time - Time to get LLM response (see the tracking sketch after this list)
- Token usage - Number of tokens consumed
- Cost per request - Monetary cost of each request
- Success rate - Percentage of successful requests
- Error rate - Percentage of failed requests
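A minimal sketch of tracking response time and success rate around `run_dag` (token usage and cost depend on what your provider reports, so they are left out here; the wrapper itself is illustrative, not part of intent-kit):

```python
import time

stats = {"calls": 0, "errors": 0, "total_seconds": 0.0}

def timed_run(dag, user_input):
    """Run the DAG while tallying latency and failures."""
    start = time.perf_counter()
    stats["calls"] += 1
    try:
        return run_dag(dag, user_input)
    except Exception:
        stats["errors"] += 1
        raise
    finally:
        stats["total_seconds"] += time.perf_counter() - start

# After some traffic:
# success rate = (stats["calls"] - stats["errors"]) / stats["calls"]
# avg latency  = stats["total_seconds"] / stats["calls"]
```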