Wiring an LLM API into Django is basically trivial: one HTTP request, one JSON response, done. But production code is more than that: error handling, streaming, rate limiting, logging, user context. This article shows you how to integrate LLM APIs (OpenAI ChatGPT, Anthropic Claude) into a Django project with best practices built in from the start. We build a real chat system step by step, with full error handling and tests, and compare two of the most popular APIs.
Setup: Dependencies and Configuration
First, the dependencies. We use the official SDKs:
```bash
pip install openai anthropic python-dotenv
```
Your `settings.py` should load the API keys securely:
```python
# settings.py
import os
from pathlib import Path

from dotenv import load_dotenv

load_dotenv()

BASE_DIR = Path(__file__).resolve().parent.parent

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
ANTHROPIC_API_KEY = os.getenv('ANTHROPIC_API_KEY')

# Timeout settings for LLM API calls
LLM_API_TIMEOUT = int(os.getenv('LLM_API_TIMEOUT', '30'))
LLM_MAX_TOKENS = int(os.getenv('LLM_MAX_TOKENS', '1024'))
LLM_TEMPERATURE = float(os.getenv('LLM_TEMPERATURE', '0.7'))

# Model selection
LLM_PROVIDER = os.getenv('LLM_PROVIDER', 'anthropic')  # 'openai' or 'anthropic'
LLM_MODEL = os.getenv('LLM_MODEL', 'claude-3-5-sonnet-20241022')
```
And your `.env`:
```
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
LLM_PROVIDER=anthropic
LLM_MODEL=claude-3-5-sonnet-20241022
LLM_API_TIMEOUT=30
LLM_MAX_TOKENS=1024
```
Important: add `.env` to your `.gitignore`!
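One more safeguard worth considering: fail fast when the configured provider has no key. A minimal sketch, appended to the `settings.py` above, using Django's `ImproperlyConfigured`:

```python
# settings.py (sketch) - fail at startup if the configured provider
# has no matching API key, instead of failing on the first request.
from django.core.exceptions import ImproperlyConfigured

_REQUIRED_KEYS = {
    'anthropic': ANTHROPIC_API_KEY,
    'openai': OPENAI_API_KEY,
}

if not _REQUIRED_KEYS.get(LLM_PROVIDER):
    raise ImproperlyConfigured(
        f"LLM_PROVIDER is '{LLM_PROVIDER}' but the matching API key is not set."
    )
```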
Models: Storing the Chat History
A minimal model for chat messages:
```python
# models.py
from django.db import models
from django.contrib.auth.models import User


class ChatSession(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    title = models.CharField(max_length=255, blank=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        ordering = ['-updated_at']

    def __str__(self):
        return self.title or f"Chat {self.created_at.strftime('%Y-%m-%d %H:%M')}"


class ChatMessage(models.Model):
    ROLE_CHOICES = [
        ('user', 'User'),
        ('assistant', 'Assistant'),
    ]

    session = models.ForeignKey(ChatSession, on_delete=models.CASCADE, related_name='messages')
    role = models.CharField(max_length=10, choices=ROLE_CHOICES)
    content = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        ordering = ['created_at']

    def __str__(self):
        return f"{self.role}: {self.content[:50]}..."
```
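The views further down rebuild the list of `{"role", "content"}` dicts inline. If you prefer to centralize that (and cap how much history goes into the prompt), a small helper does the job. This is a sketch; `build_llm_messages` is my own name, not part of the code above:

```python
# services/context.py (sketch) - hypothetical helper: turns a
# ChatSession's history into the message format both SDKs expect,
# capped to the most recent `limit` messages.
def build_llm_messages(session, limit=50):
    recent = list(session.messages.order_by('-created_at')[:limit])
    recent.reverse()  # back to chronological order: oldest first
    return [{"role": m.role, "content": m.content} for m in recent]
```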
Service Layer: Abstracting the LLM Providers
Important: never make API calls directly in your views. That's the first lesson. We build a service layer that abstracts the providers:
```python
# services/llm_service.py
from abc import ABC, abstractmethod
from typing import Optional, Iterator
import logging

logger = logging.getLogger(__name__)


class LLMProvider(ABC):
    """Abstract base class for LLM providers."""

    @abstractmethod
    def generate_response(
        self,
        messages: list,
        temperature: float = 0.7,
        max_tokens: int = 1024,
    ) -> str:
        """Generate a single response."""
        pass

    @abstractmethod
    def stream_response(
        self,
        messages: list,
        temperature: float = 0.7,
        max_tokens: int = 1024,
    ) -> Iterator[str]:
        """Stream a response token by token."""
        pass
```
Now the Anthropic implementation:
```python
# services/llm_service.py - continued
from anthropic import Anthropic
import os


class AnthropicProvider(LLMProvider):
    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or os.getenv('ANTHROPIC_API_KEY')
        self.client = Anthropic(api_key=self.api_key)

    def generate_response(
        self,
        messages: list,
        temperature: float = 0.7,
        max_tokens: int = 1024,
    ) -> str:
        try:
            response = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=max_tokens,
                temperature=temperature,
                messages=messages,
            )
            return response.content[0].text
        except Exception as e:
            logger.error(f"Anthropic API error: {str(e)}")
            raise

    def stream_response(
        self,
        messages: list,
        temperature: float = 0.7,
        max_tokens: int = 1024,
    ) -> Iterator[str]:
        try:
            with self.client.messages.stream(
                model="claude-3-5-sonnet-20241022",
                max_tokens=max_tokens,
                temperature=temperature,
                messages=messages,
            ) as stream:
                for text in stream.text_stream:
                    yield text
        except Exception as e:
            logger.error(f"Anthropic streaming error: {str(e)}")
            raise
```
And the OpenAI implementation:
```python
# services/llm_service.py - continued
from openai import OpenAI


class OpenAIProvider(LLMProvider):
    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or os.getenv('OPENAI_API_KEY')
        self.client = OpenAI(api_key=self.api_key)

    def generate_response(
        self,
        messages: list,
        temperature: float = 0.7,
        max_tokens: int = 1024,
    ) -> str:
        try:
            response = self.client.chat.completions.create(
                model="gpt-4-turbo",
                max_tokens=max_tokens,
                temperature=temperature,
                messages=messages,
            )
            return response.choices[0].message.content
        except Exception as e:
            logger.error(f"OpenAI API error: {str(e)}")
            raise

    def stream_response(
        self,
        messages: list,
        temperature: float = 0.7,
        max_tokens: int = 1024,
    ) -> Iterator[str]:
        try:
            stream = self.client.chat.completions.create(
                model="gpt-4-turbo",
                max_tokens=max_tokens,
                temperature=temperature,
                messages=messages,
                stream=True,
            )
            for chunk in stream:
                if chunk.choices[0].delta.content:
                    yield chunk.choices[0].delta.content
        except Exception as e:
            logger.error(f"OpenAI streaming error: {str(e)}")
            raise
```
And a factory to pick the provider:
```python
# services/llm_service.py - continued
from django.conf import settings


class LLMFactory:
    """Factory for creating LLM provider instances."""

    _providers = {
        'anthropic': AnthropicProvider,
        'openai': OpenAIProvider,
    }

    @classmethod
    def create_provider(cls, provider_name: Optional[str] = None) -> LLMProvider:
        """Create an LLM provider instance."""
        provider_name = provider_name or settings.LLM_PROVIDER
        if provider_name not in cls._providers:
            raise ValueError(f"Unknown provider: {provider_name}")
        return cls._providers[provider_name]()
```
Views: Synchronous and Streaming Chat Endpoints
Now for the views. First, a simple synchronous endpoint:
```python
# views.py
from rest_framework.views import APIView
from rest_framework.response import Response
from rest_framework.permissions import IsAuthenticated
from rest_framework import status
import json  # used by the streaming endpoint below
import logging

from .models import ChatSession, ChatMessage
from .services.llm_service import LLMFactory

logger = logging.getLogger(__name__)


class ChatMessageView(APIView):
    """Synchronous chat endpoint (full response at once)."""
    permission_classes = [IsAuthenticated]

    def post(self, request):
        try:
            # Parse request
            session_id = request.data.get('session_id')
            user_message = request.data.get('message')

            if not user_message:
                return Response(
                    {"error": "Message is required"},
                    status=status.HTTP_400_BAD_REQUEST
                )

            # Get or create session
            if session_id:
                session = ChatSession.objects.get(id=session_id, user=request.user)
            else:
                session = ChatSession.objects.create(
                    user=request.user,
                    title=user_message[:100]
                )

            # Save user message
            ChatMessage.objects.create(
                session=session,
                role='user',
                content=user_message
            )

            # Get chat history for context
            messages = [
                {"role": msg.role, "content": msg.content}
                for msg in session.messages.all()
            ]

            # Generate response
            provider = LLMFactory.create_provider()
            response_text = provider.generate_response(
                messages=messages,
                temperature=0.7,
                max_tokens=1024
            )

            # Save assistant response
            assistant_message = ChatMessage.objects.create(
                session=session,
                role='assistant',
                content=response_text
            )

            return Response({
                "session_id": session.id,
                "response": response_text,
                "message_id": assistant_message.id
            }, status=status.HTTP_201_CREATED)

        except ChatSession.DoesNotExist:
            return Response(
                {"error": "Session not found"},
                status=status.HTTP_404_NOT_FOUND
            )
        except Exception as e:
            logger.error(f"Chat error: {str(e)}")
            return Response(
                {"error": "An error occurred during chat processing"},
                status=status.HTTP_500_INTERNAL_SERVER_ERROR
            )
```
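Called from outside Django, the endpoint looks like this. A client sketch using the `requests` library; it assumes the `/api/` prefix from the URL config below and uses DRF token auth purely as a stand-in for whatever auth scheme you have configured:

```python
# client_example.py (sketch) - calling the chat endpoint from outside.
import requests

API_TOKEN = "your-drf-token"  # placeholder; depends on your auth setup

resp = requests.post(
    "http://localhost:8000/api/chat/",
    json={"message": "What is Django?"},
    headers={"Authorization": f"Token {API_TOKEN}"},
    timeout=60,  # LLM responses can take a while
)
resp.raise_for_status()
data = resp.json()
print(data["session_id"], data["response"])
```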
Next, a streaming endpoint. It is still a plain synchronous view; Django's `StreamingHttpResponse` iterates over a generator and flushes each server-sent-events chunk to the client as it is produced:
```python
# views.py - continued
from django.http import StreamingHttpResponse


class ChatStreamView(APIView):
    """Streaming chat endpoint (server-sent events)."""
    permission_classes = [IsAuthenticated]

    def post(self, request):
        try:
            session_id = request.data.get('session_id')
            user_message = request.data.get('message')

            if not user_message:
                return Response(
                    {"error": "Message is required"},
                    status=status.HTTP_400_BAD_REQUEST
                )

            # Get or create session
            if session_id:
                session = ChatSession.objects.get(id=session_id, user=request.user)
            else:
                session = ChatSession.objects.create(
                    user=request.user,
                    title=user_message[:100]
                )

            # Save user message
            ChatMessage.objects.create(
                session=session,
                role='user',
                content=user_message
            )

            # Prepare messages for LLM
            messages = [
                {"role": msg.role, "content": msg.content}
                for msg in session.messages.all()
            ]

            # Stream generator function
            def stream_response():
                provider = LLMFactory.create_provider()
                full_response = ""
                try:
                    for chunk in provider.stream_response(
                        messages=messages,
                        temperature=0.7,
                        max_tokens=1024
                    ):
                        full_response += chunk
                        yield f"data: {json.dumps({'chunk': chunk})}\n\n"

                    # Save complete response
                    ChatMessage.objects.create(
                        session=session,
                        role='assistant',
                        content=full_response
                    )
                    yield f"data: {json.dumps({'done': True, 'session_id': session.id})}\n\n"
                except Exception as e:
                    logger.error(f"Streaming error: {str(e)}")
                    yield f"data: {json.dumps({'error': 'Streaming failed'})}\n\n"

            return StreamingHttpResponse(
                stream_response(),
                content_type='text/event-stream',
                status=200
            )

        except ChatSession.DoesNotExist:
            return Response(
                {"error": "Session not found"},
                status=status.HTTP_404_NOT_FOUND
            )
        except Exception as e:
            logger.error(f"Streaming setup error: {str(e)}")
            return Response(
                {"error": "An error occurred"},
                status=status.HTTP_500_INTERNAL_SERVER_ERROR
            )
```
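On the consuming side, each SSE event is a `data: {...}` line followed by a blank line. A minimal Python client sketch, with the same auth placeholder as before:

```python
# stream_client.py (sketch) - consuming the SSE stream with requests.
import json
import requests

resp = requests.post(
    "http://localhost:8000/api/chat/stream/",
    json={"message": "Tell me a short joke."},
    headers={"Authorization": "Token your-drf-token"},  # placeholder auth
    stream=True,  # don't buffer the whole response
    timeout=120,
)
for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip the blank separator lines
    event = json.loads(line[len("data: "):])
    if "chunk" in event:
        print(event["chunk"], end="", flush=True)
    elif event.get("done"):
        print(f"\n[done, session {event['session_id']}]")
```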
Register the URLs:
```python
# urls.py
from django.urls import path
from . import views

urlpatterns = [
    path('chat/', views.ChatMessageView.as_view(), name='chat'),
    path('chat/stream/', views.ChatStreamView.as_view(), name='chat-stream'),
]
```
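In the project-level `urls.py`, mount the app under `/api/`; the app name `chat` is again an assumption:

```python
# project/urls.py (sketch) - 'chat' is a placeholder for your app name.
from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path('admin/', admin.site.urls),
    path('api/', include('chat.urls')),
]
```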
Error Handling and Rate Limiting
Two production features you cannot skip: error handling and rate limiting.
Error handling is essential because LLM APIs do fail in practice (quota exhaustion, timeouts, etc.). Rather than matching on error strings, we catch the SDK's typed exceptions and translate them into our own hierarchy:
```python
# services/llm_service.py - continued
import anthropic


class LLMException(Exception):
    """Base exception for LLM errors."""
    pass


class RateLimitException(LLMException):
    """Rate limit exceeded."""
    pass


class APITimeoutException(LLMException):
    """API request timed out."""
    pass


# In AnthropicProvider:
def generate_response(self, messages, temperature=0.7, max_tokens=1024):
    try:
        response = self.client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=max_tokens,
            temperature=temperature,
            messages=messages,
            timeout=settings.LLM_API_TIMEOUT,  # defined in settings.py
        )
        return response.content[0].text
    except anthropic.RateLimitError as e:
        logger.warning("Rate limit hit")
        raise RateLimitException("Rate limit exceeded") from e
    except anthropic.APITimeoutError as e:
        logger.warning("API timeout")
        raise APITimeoutException("API request timed out") from e
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        raise LLMException("LLM API error") from e
```
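In the view, these exceptions map cleanly onto HTTP status codes. A sketch of the additional `except` branches for `ChatMessageView.post`, placed before the generic handler:

```python
# views.py (sketch) - map service-layer exceptions to HTTP status codes.
from .services.llm_service import (
    LLMFactory, RateLimitException, APITimeoutException,
)

class ChatMessageView(APIView):
    permission_classes = [IsAuthenticated]

    def post(self, request):
        try:
            ...  # parsing, session handling and provider call as above
        except RateLimitException:
            return Response(
                {"error": "The LLM provider is rate limiting us, try again shortly."},
                status=status.HTTP_429_TOO_MANY_REQUESTS
            )
        except APITimeoutException:
            return Response(
                {"error": "The LLM request timed out."},
                status=status.HTTP_504_GATEWAY_TIMEOUT
            )
        except Exception as e:
            logger.error(f"Chat error: {e}")
            return Response(
                {"error": "An error occurred during chat processing"},
                status=status.HTTP_500_INTERNAL_SERVER_ERROR
            )
```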
And rate limiting. The django-ratelimit package offers ready-made decorators:

```bash
pip install django-ratelimit
```

For our class-based DRF view, though, a hand-rolled limiter against Django's cache is just a few lines and keeps the logic explicit, so that's what the example uses:
```python
# views.py
from django.core.cache import cache


class ChatMessageView(APIView):
    permission_classes = [IsAuthenticated]

    def post(self, request):
        # Rate limiting: 10 requests per hour per user
        cache_key = f"chat_ratelimit:user:{request.user.id}"
        request_count = cache.get(cache_key, 0)
        if request_count >= 10:
            return Response(
                {"error": "Rate limit exceeded. Max 10 requests per hour."},
                status=status.HTTP_429_TOO_MANY_REQUESTS
            )

        # ... rest of the chat logic from above ...

        # At the end of a successful request:
        cache.set(cache_key, request_count + 1, 3600)  # expires after 1 hour
```
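One caveat: `cache.get()` followed by `cache.set()` can race under concurrent requests. A sketch of an atomic variant using `cache.add()` plus `cache.incr()` (atomic on backends like Redis or Memcached); the helper name `is_rate_limited` is my own:

```python
# views.py (sketch) - atomic counter; is_rate_limited is a hypothetical
# helper, not part of the code above.
from django.core.cache import cache

RATE_LIMIT = 10     # max requests...
RATE_WINDOW = 3600  # ...per hour

def is_rate_limited(user_id):
    """True if the user has used up their quota for the current window."""
    cache_key = f"chat_ratelimit:user:{user_id}"
    # add() is a no-op if the key already exists; the window starts with
    # the first request and expires after RATE_WINDOW seconds.
    cache.add(cache_key, 0, RATE_WINDOW)
    return cache.incr(cache_key) > RATE_LIMIT
```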
Tests: Making Sure Everything Works
Good code needs tests:
```python
# tests.py
import os

from django.test import TestCase, Client
from django.contrib.auth.models import User
from unittest.mock import patch

from .models import ChatSession


class ChatAPITest(TestCase):
    def setUp(self):
        # The providers build a real SDK client in __init__, so a dummy
        # key must be present even though the API calls are mocked.
        os.environ.setdefault('ANTHROPIC_API_KEY', 'test-key')
        self.user = User.objects.create_user(
            username='testuser',
            password='testpass'
        )
        self.client = Client()
        self.client.login(username='testuser', password='testpass')

    # Adjust the dotted path to where llm_service lives in your project,
    # e.g. 'chat.services.llm_service.AnthropicProvider.generate_response'.
    @patch('services.llm_service.AnthropicProvider.generate_response')
    def test_chat_endpoint(self, mock_response):
        mock_response.return_value = "This is a test response"

        response = self.client.post(
            '/api/chat/',
            {'message': 'Hello, how are you?'},
            content_type='application/json'
        )

        self.assertEqual(response.status_code, 201)
        self.assertIn('response', response.json())
        self.assertEqual(response.json()['response'], "This is a test response")

        # Check that messages were saved
        session_id = response.json()['session_id']
        session = ChatSession.objects.get(id=session_id)
        self.assertEqual(session.messages.count(), 2)  # user + assistant

    def test_chat_requires_authentication(self):
        self.client.logout()
        response = self.client.post('/api/chat/', {})
        # DRF returns 403 with SessionAuthentication and 401 with schemes
        # like TokenAuthentication, so accept both.
        self.assertIn(response.status_code, (401, 403))

    @patch('services.llm_service.AnthropicProvider.generate_response')
    def test_chat_error_handling(self, mock_response):
        mock_response.side_effect = Exception("API error")

        response = self.client.post(
            '/api/chat/',
            {'message': 'Test'},
            content_type='application/json'
        )

        self.assertEqual(response.status_code, 500)
        self.assertIn('error', response.json())
```
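The streaming endpoint can be tested the same way. This sketch goes inside `ChatAPITest`, with the patch path adjusted as above; we patch `stream_response` to yield canned chunks instead of hitting the API:

```python
# tests.py (sketch) - add inside ChatAPITest.
@patch('services.llm_service.AnthropicProvider.stream_response')
def test_chat_stream_endpoint(self, mock_stream):
    mock_stream.return_value = iter(["Hel", "lo!"])

    response = self.client.post(
        '/api/chat/stream/',
        {'message': 'Hi'},
        content_type='application/json'
    )

    self.assertEqual(response.status_code, 200)
    # StreamingHttpResponse exposes its chunks via streaming_content.
    body = b"".join(response.streaming_content).decode()
    self.assertIn('"chunk": "Hel"', body)
    self.assertIn('"done": true', body)
```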
Run the tests with:
```bash
python manage.py test
```
OpenAI vs. Anthropic: A Direct Comparison
To wrap up: which provider should you pick? An overview:
Anthropic Claude
- Strengths: very good quality, good pricing, large context window (200k tokens)
- Weaknesses: fewer integrations than OpenAI
- Best for: content generation, RAG systems, long contexts
- Price: approx. $3 per 1M input tokens, $15 per 1M output tokens

OpenAI GPT-4
- Strengths: top quality, many integrations, very stable
- Weaknesses: more expensive, smaller context window
- Best for: complex reasoning tasks, vision (GPT-4 Vision)
- Price: approx. $30 per 1M input tokens, $60 per 1M output tokens (GPT-4 Turbo, which the code above uses, is cheaper)
In practice: test both against your own use cases. Most companies use both OpenAI and Claude, depending on the requirements. That is exactly what the abstracted service layer is for!
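Thanks to the factory, switching per request is a one-liner. A sketch; the `provider` request field is my own addition, not part of the API above:

```python
# views.py (sketch) - let the client pick the provider per request.
provider_name = request.data.get('provider')  # e.g. 'openai' or 'anthropic'
provider = LLMFactory.create_provider(provider_name)  # None -> settings.LLM_PROVIDER
```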
Conclusion
You now have a complete setup for using LLM APIs in Django: a clean service layer, error handling, streaming, tests, and two providers you can swap at will.
That is not just better than copy-pasted code from ChatGPT; it is also maintainable, testable, and scalable. When requirements change, you adapt the service layer instead of touching view code all over the place.
Next steps: look into RAG systems to bring your own data into LLM prompts, or build a React frontend that consumes the streaming endpoint. Or get in touch with e-laborat for a code review or a KI-Readiness-Check.