In its simplest form, integrating an LLM API into Django is trivial: one HTTP request, one JSON response, done. Production code needs more: error handling, streaming, rate limiting, logging, and user context. This article shows you how to integrate LLM APIs (OpenAI ChatGPT, Anthropic Claude) into a Django project with best practices built in from the start. We build a real chat system step by step, with error handling and tests, and compare two of the most popular APIs.
Setup: Dependencies and Configuration
First, the dependencies. We use the official SDKs, plus Django REST Framework, which the views further down import:
    pip install openai anthropic django-environ python-dotenv djangorestframework
Your `settings.py` should load the API keys from the environment instead of hard-coding them:
    # settings.py
    import os
    from pathlib import Path

    from dotenv import load_dotenv

    load_dotenv()  # read variables from a local .env file, if present

    BASE_DIR = Path(__file__).resolve().parent.parent

    OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
    ANTHROPIC_API_KEY = os.getenv('ANTHROPIC_API_KEY')

    # Timeout (in seconds) and generation defaults for LLM API calls
    LLM_API_TIMEOUT = int(os.getenv('LLM_API_TIMEOUT', '30'))
    LLM_MAX_TOKENS = int(os.getenv('LLM_MAX_TOKENS', '1024'))
    LLM_TEMPERATURE = float(os.getenv('LLM_TEMPERATURE', '0.7'))

    # Model selection
    LLM_PROVIDER = os.getenv('LLM_PROVIDER', 'anthropic')  # 'openai' or 'anthropic'
    LLM_MODEL = os.getenv('LLM_MODEL', 'claude-3-5-sonnet-20241022')
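And your `.env` (the variable names must match the ones read in `settings.py`, so the timeout is `LLM_API_TIMEOUT`):
    OPENAI_API_KEY=sk-...
    ANTHROPIC_API_KEY=sk-ant-...
    LLM_PROVIDER=anthropic
    LLM_MODEL=claude-3-5-sonnet-20241022
    LLM_API_TIMEOUT=30
    LLM_MAX_TOKENS=1024
Important: add `.env` to your `.gitignore`!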
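A missing key or a typo in the provider name otherwise only shows up on the first API call. The following is a minimal sketch of a fail-fast check, assuming a hypothetical helper `load_llm_settings` (not part of the article's code) that takes any mapping of environment variables and uses the variable names from the settings above:

```python
from typing import Mapping

# Which API key each provider needs (variable names as used in settings.py).
KNOWN_PROVIDERS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def load_llm_settings(env: Mapping[str, str]) -> dict:
    """Validate LLM-related environment variables and return typed values."""
    provider = env.get("LLM_PROVIDER", "anthropic")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")

    key_name = KNOWN_PROVIDERS[provider]
    if not env.get(key_name):
        raise ValueError(f"{key_name} must be set when LLM_PROVIDER={provider}")

    timeout = int(env.get("LLM_API_TIMEOUT", "30"))
    max_tokens = int(env.get("LLM_MAX_TOKENS", "1024"))
    if timeout <= 0 or max_tokens <= 0:
        raise ValueError("LLM_API_TIMEOUT and LLM_MAX_TOKENS must be positive")

    return {
        "provider": provider,
        "api_key": env[key_name],
        "timeout": timeout,
        "max_tokens": max_tokens,
        "temperature": float(env.get("LLM_TEMPERATURE", "0.7")),
    }
```

Calling `load_llm_settings(os.environ)` once at startup would surface configuration mistakes before the first user request arrives.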
Models: Storing the Chat History
A minimal pair of models for chat sessions and their messages:
    # models.py
    from django.contrib.auth.models import User
    from django.db import models


    class ChatSession(models.Model):
        # on_delete is required; deleting a user also deletes their sessions
        user = models.ForeignKey(User, on_delete=models.CASCADE)
        title = models.CharField(max_length=255, blank=True)
        created_at = models.DateTimeField(auto_now_add=True)
        updated_at = models.DateTimeField(auto_now=True)

        class Meta:
            ordering = ['-updated_at']

        def __str__(self):
            return self.title or f"Chat {self.created_at.strftime('%Y-%m-%d %H:%M')}"


    class ChatMessage(models.Model):
        ROLE_CHOICES = [
            ('user', 'User'),
            ('assistant', 'Assistant'),
        ]

        session = models.ForeignKey(
            ChatSession, on_delete=models.CASCADE, related_name='messages'
        )
        role = models.CharField(max_length=10, choices=ROLE_CHOICES)
        content = models.TextField()
        created_at = models.DateTimeField(auto_now_add=True)

        class Meta:
            ordering = ['created_at']  # oldest first, the order the LLM expects

        def __str__(self):
            return f"{self.role}: {self.content[:50]}..."
Service Layer: Abstracting the LLM Providers
Important: never make API calls directly in views. That is the first lesson. We build a service layer that abstracts the providers:
    # services/llm_service.py
    import logging
    from abc import ABC, abstractmethod
    from typing import Iterator, Optional

    logger = logging.getLogger(__name__)


    class LLMProvider(ABC):
        """Abstract base class for LLM providers."""

        @abstractmethod
        def generate_response(
            self,
            messages: list,
            temperature: float = 0.7,
            max_tokens: int = 1024,
        ) -> str:
            """Generate a single response."""

        @abstractmethod
        def stream_response(
            self,
            messages: list,
            temperature: float = 0.7,
            max_tokens: int = 1024,
        ) -> Iterator[str]:
            """Stream a response token by token."""
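One payoff of the abstract interface is that tests and local development can run against a stand-in that never touches the network. The following is a minimal sketch, assuming a hypothetical `EchoProvider` that is not part of the article's code; the base class is repeated in shortened form so the block runs on its own:

```python
from abc import ABC, abstractmethod
from typing import Iterator


class LLMProvider(ABC):
    """Shortened copy of the provider interface."""

    @abstractmethod
    def generate_response(self, messages: list, temperature: float = 0.7,
                          max_tokens: int = 1024) -> str: ...

    @abstractmethod
    def stream_response(self, messages: list, temperature: float = 0.7,
                        max_tokens: int = 1024) -> Iterator[str]: ...


class EchoProvider(LLMProvider):
    """Hypothetical stand-in: echoes the last user message back."""

    def generate_response(self, messages, temperature=0.7, max_tokens=1024):
        last_user = next(
            (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
        )
        return f"echo: {last_user}"

    def stream_response(self, messages, temperature=0.7, max_tokens=1024):
        # Yield word by word to imitate token streaming.
        for word in self.generate_response(messages).split(" "):
            yield word + " "
```

Because `EchoProvider` satisfies the same interface, any code written against `LLMProvider` can use it unchanged.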
Now the Anthropic implementation:
    # services/llm_service.py - continued
    import os

    from anthropic import Anthropic


    class AnthropicProvider(LLMProvider):
        def __init__(self, api_key: Optional[str] = None):
            self.api_key = api_key or os.getenv('ANTHROPIC_API_KEY')
            self.client = Anthropic(api_key=self.api_key)

        def generate_response(
            self,
            messages: list,
            temperature: float = 0.7,
            max_tokens: int = 1024,
        ) -> str:
            try:
                response = self.client.messages.create(
                    model="claude-3-5-sonnet-20241022",
                    max_tokens=max_tokens,
                    temperature=temperature,
                    messages=messages,
                )
                return response.content[0].text
            except Exception:
                # logger.exception records the traceback along with the message
                logger.exception("Anthropic API error")
                raise

        def stream_response(
            self,
            messages: list,
            temperature: float = 0.7,
            max_tokens: int = 1024,
        ) -> Iterator[str]:
            try:
                with self.client.messages.stream(
                    model="claude-3-5-sonnet-20241022",
                    max_tokens=max_tokens,
                    temperature=temperature,
                    messages=messages,
                ) as stream:
                    # text_stream yields the text deltas as they arrive
                    yield from stream.text_stream
            except Exception:
                logger.exception("Anthropic streaming error")
                raise
And the OpenAI implementation:
    # services/llm_service.py - continued
    from openai import OpenAI


    class OpenAIProvider(LLMProvider):
        def __init__(self, api_key: Optional[str] = None):
            self.api_key = api_key or os.getenv('OPENAI_API_KEY')
            self.client = OpenAI(api_key=self.api_key)

        def generate_response(
            self,
            messages: list,
            temperature: float = 0.7,
            max_tokens: int = 1024,
        ) -> str:
            try:
                response = self.client.chat.completions.create(
                    model="gpt-4-turbo",
                    max_tokens=max_tokens,
                    temperature=temperature,
                    messages=messages,
                )
                # content can be None, so fall back to an empty string
                return response.choices[0].message.content or ""
            except Exception:
                logger.exception("OpenAI API error")
                raise

        def stream_response(
            self,
            messages: list,
            temperature: float = 0.7,
            max_tokens: int = 1024,
        ) -> Iterator[str]:
            try:
                stream = self.client.chat.completions.create(
                    model="gpt-4-turbo",
                    max_tokens=max_tokens,
                    temperature=temperature,
                    messages=messages,
                    stream=True,
                )
                for chunk in stream:
                    # Some chunks carry no choices or an empty delta; skip them
                    if chunk.choices and chunk.choices[0].delta.content:
                        yield chunk.choices[0].delta.content
            except Exception:
                logger.exception("OpenAI streaming error")
                raise
And a factory to pick the provider:
    # services/llm_service.py - continued
    from django.conf import settings


    class LLMFactory:
        """Factory for creating LLM provider instances."""

        _providers = {
            'anthropic': AnthropicProvider,
            'openai': OpenAIProvider,
        }

        @classmethod
        def create_provider(cls, provider_name: Optional[str] = None) -> LLMProvider:
            """Create an LLM provider instance (defaults to settings.LLM_PROVIDER)."""
            provider_name = provider_name or settings.LLM_PROVIDER
            if provider_name not in cls._providers:
                raise ValueError(f"Unknown provider: {provider_name}")
            return cls._providers[provider_name]()
Views: Full-Response and Streaming Chat Endpoints
Now we build the views. First, a simple endpoint that returns the full response at once:
    # views.py
    import json  # used by the streaming view
    import logging

    from rest_framework import status
    from rest_framework.permissions import IsAuthenticated
    from rest_framework.response import Response
    from rest_framework.views import APIView

    from .models import ChatMessage, ChatSession
    from .services.llm_service import LLMFactory

    logger = logging.getLogger(__name__)


    class ChatMessageView(APIView):
        """Synchronous chat endpoint (full response at once)."""
        permission_classes = [IsAuthenticated]

        def post(self, request):
            try:
                # Parse request
                session_id = request.data.get('session_id')
                user_message = request.data.get('message')
                if not user_message:
                    return Response(
                        {"error": "Message is required"},
                        status=status.HTTP_400_BAD_REQUEST
                    )

                # Get or create session; filtering on user keeps sessions private
                if session_id:
                    session = ChatSession.objects.get(id=session_id, user=request.user)
                else:
                    session = ChatSession.objects.create(
                        user=request.user,
                        title=user_message[:100]
                    )

                # Save user message
                ChatMessage.objects.create(
                    session=session, role='user', content=user_message
                )

                # Chat history (including the new message) as context
                messages = [
                    {"role": msg.role, "content": msg.content}
                    for msg in session.messages.all()
                ]

                # Generate response
                provider = LLMFactory.create_provider()
                response_text = provider.generate_response(
                    messages=messages, temperature=0.7, max_tokens=1024
                )

                # Save assistant response
                assistant_message = ChatMessage.objects.create(
                    session=session, role='assistant', content=response_text
                )

                return Response({
                    "session_id": session.id,
                    "response": response_text,
                    "message_id": assistant_message.id
                }, status=status.HTTP_201_CREATED)
            except ChatSession.DoesNotExist:
                return Response(
                    {"error": "Session not found"},
                    status=status.HTTP_404_NOT_FOUND
                )
            except Exception:
                logger.exception("Chat error")
                return Response(
                    {"error": "An error occurred during chat processing"},
                    status=status.HTTP_500_INTERNAL_SERVER_ERROR
                )
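The view sends the session's entire history on every request, so long conversations make the prompt grow without bound. The following is a rough sketch of capping it, assuming a hypothetical helper `trim_history` and a character budget as a crude stand-in for real token counting (neither appears in the article's code):

```python
def trim_history(messages: list[dict], max_chars: int = 8000) -> list[dict]:
    """Keep the most recent messages whose combined length fits the budget."""
    kept: list[dict] = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        size = len(msg["content"])
        if kept and used + size > max_chars:
            break  # the newest message is always kept, even if oversized
        kept.append(msg)
        used += size
    kept.reverse()
    # Drop leading assistant turns so the conversation starts with a user message.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

In the view, this would wrap the list built from `session.messages.all()` before it is handed to the provider. A production version would count tokens with the provider's tokenizer instead of characters.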
Next, a streaming endpoint. It is still a synchronous view, but it sends the answer chunk by chunk as Server-Sent Events while the model generates it:
    # views.py - continued
    import json

    from django.http import StreamingHttpResponse


    class ChatStreamView(APIView):
        """Streaming chat endpoint (Server-Sent Events)."""
        permission_classes = [IsAuthenticated]

        def post(self, request):
            try:
                session_id = request.data.get('session_id')
                user_message = request.data.get('message')
                if not user_message:
                    return Response(
                        {"error": "Message is required"},
                        status=status.HTTP_400_BAD_REQUEST
                    )

                # Get or create session
                if session_id:
                    session = ChatSession.objects.get(id=session_id, user=request.user)
                else:
                    session = ChatSession.objects.create(
                        user=request.user,
                        title=user_message[:100]
                    )

                # Save user message
                ChatMessage.objects.create(
                    session=session, role='user', content=user_message
                )

                # Prepare messages for the LLM
                messages = [
                    {"role": msg.role, "content": msg.content}
                    for msg in session.messages.all()
                ]

                def event_stream():
                    provider = LLMFactory.create_provider()
                    chunks = []
                    try:
                        for chunk in provider.stream_response(
                            messages=messages, temperature=0.7, max_tokens=1024
                        ):
                            chunks.append(chunk)
                            yield f"data: {json.dumps({'chunk': chunk})}\n\n"
                        # Save the complete response once streaming has finished
                        ChatMessage.objects.create(
                            session=session, role='assistant', content="".join(chunks)
                        )
                        yield f"data: {json.dumps({'done': True, 'session_id': session.id})}\n\n"
                    except Exception:
                        # Headers are already sent, so report the error in-band
                        logger.exception("Streaming error")
                        yield f"data: {json.dumps({'error': 'Streaming failed'})}\n\n"

                response = StreamingHttpResponse(
                    event_stream(), content_type='text/event-stream', status=200
                )
                response['Cache-Control'] = 'no-cache'  # do not cache the stream
                return response
            except ChatSession.DoesNotExist:
                return Response(
                    {"error": "Session not found"},
                    status=status.HTTP_404_NOT_FOUND
                )
            except Exception:
                logger.exception("Streaming setup error")
                return Response(
                    {"error": "An error occurred"},
                    status=status.HTTP_500_INTERNAL_SERVER_ERROR
                )
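Each event the view emits has the form `data: <json>` followed by a blank line. The following is a small sketch of the consuming side, assuming hypothetical helpers `parse_sse_events` and `collect_text` that receive the raw response body as one string (a real client would read incrementally from the connection):

```python
import json
from typing import Iterator


def format_sse(payload: dict) -> str:
    """Frame a payload the same way the streaming view does."""
    return f"data: {json.dumps(payload)}\n\n"


def parse_sse_events(raw: str) -> Iterator[dict]:
    """Yield the JSON payload of every `data:` event in an SSE body."""
    for block in raw.split("\n\n"):  # events are separated by a blank line
        for line in block.splitlines():
            if line.startswith("data:"):
                yield json.loads(line[len("data:"):].strip())


def collect_text(raw: str) -> str:
    """Join all chunks; raise if the server reported an error event."""
    parts = []
    for event in parse_sse_events(raw):
        if "error" in event:
            raise RuntimeError(event["error"])
        if "chunk" in event:
            parts.append(event["chunk"])
    return "".join(parts)
```

Splitting on blank lines is safe here because `json.dumps` escapes newlines inside a chunk, so a payload never contains a raw line break.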
Register the URLs:
    # urls.py
    from django.urls import path

    from . import views

    urlpatterns = [
        path('chat/', views.ChatMessageView.as_view(), name='chat'),
        path('chat/stream/', views.ChatStreamView.as_view(), name='chat-stream'),
    ]
Error Handling and Rate Limiting
Two important production features: error handling and rate limiting.
Error handling is essential because LLM APIs can fail (exhausted quota, timeouts, and so on). Catching the SDK's own exception classes is more reliable than searching the error message for substrings:
    # services/llm_service.py
    import anthropic
    from django.conf import settings


    class LLMException(Exception):
        """Base exception for LLM errors."""


    class RateLimitException(LLMException):
        """Rate limit exceeded."""


    class APITimeoutException(LLMException):
        """API request timed out."""


    # In AnthropicProvider:
    def generate_response(self, messages, temperature=0.7, max_tokens=1024):
        try:
            response = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=max_tokens,
                temperature=temperature,
                messages=messages,
                timeout=settings.LLM_API_TIMEOUT,  # seconds, 30 by default
            )
            return response.content[0].text
        except anthropic.RateLimitError as e:
            logger.warning("Rate limit hit")
            raise RateLimitException("Rate limit exceeded") from e
        except anthropic.APITimeoutError as e:
            logger.warning("API timeout")
            raise APITimeoutException("API request timed out") from e
        except Exception as e:
            logger.exception("Unexpected error")
            raise LLMException("LLM API error") from e
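Once rate-limit errors have their own exception type, callers can react to them specifically, for example by retrying with a growing delay. The following is a minimal sketch, assuming a hypothetical helper `call_with_retry` that is not part of the article's code; the exception classes are repeated so the block runs on its own, and the `sleep` parameter exists only to make the helper testable:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LLMException(Exception):
    """Base exception for LLM errors."""


class RateLimitException(LLMException):
    """Rate limit exceeded."""


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on RateLimitException with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitException:
            if attempt == max_attempts:
                raise  # give up; the caller decides how to report it
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

A view could then call `call_with_retry(lambda: provider.generate_response(messages=messages))`. Other errors, such as timeouts, are deliberately not retried in this sketch and propagate immediately.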
And rate limiting. The django-ratelimit package provides a ready-made `ratelimit` decorator:
    pip install django-ratelimit
The view below does the same check by hand with Django's cache, which keeps the logic visible. The counter is created with `add()` and increased with `incr()`, so the one-hour window starts at the first request and is not extended by later ones:
    # views.py
    from django.core.cache import cache

    RATE_LIMIT = 10     # requests per window
    RATE_WINDOW = 3600  # window length in seconds (1 hour)


    class ChatMessageView(APIView):
        permission_classes = [IsAuthenticated]

        def post(self, request):
            # Rate limiting: 10 requests per hour per user
            cache_key = f"chat_ratelimit:user:{request.user.id}"
            cache.add(cache_key, 0, timeout=RATE_WINDOW)  # no-op if the key exists
            try:
                request_count = cache.incr(cache_key)
            except ValueError:
                # Key expired between add() and incr(): start a new window
                cache.set(cache_key, 1, timeout=RATE_WINDOW)
                request_count = 1

            if request_count > RATE_LIMIT:
                return Response(
                    {"error": "Rate limit exceeded. Max 10 requests per hour."},
                    status=status.HTTP_429_TOO_MANY_REQUESTS
                )

            # Rest of the logic...
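The counting scheme itself is a fixed-window limiter: each key gets a window start and a counter, and the counter resets once the window has elapsed. The following is a self-contained sketch of that idea, assuming a hypothetical in-memory class `FixedWindowLimiter` with an injectable clock; it works for a single process only, which is why the view relies on the shared cache instead:

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """In-memory fixed-window rate limiter (single process only)."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._windows: dict[str, tuple[float, int]] = {}  # key -> (start, count)

    def allow(self, key: str) -> bool:
        """Return True and count the request if the key is under its limit."""
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.limit:
            return False
        self._windows[key] = (start, count + 1)
        return True
```

A known weakness of fixed windows is that a client can send a burst at the end of one window and another at the start of the next; sliding-window or token-bucket schemes smooth this out at the cost of more bookkeeping.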
Tests: Making Sure Everything Works
Good code needs tests. The LLM call is mocked so the tests run without network access or API costs:
    # tests.py
    from unittest.mock import patch

    from django.contrib.auth.models import User
    from django.test import TestCase

    from .models import ChatSession
    from .services.llm_service import AnthropicProvider


    class ChatAPITest(TestCase):
        def setUp(self):
            self.user = User.objects.create_user(
                username='testuser', password='testpass'
            )
            # TestCase already provides self.client
            self.client.login(username='testuser', password='testpass')

        # patch.object avoids spelling out the module path as a string;
        # the tests assume LLM_PROVIDER is 'anthropic'
        @patch.object(AnthropicProvider, 'generate_response')
        def test_chat_endpoint(self, mock_response):
            mock_response.return_value = "This is a test response"
            response = self.client.post(
                '/api/chat/',
                {'message': 'Hello, how are you?'},
                content_type='application/json'
            )
            self.assertEqual(response.status_code, 201)
            self.assertIn('response', response.json())
            self.assertEqual(response.json()['response'], "This is a test response")

            # Check that messages were saved
            session_id = response.json()['session_id']
            session = ChatSession.objects.get(id=session_id)
            self.assertEqual(session.messages.count(), 2)  # user + assistant

        def test_chat_requires_authentication(self):
            self.client.logout()
            response = self.client.post('/api/chat/', {})
            # DRF answers 403 with its default SessionAuthentication and 401
            # with schemes that send a WWW-Authenticate header
            self.assertIn(response.status_code, (401, 403))

        @patch.object(AnthropicProvider, 'generate_response')
        def test_chat_error_handling(self, mock_response):
            mock_response.side_effect = Exception("API error")
            response = self.client.post(
                '/api/chat/',
                {'message': 'Test'},
                content_type='application/json'
            )
            self.assertEqual(response.status_code, 500)
            self.assertIn('error', response.json())
Run the tests with:
    python manage.py test
OpenAI vs. Anthropic: A Direct Comparison
To wrap up: which provider? Here is an overview:
Anthropic Claude
- Strengths: very good quality, good pricing, large context window (200k tokens)
- Weaknesses: fewer integrations than OpenAI
- Best for: content generation, RAG systems, long contexts
- Price: approx. $3 per 1M input tokens, $15 per 1M output tokens
OpenAI GPT-4
- Strengths: strong quality, many integrations, mature platform
- Weaknesses: more expensive, smaller context window
- Best for: complex reasoning tasks, vision (GPT-4 Vision)
- Price: approx. $30 per 1M input tokens, $60 per 1M output tokens (original GPT-4; the `gpt-4-turbo` model used in the code above was listed at approx. $10 and $30)
In practice: test both with your own use cases. Many companies use both OpenAI and Claude, depending on the requirement. That is exactly what the abstracted service layer is for!
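To see what the per-token prices mean for a single request, the arithmetic can be sketched as follows. The figures are the approximate list prices from the overview; the function name `estimate_cost` and the `PRICES` table are hypothetical and only serve the illustration:

```python
# Approximate prices in USD per 1M tokens, taken from the overview.
PRICES = {
    "anthropic": {"input": 3.0, "output": 15.0},
    "openai": {"input": 30.0, "output": 60.0},
}


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    price = PRICES[provider]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000
```

For a request with 2,000 input tokens and 500 output tokens, this gives about $0.0135 for Claude and $0.09 for GPT-4 at these prices. Since the view resends the whole history each time, the input side grows with every turn of a conversation.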
Conclusion
You now have a complete setup for using LLM APIs in Django: a clean service layer, error handling, streaming, tests, and two providers that you can swap.
This is not just better than copy-pasted code from ChatGPT; it is also maintainable, testable, and scalable. When requirements change, you adjust the service layer instead of touching view code all over the project.
Next steps: look into RAG systems to bring your own data into the LLM prompts, or build a React frontend that consumes the stream. Or contact e-laborat for a code review or an AI readiness check.