
Building a machine-learning prediction service with Django and open-source tools

This guide details the implementation of a production-ready Django architecture for serving machine learning predictions. It focuses on offloading model inference to Celery workers to prevent blocking the Django request-response cycle, ensuring the application remains responsive during heavy computation.

60 minutes · 5 steps
Step 1: Configure Celery and Redis Broker

Initialize Celery within the Django project to handle background tasks. This requires creating a celery.py file in the project configuration directory and updating the __init__.py file to ensure the app loads on startup.

config/celery.py
import os

from celery import Celery

# Point Celery at the Django settings module before creating the app.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'config.settings')

app = Celery('ml_project')
# Read every CELERY_-prefixed value from Django's settings.
app.config_from_object('django.conf:settings', namespace='CELERY')
# Find tasks.py modules in all installed apps.
app.autodiscover_tasks()
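
The paragraph above also calls for updating the project package's __init__.py so the Celery app is imported whenever Django starts. A minimal version, assuming the package is named config as in the file paths here:

config/__init__.py
# Import the Celery app on Django startup so @shared_task
# decorators bind to it.
from .celery import app as celery_app

__all__ = ('celery_app',)

Celery also needs to know where the broker lives. A sketch of the relevant settings (URLs, password, and database numbers are illustrative; use a password-protected instance in production, per the pitfalls below):

config/settings.py (excerpt)
CELERY_BROKER_URL = 'redis://:your-redis-password@localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://:your-redis-password@localhost:6379/1'

With these in place, start a worker with celery -A config worker --loglevel=info.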

⚠ Common Pitfalls

  • Failure to set the DJANGO_SETTINGS_MODULE environment variable leads to ImproperlyConfigured errors.
  • Using a local Redis instance without password protection in production.
Step 2: Implement a Thread-Safe Model Loader

To avoid the overhead of loading a large ML model from disk on every prediction, implement a singleton loader. The model is then read into memory once per Celery worker process and reused across tasks.

ml/loader.py
import threading

import joblib
from django.conf import settings

class ModelLoader:
    """Process-wide cache: the model is loaded at most once per worker."""

    _model = None
    _lock = threading.Lock()

    @classmethod
    def get_model(cls):
        # Double-checked locking keeps the fast path lock-free
        # once the model is in memory.
        if cls._model is None:
            with cls._lock:
                if cls._model is None:
                    cls._model = joblib.load(settings.ML_MODEL_PATH)
        return cls._model
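
The loader above is lazy: the model is read on the first get_model() call in each worker process, not literally at startup. If you want to pay the load cost up front, Celery's worker_process_init signal can warm the cache. A minimal sketch (ML_MODEL_PATH is a custom setting you define, e.g. a filesystem path to a .joblib file):

ml/tasks.py (or any module the worker imports)
from celery.signals import worker_process_init

from .loader import ModelLoader

@worker_process_init.connect
def preload_model(**kwargs):
    # Load the model as soon as each worker process starts,
    # so the first task does not pay the disk-read cost.
    ModelLoader.get_model()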

⚠ Common Pitfalls

  • Loading models inside the Django view instead of the Celery worker, causing high memory usage in the web server process.
  • Not accounting for memory limits on worker nodes when loading multiple large models.
Step 3: Define the Inference Celery Task

Create a task that takes the input data, fetches the model through ModelLoader, and runs the prediction. Results can either be returned via the Celery result backend or persisted to the database (a variant for the latter follows the code).

ml/tasks.py
from celery import shared_task

from .loader import ModelLoader

# bind=True exposes the task instance as `self`, useful for self.retry().
@shared_task(bind=True)
def predict_task(self, input_data):
    model = ModelLoader.get_model()
    # predict() expects a 2-D batch, so wrap the single sample in a list.
    prediction = model.predict([input_data])
    # Convert the NumPy array to a plain Python list so the JSON
    # result backend can serialize it (see the pitfalls below).
    return prediction.tolist()
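
If you would rather persist results, here is a sketch of a variant task, assuming a hypothetical Prediction model with JSONFields for input and output. Note that it receives a primary key rather than a model instance, per the pitfalls below:

ml/tasks.py (variant)
from celery import shared_task

from .loader import ModelLoader
from .models import Prediction  # hypothetical model

@shared_task
def predict_and_store_task(prediction_pk):
    # Fetch by PK inside the task; model instances don't serialize well.
    obj = Prediction.objects.get(pk=prediction_pk)
    model = ModelLoader.get_model()
    obj.output = model.predict([obj.input]).tolist()
    obj.save(update_fields=['output'])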

⚠ Common Pitfalls

  • Attempting to serialize NumPy arrays directly in JSON; always convert to standard Python lists using .tolist().
  • Passing complex Django model instances to tasks instead of primary keys (PKs).
Step 4: Create the Prediction Trigger View

Implement a Django REST Framework view that receives user input, validates it, and triggers the Celery task. The view returns the task ID immediately so the client can track progress.

api/views.py
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

from ml.tasks import predict_task

class PredictionTriggerView(APIView):
    def post(self, request):
        features = request.data.get('features')
        # Validate here so malformed input never reaches the worker.
        if not isinstance(features, list):
            return Response(
                {'error': "'features' must be a list of values."},
                status=status.HTTP_400_BAD_REQUEST,
            )
        task = predict_task.delay(features)
        # 202 Accepted: the work is queued, not yet finished.
        return Response({'task_id': task.id}, status=status.HTTP_202_ACCEPTED)
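
For anything beyond a type check, a DRF serializer keeps validation declarative. A minimal sketch (the field name matches the view above; constraints are illustrative):

api/serializers.py
from rest_framework import serializers

class PredictionInputSerializer(serializers.Serializer):
    features = serializers.ListField(
        child=serializers.FloatField(),
        allow_empty=False,
    )

In the view, the isinstance check can then be replaced by instantiating PredictionInputSerializer(data=request.data), calling is_valid(raise_exception=True), and passing serializer.validated_data['features'] to the task.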

⚠ Common Pitfalls

  • Using 200 OK instead of 202 Accepted for long-running processes.
  • Lack of input validation before passing data to the Celery task, which can crash workers.
Step 5: Implement Result Retrieval with HTMX or Polling

Create an endpoint that reports the status of the Celery task. With HTMX, the hx-get and hx-trigger attributes can poll this endpoint until the result is ready.

api/views.py
from celery.result import AsyncResult
from rest_framework.response import Response
from rest_framework.views import APIView

class PredictionResultView(APIView):
    def get(self, request, task_id):
        result = AsyncResult(task_id)
        payload = {'status': result.status, 'result': None}
        if result.successful():
            payload['result'] = result.result
        # On failure, report only the status; returning result.result
        # would leak the raw exception message to the client.
        return Response(payload)
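
Both views need routes. A sketch of the URL configuration (paths and names are assumptions; match them to your project):

api/urls.py
from django.urls import path

from .views import PredictionResultView, PredictionTriggerView

urlpatterns = [
    path('predict/', PredictionTriggerView.as_view(), name='predict-trigger'),
    path('predict/<str:task_id>/', PredictionResultView.as_view(), name='predict-result'),
]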

⚠ Common Pitfalls

  • Infinite polling without a timeout or max retry limit on the frontend.
  • Exposing sensitive internal error traces through the result backend.

What you built

By decoupling ML inference from the request cycle using Celery and Redis, you ensure that your Django application remains performant. This architecture scales horizontally by adding more Celery workers as prediction demand increases, and avoids the common pitfall of blocking the web server's worker threads with CPU-intensive tasks.