FastAPI implementation checklist

This checklist provides a technical roadmap for moving FastAPI applications from development to production. It focuses on async performance, security hardening, and specific patterns for high-concurrency workloads like LLM streaming.

Application Architecture

  • Modularize Routes with APIRouter

    critical

    Verify that all endpoints are grouped into logical modules using APIRouter and included in the main app via app.include_router() to prevent a monolithic main.py file.

  • Implement Pydantic BaseSettings

    critical

    Ensure all configuration is managed through Pydantic's BaseSettings (provided by the separate pydantic-settings package in Pydantic v2) so that environment variables are type-validated at startup and defaults are defined in one place.

  • Standardize Exception Handlers

    recommended

    Register global exception handlers for Starlette's HTTPException and FastAPI's RequestValidationError (raised when request validation fails) so that all error responses follow a consistent JSON schema.

  • Use Dependency Injection for Services

    recommended

    Validate that database sessions, third-party clients, and business logic services are injected using FastAPI's Depends() to facilitate mocking during unit tests.

  • Apply Strict Pydantic Types

    critical

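    Review all request/response models to ensure they use constrained types (e.g., constr, PositiveInt), strict types (e.g., StrictStr, StrictInt) where silent coercion is unwanted, and Field constraints, so that malformed data is rejected before it reaches business logic.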
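
To illustrate the "Apply Strict Pydantic Types" item above, here is a minimal sketch of a request model that combines a strict type, a constrained type, and Field constraints. The model name `CreateOrder`, its fields, and the `is_valid` helper are hypothetical and exist only for illustration; the imports are standard Pydantic names.

```python
from pydantic import BaseModel, Field, PositiveInt, StrictStr, ValidationError


class CreateOrder(BaseModel):
    """Hypothetical request model; the field names are illustrative only."""

    # StrictStr rejects non-string input instead of coercing it.
    sku: StrictStr = Field(min_length=1, max_length=32)
    # PositiveInt requires a value greater than zero.
    quantity: PositiveInt


def is_valid(payload: dict) -> bool:
    """Return True if the payload passes validation, False otherwise."""
    try:
        CreateOrder(**payload)
    except ValidationError:
        return False
    return True
```

When a model like this is used as a FastAPI request body, the framework runs the same validation and rejects invalid payloads before the route function is called.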

Database and Async Performance

  • Configure Async Database Drivers

    critical

    Verify the use of async-compatible drivers (e.g., asyncpg for PostgreSQL) and ensure every database interaction uses the 'await' keyword.

  • Optimize Connection Pooling

    critical

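    Set SQLAlchemy pool_size and max_overflow so that workers x (pool_size + max_overflow) stays below the database's connection limit. Each Uvicorn worker process keeps its own pool, so an oversized per-worker pool leads to 'Too many connections' errors.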

  • Automate Alembic Migrations

    critical

    Confirm that database schema changes are strictly managed via Alembic versions and that migrations are executed as a pre-deployment step.

  • Eliminate Blocking Sync Calls

    critical

    Audit async code paths for blocking calls (such as the 'requests' library or 'time.sleep') and replace them with async equivalents such as 'httpx.AsyncClient' or 'asyncio.sleep' to avoid stalling the event loop.

  • Implement Redis Caching

    recommended

    Apply Redis caching to high-latency read endpoints to reduce database load and improve response times for static or semi-static data.
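
The effect described in the "Eliminate Blocking Sync Calls" item above can be seen with the standard library alone. The sketch below is not a FastAPI app: `handler_blocking` and `handler_async` are hypothetical stand-ins for route handlers, and the sleep stands in for any I/O call.

```python
import asyncio
import time


async def handler_blocking(delay: float) -> None:
    # time.sleep holds the event loop; no other coroutine can run meanwhile.
    time.sleep(delay)


async def handler_async(delay: float) -> None:
    # asyncio.sleep yields control, so other requests proceed concurrently.
    await asyncio.sleep(delay)


async def timed(handler, n: int, delay: float) -> float:
    """Run n concurrent calls and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(n)))
    return time.perf_counter() - start


async def compare(n: int = 5, delay: float = 0.1):
    """Return (blocking_seconds, non_blocking_seconds) for n concurrent calls."""
    blocking = await timed(handler_blocking, n, delay)
    non_blocking = await timed(handler_async, n, delay)
    return blocking, non_blocking
```

Running `asyncio.run(compare())` should show the blocking variant taking roughly n times the delay, because the calls run one after another, while the async variant finishes in roughly one delay.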

Security and Authentication

  • Secure CORS Middleware

    critical

    Restrict the 'allow_origins' list in CORSMiddleware to specific production domains instead of using wildcards.

  • Rotate JWT Secret Keys

    critical

    Verify that JWT signing keys are supplied via environment variables, carry at least 256 bits of entropy, are never committed to version control, and can be rotated without a code change.

  • Set JWT Token Expiration

    critical

    Confirm that all issued access tokens have an 'exp' claim set to a short duration (e.g., 15-60 minutes) to minimize the impact of token leakage.

  • Apply Rate Limiting

    recommended

    Implement rate limiting on sensitive routes (auth, password reset) using a library like slowapi or a Redis-backed counter to prevent brute-force attacks.

  • Disable Default Docs in Production

    recommended

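    Set docs_url=None, redoc_url=None, and openapi_url=None in the FastAPI constructor when running in production to prevent exposing the API specification to the public. Disabling only the two documentation UIs still leaves the schema available at /openapi.json.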
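
The "Apply Rate Limiting" item above mentions a Redis-backed counter. As a rough sketch of the counting logic only, the example below keeps a fixed-window counter in an in-memory dict. `FixedWindowLimiter` is a hypothetical helper, not slowapi's API, and it assumes a single process; with several workers the counts would need to live in a shared store such as Redis.

```python
import time
from collections import defaultdict
from typing import Callable


class FixedWindowLimiter:
    """In-memory fixed-window counter (hypothetical, single-process only)."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        # Maps (client key, window index) to the number of requests seen.
        # Old windows are never evicted here; a Redis-backed version would
        # typically rely on key expiry instead.
        self.counts = defaultdict(int)

    def allow(self, key: str) -> bool:
        """Count one request for key; return False once the limit is exceeded."""
        window = int(self.clock() // self.window_seconds)
        self.counts[(key, window)] += 1
        return self.counts[(key, window)] <= self.limit
```

In a route or dependency, a `False` result would typically translate into an HTTP 429 response, with the key derived from the client IP or account identifier.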

LLM Integration and Streaming

  • Use StreamingResponse for LLMs

    critical

    Implement StreamingResponse for all LLM inference endpoints to return tokens as they are generated, preventing gateway timeouts for long completions.

  • Limit Concurrent Inference Calls

    recommended

    Use an asyncio.Semaphore to cap the number of simultaneous calls to external LLM providers or local models to manage cost and memory usage.

  • Implement HTTPX Connection Pooling

    critical

    Initialize a single global httpx.AsyncClient instance to reuse TCP connections for external API calls, rather than creating a new client per request.

  • Configure Request Timeouts

    critical

    Set explicit connect, read, and write timeouts on all external HTTP clients to prevent the application from hanging on unresponsive upstream AI services.

  • Validate Stream Termination

    recommended

    Ensure the application correctly handles client disconnects during a stream to immediately stop upstream inference and free up resources.
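
The "Limit Concurrent Inference Calls" item above can be sketched with asyncio.Semaphore as follows. `call_model` is a hypothetical stand-in for an LLM call (the sleep simulates inference latency), and the cap of 2 is an assumed value chosen only to keep the example small.

```python
import asyncio

MAX_CONCURRENT = 2  # assumed cap; tune to provider limits or available memory


async def call_model(prompt: str, sem: asyncio.Semaphore, stats: dict) -> str:
    """Stand-in for an LLM call; at most MAX_CONCURRENT run at once."""
    async with sem:
        # Track how many calls are in flight to confirm the cap holds.
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.01)  # simulated inference latency
            return f"echo: {prompt}"
        finally:
            stats["active"] -= 1


async def run_batch(prompts: list[str]) -> tuple[list[str], int]:
    """Run all prompts and return (results, peak number of concurrent calls)."""
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(call_model(p, sem, stats) for p in prompts))
    return list(results), stats["peak"]
```

In an application, the semaphore would be created once and shared across requests rather than per batch, so that the cap applies to the whole worker process.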

Deployment and Observability

  • Configure Uvicorn Workers

    critical

    Run Uvicorn via Gunicorn using the UvicornWorker class. Use Gunicorn's (2 x CPU cores + 1) guideline as a starting point for the worker count, then tune it against measured load and memory usage.

  • Enable Proxy Headers

    critical

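    Enable Uvicorn's proxy header handling (the --proxy-headers option, or ProxyHeadersMiddleware) if the app is behind Nginx or Traefik to ensure correct client IP and protocol detection, and restrict the trusted proxy addresses with --forwarded-allow-ips.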

  • Expose Health Check Endpoints

    critical

    Create a /health route that verifies connectivity to the database and cache, used by orchestrators like Kubernetes for liveness/readiness probes.

  • Standardize JSON Logging

    recommended

    Configure the logging system to write structured, JSON-formatted logs to stdout for easier parsing by log aggregation tools.

  • Trace Requests with IDs

    recommended

    Implement middleware to inject a unique X-Request-ID into every request's context and include it in all logs for distributed tracing.
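
The last two items above (JSON logging and request IDs) fit together. The sketch below uses only the standard library: a `contextvars.ContextVar` carries the request ID and a custom formatter adds it to every JSON log line. `JsonFormatter`, `build_logger`, `handle_request`, and the logger name `app.demo` are hypothetical; in a real app, the body of `handle_request` would live in HTTP middleware that reads or generates the X-Request-ID header.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the current request's ID; middleware would set this per request.
request_id_var = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including the request ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
        })


def build_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)  # pass sys.stdout in production
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app.demo")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, incoming_id=None) -> str:
    """Simulate what request-ID middleware does around a handler."""
    rid = incoming_id or uuid.uuid4().hex  # reuse the caller's ID if present
    token = request_id_var.set(rid)
    try:
        logger.info("request handled")
    finally:
        request_id_var.reset(token)  # avoid leaking the ID to later requests
    return rid
```

Because context variables are scoped per asyncio task, concurrent requests each see their own ID without passing it through function arguments.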