FastAPI implementation checklist

This checklist provides a technical roadmap for moving FastAPI applications from development to production. It focuses on async performance, security hardening, and specific patterns for high-concurrency workloads like LLM streaming.

Application Architecture

Modularize Routes with APIRouter
critical
Verify that all endpoints are grouped into logical modules using APIRouter and included in the main app via app.include_router() to prevent a monolithic main.py file.
Implement Pydantic BaseSettings
critical
Ensure all configuration is managed through Pydantic's BaseSettings (provided by the separate pydantic-settings package since Pydantic v2) to enforce type validation on environment variables and provide default values.
Standardize Exception Handlers
recommended
Register global exception handlers for Starlette's HTTPException and FastAPI's RequestValidationError (the error raised when request data fails Pydantic validation) to ensure all error responses follow a consistent JSON schema.
Use Dependency Injection for Services
recommended
Validate that database sessions, third-party clients, and business logic services are injected using FastAPI's Depends() to facilitate mocking during unit tests.
Apply Strict Pydantic Types
critical
Review all request/response models to ensure they use constrained types (e.g., constr, PositiveInt) and Field constraints so that malformed data is rejected before it reaches business logic.
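As an illustration, a request model with constraints might look like the following. The `CreateOrder` model and its fields are made up for this sketch; `Field` length limits are used in place of `constr` because they behave the same way in Pydantic v1 and v2.

```python
from pydantic import BaseModel, Field, PositiveInt, ValidationError


class CreateOrder(BaseModel):
    # Hypothetical request model.
    sku: str = Field(min_length=3, max_length=32)
    quantity: PositiveInt  # rejects 0 and negative numbers
    note: str = Field(default="", max_length=200)


def is_valid(payload: dict) -> bool:
    # Returns False instead of raising, to make the checks easy to read.
    try:
        CreateOrder(**payload)
        return True
    except ValidationError:
        return False
```

With this model, `is_valid({"sku": "ABC-1", "quantity": 2})` passes, while a one-character `sku` or a `quantity` of 0 is rejected.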

Database and Async Performance

Configure Async Database Drivers
critical
Verify the use of async-compatible drivers (e.g., asyncpg for PostgreSQL) and ensure every database interaction uses the 'await' keyword.
Optimize Connection Pooling
critical
Set SQLAlchemy pool_size and max_overflow per worker process: each Uvicorn worker opens its own pool, so workers x (pool_size + max_overflow) must stay below the database's connection limit to prevent 'Too many connections' errors.
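The arithmetic can be sketched as below. Each worker process owns its own pool, so the budget is divided across workers. The `per_worker_pool` helper, the `reserved` headroom, and the half-and-half split between `pool_size` and `max_overflow` are assumptions, not a SQLAlchemy recommendation.

```python
def per_worker_pool(max_db_connections: int, workers: int, reserved: int = 5) -> tuple:
    """Split a database connection budget evenly across worker processes.

    Returns (pool_size, max_overflow) for ONE worker. The worst case across
    the deployment is workers * (pool_size + max_overflow).
    """
    budget = (max_db_connections - reserved) // workers
    if budget < 1:
        raise ValueError("not enough connections for this many workers")
    pool_size = max(1, budget // 2)    # steady-state connections
    max_overflow = budget - pool_size  # burst headroom
    return pool_size, max_overflow
```

For a database limit of 100 connections and 9 workers, this yields `(5, 5)`, or at most 90 connections in total. The two values can then be passed as `pool_size` and `max_overflow` when creating the SQLAlchemy engine.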
Automate Alembic Migrations
critical
Confirm that database schema changes are strictly managed via Alembic versions and that migrations are executed as a pre-deployment step.
Eliminate Blocking Sync Calls
critical
Audit the codebase for blocking calls inside async routes (such as the 'requests' library or 'time.sleep') and replace them with 'httpx' or 'asyncio.sleep' to avoid stalling the event loop.
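When a blocking call cannot be replaced right away, one common stopgap is to push it onto a worker thread. The sketch below uses only the standard library (`asyncio.to_thread`, Python 3.9+); `legacy_blocking_call` stands in for any sync library call.

```python
import asyncio
import time


def legacy_blocking_call() -> str:
    # Stand-in for a sync library call that cannot be replaced yet.
    time.sleep(0.2)
    return "done"


async def heartbeat(ticks: list) -> None:
    # Keeps appending for as long as the event loop is free to schedule it.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def main() -> tuple:
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    # to_thread runs the blocking function in a worker thread,
    # so the loop keeps servicing the heartbeat task in the meantime.
    result = await asyncio.to_thread(legacy_blocking_call)
    hb.cancel()
    return result, len(ticks)
```

Calling `legacy_blocking_call()` directly inside `main` would freeze the heartbeat for the full 0.2 seconds, which is the stall this checklist item is meant to catch.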
Implement Redis Caching
recommended
Apply Redis caching to high-latency read endpoints to reduce database load and improve response times for static or semi-static data.

Security and Authentication

Secure CORS Middleware
critical
Restrict the 'allow_origins' list in CORSMiddleware to specific production domains instead of using wildcards.
Rotate JWT Secret Keys
critical
Verify that JWT signing keys are loaded from environment variables so they can be rotated without a code change, carry at least 256 bits of entropy, and are never committed to version control.
Set JWT Token Expiration
critical
Confirm that all issued access tokens have an 'exp' claim set to a short duration (e.g., 15-60 minutes) to minimize the impact of token leakage.
Apply Rate Limiting
recommended
Implement rate limiting on sensitive routes (auth, password reset) using a library like slowapi or a Redis-backed counter to prevent brute-force attacks.
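To make the counting logic concrete, here is an in-memory sliding-window limiter written with the standard library only. It is a stand-in for illustration: it is not slowapi's API, and because its state is per process, a multi-worker deployment would need the Redis-backed variant mentioned above. The class and parameter names are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` hits per `window` seconds for each key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits: dict = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A login route could call `limiter.allow(client_ip)` and return HTTP 429 when it is False; the `now` parameter exists only to make the class easy to test.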
Disable Default Docs in Production
recommended
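Set docs_url=None, redoc_url=None, and openapi_url=None in the FastAPI constructor when running in production to prevent exposing the API specification to the public; with openapi_url left at its default, the schema is still served at /openapi.json.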

LLM Integration and Streaming

Use StreamingResponse for LLMs
critical
Implement StreamingResponse for all LLM inference endpoints to return tokens as they are generated, preventing gateway timeouts for long completions.
Limit Concurrent Inference Calls
recommended
Use an asyncio.Semaphore to cap the number of simultaneous calls to external LLM providers or local models to manage cost and memory usage.
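A small standard-library sketch of the cap is shown below. `call_model` simulates the upstream call with a short sleep, and the `stats` dictionary exists only to make the peak concurrency observable; both are assumptions for the example.

```python
import asyncio


async def call_model(prompt: str, limiter: asyncio.Semaphore, stats: dict) -> str:
    async with limiter:  # waits here once `limit` calls are already in flight
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.01)  # stand-in for the upstream inference call
            return prompt.upper()
        finally:
            stats["active"] -= 1


async def run_batch(prompts: list, limit: int = 2) -> tuple:
    limiter = asyncio.Semaphore(limit)  # created inside the running loop
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(call_model(p, limiter, stats) for p in prompts))
    return results, stats["peak"]
```

Running five prompts with `limit=2` never has more than two calls active at once. In an application, the semaphore would usually be created once at startup and shared by all requests in that worker.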
Implement HTTPX Connection Pooling
critical
Initialize a single global httpx.AsyncClient instance to reuse TCP connections for external API calls, rather than creating a new client per request.
Configure Request Timeouts
critical
Set explicit connect, read, and write timeouts on all external HTTP clients to prevent the application from hanging on unresponsive upstream AI services.
Validate Stream Termination
recommended
Ensure the application correctly handles client disconnects during a stream to immediately stop upstream inference and free up resources.
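One way to approach this is to poll the request for disconnection between chunks and close the upstream generator on exit. In the sketch below, `request` is assumed to be a Starlette `Request` (or any object with an async `is_disconnected()` method), and `stream_until_disconnect` is a hypothetical helper name.

```python
from typing import AsyncGenerator


async def stream_until_disconnect(
    request, upstream: AsyncGenerator[str, None]
) -> AsyncGenerator[str, None]:
    """Relay chunks from `upstream`, stopping once the client has gone away."""
    try:
        async for chunk in upstream:
            if await request.is_disconnected():
                break
            yield chunk
    finally:
        # Closing the upstream generator runs its cleanup code, which is where
        # the provider call would be cancelled and its connection released.
        await upstream.aclose()
```

The wrapped generator can be passed to `StreamingResponse` in place of the raw upstream one. Whether closing the generator actually stops inference depends on the provider SDK, so this is worth verifying against the specific client in use.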

Deployment and Observability

Configure Uvicorn Workers
critical
Run Uvicorn via Gunicorn using the UvicornWorker class, starting from (2 x CPU cores + 1) workers and tuning the count under realistic load.
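Gunicorn reads its configuration from a plain Python file, so the worker formula can be written down directly. The bind address and timeout values below are assumptions.

```python
# gunicorn.conf.py
import multiprocessing

bind = "0.0.0.0:8000"                           # placeholder address
workers = 2 * multiprocessing.cpu_count() + 1   # starting point, tune under load
worker_class = "uvicorn.workers.UvicornWorker"  # ASGI worker provided by Uvicorn
timeout = 60                                    # seconds before a silent worker is restarted
graceful_timeout = 30                           # seconds allowed for in-flight requests on shutdown
```

It would be started with something like `gunicorn -c gunicorn.conf.py app.main:app`, where `app.main:app` is a hypothetical import path. Inside a container with a CPU limit, `cpu_count()` may report the host's cores, so an explicit worker count can be safer there. Recent Uvicorn releases also point to a separate uvicorn-worker package for this class, so check the installed version's documentation.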
Enable Proxy Headers
critical
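Enable Uvicorn's ProxyHeadersMiddleware (the --proxy-headers option) and restrict trusted proxies with --forwarded-allow-ips if the app is behind Nginx or Traefik, so that the client IP and protocol are read correctly from the forwarded headers.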
Expose Health Check Endpoints
critical
Create a /health route that verifies connectivity to the database and cache, used by orchestrators like Kubernetes for liveness/readiness probes.
Standardize JSON Logging
recommended
Configure the logging system to write structured, JSON-formatted logs to stdout so that log aggregation tools can parse them without custom rules.
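A standard-library sketch of such a formatter is given below. The field names in the payload are an assumption; dedicated libraries offer richer options.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level: int = logging.INFO) -> None:
    # Send every record to stdout as a single JSON object per line.
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```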
Trace Requests with IDs
recommended
Implement middleware to inject a unique X-Request-ID into every request's context and include it in all logs for distributed tracing.