FastAPI implementation checklist
This checklist provides a technical roadmap for moving FastAPI applications from development to production. It focuses on async performance, security hardening, and specific patterns for high-concurrency workloads like LLM streaming.
Application Architecture
Modularize Routes with APIRouter
Critical: Verify that all endpoints are grouped into logical modules using APIRouter and included in the main app via app.include_router() to prevent a monolithic main.py file.
Implement Pydantic BaseSettings
Critical: Ensure all configuration is managed through Pydantic's BaseSettings (shipped in the separate pydantic-settings package as of Pydantic v2) to enforce type validation on environment variables and provide default values.
Standardize Exception Handlers
Recommended: Register global exception handlers for Starlette's HTTPException and FastAPI's RequestValidationError (the exception FastAPI raises when Pydantic rejects an incoming request) to ensure all error responses follow a consistent JSON schema.
Use Dependency Injection for Services
Recommended: Validate that database sessions, third-party clients, and business logic services are injected using FastAPI's Depends() to facilitate mocking during unit tests.
Apply Strict Pydantic Types
Critical: Review all request/response models to ensure they use constrained types (e.g., constr, PositiveInt) and Field constraints so that malformed data is rejected before it reaches business logic.
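A small illustration, assuming Pydantic v2. The OrderIn model and its limits are invented for the example.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, PositiveInt


class OrderIn(BaseModel):
    # Reject unknown fields instead of silently ignoring them.
    model_config = ConfigDict(extra="forbid")

    # Length and pattern limits are illustrative values.
    sku: str = Field(min_length=3, max_length=32, pattern=r"^[A-Z0-9-]+$")
    quantity: PositiveInt
    note: Optional[str] = Field(default=None, max_length=200)
```

With this model, a quantity of 0, a two-character sku, or an unexpected extra key all raise a ValidationError, which FastAPI turns into a 422 response for request bodies.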
Database and Async Performance
Configure Async Database Drivers
Critical: Verify the use of async-compatible drivers (e.g., asyncpg for PostgreSQL) and ensure every database interaction uses the 'await' keyword.
Optimize Connection Pooling
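Critical: Set SQLAlchemy pool_size and max_overflow per process so that (number of Uvicorn workers) x (pool_size + max_overflow) stays below the database's connection limit, preventing 'Too many connections' errors. Each worker process holds its own pool.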
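The sizing arithmetic can be sketched with a small helper. The function name, the reserved headroom, and the 50/50 split between pool_size and max_overflow are assumptions, not SQLAlchemy defaults.

```python
from typing import Tuple


def per_worker_pool(
    max_db_connections: int,
    workers: int,
    reserved: int = 5,
    overflow_ratio: float = 0.5,
) -> Tuple[int, int]:
    """Return (pool_size, max_overflow) for one worker process.

    `reserved` leaves headroom for migrations, admin sessions, and so on.
    """
    budget = (max_db_connections - reserved) // workers
    if budget < 1:
        raise ValueError("not enough database connections for this many workers")
    max_overflow = int(budget * overflow_ratio)
    pool_size = budget - max_overflow
    return pool_size, max_overflow


# Example: a database allowing 100 connections, 9 workers, 10 reserved.
pool_size, max_overflow = per_worker_pool(100, workers=9, reserved=10)
```

The resulting values would then be passed to the engine factory, for example create_async_engine(url, pool_size=pool_size, max_overflow=max_overflow).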
Automate Alembic Migrations
Critical: Confirm that database schema changes are strictly managed via Alembic versions and that migrations are executed as a pre-deployment step.
Eliminate Blocking Sync Calls
Critical: Audit the codebase for blocking synchronous calls (such as the 'requests' library or 'time.sleep') inside async code paths and replace them with 'httpx' or 'asyncio.sleep', or move them to a worker thread, to avoid stalling the event loop.
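Where a blocking call cannot be replaced outright, one common option is to push it onto a worker thread. The sketch below uses only the standard library; legacy_blocking_call is a hypothetical stand-in for any synchronous client.

```python
import asyncio
import time


def legacy_blocking_call() -> str:
    # Stand-in for a synchronous library call (e.g. a blocking HTTP client).
    time.sleep(0.2)
    return "done"


async def heartbeat(ticks: list) -> None:
    # Keeps ticking only if the event loop is free to schedule it.
    for _ in range(4):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def main():
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so the heartbeat task keeps running while it waits.
    result = await asyncio.to_thread(legacy_blocking_call)
    await hb
    return result, ticks
```

Calling legacy_blocking_call() directly inside main() would instead freeze the heartbeat for the full 0.2 seconds, which is the stall this checklist item is meant to catch.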
Implement Redis Caching
Recommended: Apply Redis caching to high-latency read endpoints to reduce database load and improve response times for static or semi-static data.
Security and Authentication
Secure CORS Middleware
Critical: Restrict the 'allow_origins' list in CORSMiddleware to specific production domains instead of using wildcards.
Rotate JWT Secret Keys
Critical: Verify that JWT signing keys are supplied via environment variables (or a secrets manager), carry at least 256 bits of entropy, and are never committed to version control.
Set JWT Token Expiration
Critical: Confirm that all issued access tokens have an 'exp' claim set to a short duration (e.g., 15-60 minutes) to minimize the impact of token leakage.
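A sketch of issuing and verifying a short-lived token, assuming the PyJWT library and HS256 signing. The function names and the 15-minute default are illustrative, and the secret is passed in by the caller rather than hard-coded.

```python
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT (assumed library choice)

ALGORITHM = "HS256"


def issue_access_token(
    subject: str,
    secret: str,
    ttl: timedelta = timedelta(minutes=15),
) -> str:
    now = datetime.now(timezone.utc)
    claims = {"sub": subject, "iat": now, "exp": now + ttl}
    return jwt.encode(claims, secret, algorithm=ALGORITHM)


def verify_access_token(token: str, secret: str) -> dict:
    # Rejects tokens with a missing or past 'exp' claim.
    return jwt.decode(
        token,
        secret,
        algorithms=[ALGORITHM],
        options={"require": ["exp", "sub"]},
    )
```

In an application the secret would come from configuration, for example a value generated once with secrets.token_urlsafe(32) and stored outside the repository.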
Apply Rate Limiting
Recommended: Implement rate limiting on sensitive routes (auth, password reset) using a library like slowapi or a Redis-backed counter to prevent brute-force attacks.
Disable Default Docs in Production
Recommended: Set docs_url=None and redoc_url=None in the FastAPI constructor when running in production to prevent exposing the interactive documentation to the public. Set openapi_url=None as well if the raw OpenAPI schema should not be served either.
LLM Integration and Streaming
Use StreamingResponse for LLMs
Critical: Implement StreamingResponse for all LLM inference endpoints to return tokens as they are generated, preventing gateway timeouts for long completions.
Limit Concurrent Inference Calls
Recommended: Use an asyncio.Semaphore to cap the number of simultaneous calls to external LLM providers or local models to manage cost and memory usage.
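The capping behaviour can be sketched with the standard library alone. call_model and the limit of 2 are illustrative, and the sleep stands in for the real inference call.

```python
import asyncio

MAX_CONCURRENT_CALLS = 2  # illustrative limit


async def call_model(prompt: str, sem: asyncio.Semaphore, stats: dict) -> str:
    async with sem:  # waits here once the limit is reached
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.01)  # stand-in for the upstream request
            return prompt.upper()
        finally:
            stats["active"] -= 1


async def main():
    sem = asyncio.Semaphore(MAX_CONCURRENT_CALLS)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(call_model(f"p{i}", sem, stats) for i in range(6))
    )
    return results, stats["peak"]
```

Six requests are submitted at once, but no more than two are ever in flight; the rest queue on the semaphore.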
Implement HTTPX Connection Pooling
Critical: Initialize a single shared httpx.AsyncClient instance at application startup (and close it at shutdown) to reuse TCP connections for external API calls, rather than creating a new client per request.
Configure Request Timeouts
Critical: Set explicit connect, read, and write timeouts on all external HTTP clients to prevent the application from hanging on unresponsive upstream AI services.
Validate Stream Termination
Recommended: Ensure the application correctly handles client disconnects during a stream to immediately stop upstream inference and free up resources.
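One way to approach this is to poll Starlette's request.is_disconnected() between chunks and close the upstream generator in a finally block. The sketch below assumes the upstream is an async generator; fake_upstream and the closed event are illustrative stand-ins for a provider connection.

```python
import asyncio
from typing import AsyncGenerator


async def fake_upstream(closed: asyncio.Event) -> AsyncGenerator[str, None]:
    # Stand-in for a provider stream; `closed` records that cleanup ran.
    try:
        for i in range(1000):
            await asyncio.sleep(0)
            yield f"token-{i} "
    finally:
        closed.set()  # e.g. close the upstream HTTP response here


async def guarded_stream(request, upstream: AsyncGenerator[str, None]):
    # `request` is expected to be a starlette.requests.Request.
    try:
        async for chunk in upstream:
            if await request.is_disconnected():
                break  # stop pulling tokens nobody will read
            yield chunk
    finally:
        # Runs on normal completion, on disconnect, and on cancellation.
        await upstream.aclose()
```

An endpoint would return StreamingResponse(guarded_stream(request, upstream)). The finally block matters in both paths, since the server may also cancel the generator directly when the connection drops.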
Deployment and Observability
Configure Uvicorn Workers
Critical: Run Uvicorn via Gunicorn using the UvicornWorker class, starting from (2 x CPU cores + 1) workers, Gunicorn's usual rule of thumb, and adjusting the count based on load testing and available memory.
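Gunicorn reads its settings from a Python file, so the rule of thumb can be written down directly. This is a sketch of a gunicorn.conf.py; the bind address and timeout values are assumptions. Recent Uvicorn releases deprecate uvicorn.workers in favour of the separate uvicorn-worker package, in which case the class path would change accordingly.

```python
# gunicorn.conf.py (sketch)
import multiprocessing

bind = "0.0.0.0:8000"  # illustrative address
workers = multiprocessing.cpu_count() * 2 + 1  # starting point, tune under load
worker_class = "uvicorn.workers.UvicornWorker"
timeout = 120  # seconds before an unresponsive worker is restarted
graceful_timeout = 30  # seconds allowed for in-flight requests on shutdown
keepalive = 5
```

The server would then be started with a command such as gunicorn -c gunicorn.conf.py main:app, where main:app is the assumed module path of the application.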
Enable Proxy Headers
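Critical: Enable proxy header handling (Uvicorn's --proxy-headers and --forwarded-allow-ips options, or ProxyHeadersMiddleware) if the app is behind Nginx or Traefik, so that the client IP and protocol are read from X-Forwarded-For and X-Forwarded-Proto. Trust these headers only from known proxy addresses.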
Expose Health Check Endpoints
Critical: Create a /health route that verifies connectivity to the database and cache, for use by orchestrators such as Kubernetes as a liveness or readiness probe.
Standardize JSON Logging
Recommended: Configure the logging system to write structured, JSON-formatted logs to stdout for easier parsing by log aggregation tools.
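A minimal standard-library sketch. The field names in the payload are a choice made for illustration, and a dedicated JSON logging library would serve equally well.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level: int = logging.INFO) -> None:
    # Replace existing root handlers with one JSON handler on stdout.
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

Calling configure_logging() once at startup makes every logger that propagates to the root emit one JSON object per line.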
Trace Requests with IDs
Recommended: Implement middleware to inject a unique X-Request-ID into every request's context and include it in all logs for distributed tracing.
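This can be sketched as a plain ASGI middleware plus a logging filter, using only the standard library. Reusing an incoming X-Request-ID header is an assumption; some deployments prefer to always generate a fresh ID.

```python
import contextvars
import logging
import uuid

request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    # Attach to a handler so formatters can reference record.request_id.
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


class RequestIdMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        headers = dict(scope.get("headers") or [])
        incoming = headers.get(b"x-request-id", b"").decode("latin-1")
        request_id = incoming or uuid.uuid4().hex
        token = request_id_var.set(request_id)

        async def send_with_id(message):
            # Echo the ID back on the response headers.
            if message["type"] == "http.response.start":
                out_headers = list(message.get("headers", []))
                out_headers.append((b"x-request-id", request_id.encode("latin-1")))
                message = {**message, "headers": out_headers}
            await send(message)

        try:
            await self.app(scope, receive, send_with_id)
        finally:
            request_id_var.reset(token)
```

The middleware can be registered with app.add_middleware(RequestIdMiddleware), and the filter added to the log handler so each record carries the current request ID.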