Resources

100 Prompt Engineering resources for developers

This resource guide provides developers with the specific patterns, evaluation frameworks, and implementation tools required to move beyond basic chat interfaces and build reliable, production-grade LLM applications. It focuses on systematic prompt engineering, structured output enforcement, and regression testing.

Core Prompting Patterns and Techniques

  1. Chain-of-Thought (CoT) Reasoning [beginner · high]

     Instruct the model to 'think step-by-step' before providing a final answer to improve performance on logic and math tasks. A combined prompt sketch using this and several of the patterns below follows this list.

  2. Few-Shot In-Context Learning [beginner · high]

     Provide 3-5 high-quality examples of input-output pairs within the prompt to define specific formatting and stylistic requirements.

  3. XML Tagging for Context Separation [beginner · standard]

     Use tags like <context>, <instruction>, and <examples> to help models like Claude 3.5 Sonnet distinguish between metadata and task data.

  4. Chain-of-Verification (CoVe) [advanced · medium]

     A multi-step process where the model generates a response, identifies its own potential errors, and verifies facts before producing the final output.

  5. Skeleton-of-Thought (SoT) [intermediate · medium]

     Prompt the model to generate a high-level outline first, then expand each section (optionally in parallel) to reduce latency and improve structural coherence.

  6. Negative Constraints and Guardrails [beginner · standard]

     Explicitly list forbidden words, topics, or formats (e.g., 'Do not use markdown') to prevent unwanted output behavior.

  7. Persona-Based System Prompts [beginner · standard]

     Define a specific professional role (e.g., 'You are a Senior Site Reliability Engineer') to prime the model for the appropriate technical jargon and tone.

  8. Self-Consistency Sampling [advanced · high]

     Generate multiple outputs for the same prompt and use a majority vote or LLM-as-a-judge to select the most accurate result; a majority-vote sketch follows this list.

  9. Dynamic Context Window Management [intermediate · high]

     Use RAG or sliding windows to pass only the most relevant 20-30% of document context and avoid the 'lost in the middle' phenomenon.

  10. Output Formatting via Pydantic [intermediate · high]

      Force the model to return valid JSON by providing a strict schema, often implemented via libraries like Instructor.
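
Several of the patterns above compose naturally into a single request. The sketch below combines a persona system prompt, a negative constraint, XML-tagged context, few-shot examples, and a chain-of-thought instruction; it assumes the OpenAI Python SDK, and the model name, ticket text, and example pairs are illustrative placeholders.

```python
# Sketch: persona + negative constraint (system), XML tags, few-shot examples,
# and a CoT instruction in one request. Model name and content are placeholders.
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are a Senior Site Reliability Engineer. "   # persona
    "Do not use markdown; respond in plain text."    # negative constraint
)

PROMPT = """\
<instruction>
Classify the severity of the incident in <ticket> as P1, P2, or P3.
Think step-by-step inside <reasoning>, then put the final label inside <answer>.
</instruction>

<examples>
<ticket>Checkout API returning 500s for all users.</ticket>
<reasoning>Total outage of a revenue-critical path affecting every user.</reasoning>
<answer>P1</answer>

<ticket>Dashboard chart colors do not match the style guide.</ticket>
<reasoning>Cosmetic issue with no functional or revenue impact.</reasoning>
<answer>P3</answer>
</examples>

<ticket>Login latency has doubled for users in the EU region.</ticket>
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": PROMPT},
    ],
)
print(response.choices[0].message.content)
```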
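
Self-consistency sampling is a short loop around any such prompt: draw several completions at a non-zero temperature and keep the most common answer. A minimal sketch, again assuming the OpenAI Python SDK; extract_answer is a hypothetical, task-specific parser.

```python
# Sketch: self-consistency via majority vote over n sampled completions.
# Assumes the OpenAI Python SDK; extract_answer() is a hypothetical helper.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def extract_answer(text: str) -> str:
    """Hypothetical parser: take the last line as the final answer."""
    return text.strip().splitlines()[-1]

def self_consistent_answer(prompt: str, n: int = 5, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,  # diversity across samples is the point here
        n=n,              # request n independent completions in one call
    )
    answers = [extract_answer(choice.message.content) for choice in response.choices]
    return Counter(answers).most_common(1)[0][0]  # majority vote

print(self_consistent_answer(
    "What is 17 * 24? Think step-by-step, then give only the number on the last line."
))
```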

Evaluation and Observability Tools

  1. Promptfoo CLI [intermediate · high]

     A matrix testing tool that runs prompts against multiple models and checks outputs against predefined assertions.

  2. LangSmith (LangChain) [intermediate · high]

     A platform for tracing, debugging, and evaluating LLM applications with integrated dataset management for regression testing.

  3. Braintrust [advanced · high]

     An enterprise-grade stack for logging LLM calls, managing prompt versions, and running automated evaluation workflows.

  4. DeepEval (Pytest for LLMs) [intermediate · medium]

     A Python framework for unit testing LLM outputs based on metrics like faithfulness, relevance, and hallucination scores; a minimal pytest-style example follows this list.

  5. Helicone [beginner · standard]

     An open-source observability proxy that tracks costs, latency, and token usage for OpenAI and Anthropic requests; a proxy setup sketch follows this list.

  6. PromptLayer [beginner · medium]

     A middleware for logging and versioning prompts, allowing developers to roll back to previous prompt iterations without code changes.

  7. Giskard [advanced · medium]

     An open-source library for detecting vulnerabilities, hallucinations, and biases in LLM-based applications.

  8. Honeycomb LLM Observability [advanced · standard]

     Distributed tracing for LLMs to understand the latency impact of each step in a complex multi-agent chain.

  9. Weights & Biases Prompts [intermediate · standard]

     A suite of tools for visualizing and inspecting the execution flow of LLM chains and pipelines.

  10. Arize Phoenix [advanced · medium]

      Open-source observability for RAG and LLM applications, focusing on embedding visualization and retrieval evaluation.
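
DeepEval is the most pytest-like of the tools above, so an LLM regression suite can sit next to ordinary unit tests. A minimal sketch assuming `pip install deepeval` and a judge model configured via OPENAI_API_KEY; the question, answer, and threshold are illustrative.

```python
# Sketch: a pytest-style LLM regression test with DeepEval.
# Assumes deepeval is installed and a judge model is configured (OPENAI_API_KEY).
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_refund_policy_answer():
    test_case = LLMTestCase(
        input="What is your refund window?",
        # In a real suite, actual_output comes from the application under test.
        actual_output="You can request a refund within 30 days of purchase.",
    )
    # Fails the test if the judged relevancy score drops below the threshold.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```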
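
Helicone sits in front of the provider as a proxy, so adopting it is usually a base-URL change rather than an SDK swap. A minimal sketch; the gateway URL and Helicone-Auth header follow Helicone's documented OpenAI integration and should be treated as assumptions here.

```python
# Sketch: routing OpenAI traffic through Helicone to log cost, latency, and
# token usage. The base_url and Helicone-Auth header are taken from Helicone's
# documented OpenAI integration (assumption; verify against current docs).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://oai.helicone.ai/v1",  # proxy instead of api.openai.com
    default_headers={"Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}"},
)

# Calls behave exactly as before; each one now shows up in the Helicone dashboard.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Ping"}],
)
print(response.choices[0].message.content)
```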

Structured Output and Tooling

  1. Instructor Library [beginner · high]

     A de facto standard for getting structured data (JSON) from LLMs, using Pydantic models in Python or Zod in TypeScript; a minimal extraction example follows this list.

  2. Vercel AI SDK (generateObject) [beginner · high]

     A unified interface for generating type-safe JSON objects across OpenAI, Anthropic, and Google Gemini models.

  3. Outlines (by .txt) [advanced · high]

     A library for neural text generation that uses regex and context-free grammars to guarantee syntactically valid JSON or code.

  4. LMQL (Language Model Query Language) [advanced · medium]

     A query language that combines declarative constraints with scripted LLM generation to restrict output at the token level.

  5. Guidance (Microsoft) [intermediate · medium]

     A template language for controlling LLMs, allowing you to interleave generation, control flow, and fixed strings.

  6. Marvin AI [beginner · standard]

     A lightweight library that uses LLMs to power standard Python functions, handling the prompt engineering under the hood.

  7. TypeChat [intermediate · standard]

     Microsoft's library for replacing complex prompt engineering with schema-based type definitions to guide model output.

  8. Jsonformer [intermediate · medium]

     A library that fills in only the values of a predefined JSON schema, generating none of the fixed structure and thereby saving tokens.

  9. SGLang [advanced · high]

     A fast backend and frontend for structured generation, optimized for high-throughput serving and complex control flow.

  10. Portkey AI Gateway [intermediate · medium]

      A control plane to manage multiple LLMs with built-in retries, fallbacks, and load balancing across providers.
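
Most of the libraries above implement some variant of the Instructor pattern: declare a schema, pass it with the request, and get a validated object back instead of raw text. A minimal sketch assuming Instructor's OpenAI integration (`instructor.from_openai`); the UserDetail schema and prompt are illustrative.

```python
# Sketch: structured extraction with Instructor + Pydantic.
# Assumes `pip install instructor openai`; schema and prompt are placeholders.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserDetail(BaseModel):
    name: str
    age: int

# Patches the client so create() accepts response_model and retries on
# validation errors.
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserDetail,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old."}],
)

assert isinstance(user, UserDetail)
print(user.name, user.age)  # -> Jason 25
```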