Resources

100 Prompt Engineering resources for developers

This resource guide provides developers with the specific patterns, evaluation frameworks, and implementation tools required to move beyond basic chat interfaces and build reliable, production-grade LLM applications. It focuses on systematic prompt engineering, structured output enforcement, and regression testing.

Core Prompting Patterns and Techniques

  1. Chain-of-Thought (CoT) Reasoning [beginner · high]

     Instruct the model to 'think step-by-step' before providing a final answer to improve performance on logic and math tasks. A combined prompt sketch using this and several of the patterns below follows this list.

  2. Few-Shot In-Context Learning [beginner · high]

     Provide 3-5 high-quality examples of input-output pairs within the prompt to define specific formatting and stylistic requirements.

  3. XML Tagging for Context Separation [beginner · standard]

     Use tags like <context>, <instruction>, and <examples> to help models like Claude 3.5 Sonnet distinguish between metadata and task data.

  4. Chain-of-Verification (CoVe) [advanced · medium]

     A multi-step process where the model generates a response, identifies its own potential errors, and verifies facts before producing the final output.

  5. Skeleton-of-Thought (SoT) [intermediate · medium]

     Prompt the model to generate a high-level outline first, then expand each section (optionally in parallel) to reduce latency and improve structural coherence.

  6. Negative Constraints and Guardrails [beginner · standard]

     Explicitly list forbidden words, topics, or formats (e.g., 'Do not use markdown') to prevent unwanted output behavior.

  7. Persona-Based System Prompts [beginner · standard]

     Define a specific professional role (e.g., 'You are a Senior Site Reliability Engineer') to prime the model for the appropriate technical jargon and tone.

  8. Self-Consistency Sampling [advanced · high]

     Generate multiple outputs for the same prompt and use a majority vote or LLM-as-a-judge to select the most accurate result; a majority-vote sketch follows this list.

  9. Dynamic Context Window Management [intermediate · high]

     Use RAG or sliding windows to pass only the most relevant 20-30% of document context and avoid the 'lost in the middle' phenomenon.

  10. Output Formatting via Pydantic [intermediate · high]

      Force the model to return valid JSON by providing a strict schema, often implemented via libraries like Instructor.
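
Several of the patterns above compose naturally into a single request. The sketch below combines a persona system prompt, a negative constraint, XML-tagged context, few-shot examples, and a chain-of-thought instruction; it assumes the OpenAI Python SDK, and the model name, ticket text, and example pairs are illustrative placeholders.

```python
# Sketch: persona + negative constraint (system), XML tags, few-shot examples,
# and a CoT instruction in one request. Model name and content are placeholders.
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are a Senior Site Reliability Engineer. "   # persona
    "Do not use markdown; respond in plain text."    # negative constraint
)

PROMPT = """\
<instruction>
Classify the severity of the incident in <ticket> as P1, P2, or P3.
Think step-by-step inside <reasoning>, then put the final label inside <answer>.
</instruction>

<examples>
<ticket>Checkout API returning 500s for all users.</ticket>
<reasoning>Total outage of a revenue-critical path affecting every user.</reasoning>
<answer>P1</answer>

<ticket>Dashboard chart colors do not match the style guide.</ticket>
<reasoning>Cosmetic issue with no functional or revenue impact.</reasoning>
<answer>P3</answer>
</examples>

<ticket>Login latency has doubled for users in the EU region.</ticket>
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": PROMPT},
    ],
)
print(response.choices[0].message.content)
```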
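
Self-consistency sampling is a short loop around any such prompt: draw several completions at a non-zero temperature and keep the most common answer. A minimal sketch, again assuming the OpenAI Python SDK; extract_answer is a hypothetical, task-specific parser.

```python
# Sketch: self-consistency via majority vote over n sampled completions.
# Assumes the OpenAI Python SDK; extract_answer() is a hypothetical helper.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def extract_answer(text: str) -> str:
    """Hypothetical parser: take the last line as the final answer."""
    return text.strip().splitlines()[-1]

def self_consistent_answer(prompt: str, n: int = 5, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,  # diversity across samples is the point here
        n=n,              # request n independent completions in one call
    )
    answers = [extract_answer(choice.message.content) for choice in response.choices]
    return Counter(answers).most_common(1)[0][0]  # majority vote

print(self_consistent_answer(
    "What is 17 * 24? Think step-by-step, then give only the number on the last line."
))
```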

Evaluation and Observability Tools

  1. Promptfoo CLI [intermediate · high]

     A matrix testing tool that runs prompts against multiple models and checks outputs against predefined assertions.

  2. LangSmith (LangChain) [intermediate · high]

     A platform for tracing, debugging, and evaluating LLM applications with integrated dataset management for regression testing.

  3. Braintrust [advanced · high]

     An enterprise-grade stack for logging LLM calls, managing prompt versions, and running automated evaluation workflows.

  4. DeepEval (Pytest for LLMs) [intermediate · medium]

     A Python framework for unit testing LLM outputs based on metrics like faithfulness, relevance, and hallucination scores; a minimal pytest-style example follows this list.

  5. Helicone [beginner · standard]

     An open-source observability proxy that tracks costs, latency, and token usage for OpenAI and Anthropic requests; a proxy setup sketch follows this list.

  6. PromptLayer [beginner · medium]

     A middleware for logging and versioning prompts, allowing developers to roll back to previous prompt iterations without code changes.

  7. Giskard [advanced · medium]

     An open-source library for detecting vulnerabilities, hallucinations, and biases in LLM-based applications.

  8. Honeycomb LLM Observability [advanced · standard]

     Distributed tracing for LLMs to understand the latency impact of each step in a complex multi-agent chain.

  9. Weights & Biases Prompts [intermediate · standard]

     A suite of tools for visualizing and inspecting the execution flow of LLM chains and pipelines.

  10. Arize Phoenix [advanced · medium]

      Open-source observability for RAG and LLM applications, focusing on embedding visualization and retrieval evaluation.
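
DeepEval is the most pytest-like of the tools above, so an LLM regression suite can sit next to ordinary unit tests. A minimal sketch assuming `pip install deepeval` and a judge model configured via OPENAI_API_KEY; the question, answer, and threshold are illustrative.

```python
# Sketch: a pytest-style LLM regression test with DeepEval.
# Assumes deepeval is installed and a judge model is configured (OPENAI_API_KEY).
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_refund_policy_answer():
    test_case = LLMTestCase(
        input="What is your refund window?",
        # In a real suite, actual_output comes from the application under test.
        actual_output="You can request a refund within 30 days of purchase.",
    )
    # Fails the test if the judged relevancy score drops below the threshold.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```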
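
Helicone sits in front of the provider as a proxy, so adopting it is usually a base-URL change rather than an SDK swap. A minimal sketch; the gateway URL and Helicone-Auth header follow Helicone's documented OpenAI integration and should be treated as assumptions here.

```python
# Sketch: routing OpenAI traffic through Helicone to log cost, latency, and
# token usage. The base_url and Helicone-Auth header are taken from Helicone's
# documented OpenAI integration (assumption; verify against current docs).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://oai.helicone.ai/v1",  # proxy instead of api.openai.com
    default_headers={"Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}"},
)

# Calls behave exactly as before; each one now shows up in the Helicone dashboard.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Ping"}],
)
print(response.choices[0].message.content)
```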

Structured Output and Tooling

  1. Instructor Library [beginner · high]

     A de facto standard for getting structured data (JSON) from LLMs, using Pydantic models in Python or Zod in TypeScript; a minimal extraction example follows this list.

  2. Vercel AI SDK (generateObject) [beginner · high]

     A unified interface for generating type-safe JSON objects across OpenAI, Anthropic, and Google Gemini models.

  3. Outlines (by .txt) [advanced · high]

     A library for neural text generation that uses regex and context-free grammars to guarantee syntactically valid JSON or code.

  4. LMQL (Language Model Query Language) [advanced · medium]

     A query language that combines declarative constraints with scripted LLM generation to restrict output at the token level.

  5. Guidance (Microsoft) [intermediate · medium]

     A template language for controlling LLMs, allowing you to interleave generation, control flow, and fixed strings.

  6. Marvin AI [beginner · standard]

     A lightweight library that uses LLMs to power standard Python functions, handling the prompt engineering under the hood.

  7. TypeChat [intermediate · standard]

     Microsoft's library for replacing complex prompt engineering with schema-based type definitions to guide model output.

  8. Jsonformer [intermediate · medium]

     A library that fills in only the values of a predefined JSON schema, generating none of the fixed structure and thereby saving tokens.

  9. SGLang [advanced · high]

     A fast backend and frontend for structured generation, optimized for high-throughput serving and complex control flow.

  10. Portkey AI Gateway [intermediate · medium]

      A control plane to manage multiple LLMs with built-in retries, fallbacks, and load balancing across providers.
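
Most of the libraries above implement some variant of the Instructor pattern: declare a schema, pass it with the request, and get a validated object back instead of raw text. A minimal sketch assuming Instructor's OpenAI integration (`instructor.from_openai`); the UserDetail schema and prompt are illustrative.

```python
# Sketch: structured extraction with Instructor + Pydantic.
# Assumes `pip install instructor openai`; schema and prompt are placeholders.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserDetail(BaseModel):
    name: str
    age: int

# Patches the client so create() accepts response_model and retries on
# validation errors.
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserDetail,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old."}],
)

assert isinstance(user, UserDetail)
print(user.name, user.age)  # -> Jason 25
```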