Guides

Building an AI Content Generation Pipeline with Open-Source Tools

This guide outlines the technical implementation of a programmatic AI content pipeline. It focuses on maintaining brand voice, ensuring factual accuracy through structured validation, and automating the deployment to a headless CMS. The architecture uses a multi-stage approach to overcome the context window limitations and quality degradation common in single-prompt generation.

Estimated time: 4–6 hours · 5 steps

Step 1: Define Structured Output Schemas with Zod

To ensure the AI output is parseable by your application, define a strict schema. This prevents the LLM from adding conversational filler and ensures all required SEO fields (meta description, slug, H1) are present.

schemas/article.ts
import { z } from 'zod';

export const ArticleSchema = z.object({
  title: z.string().max(60),
  slug: z.string().regex(/^[a-z0-9-]+$/),
  metaDescription: z.string().max(160),
  sections: z.array(z.object({
    heading: z.string(),
    body: z.string(),
    keywords: z.array(z.string())
  })),
  tags: z.array(z.string())
});

export type Article = z.infer<typeof ArticleSchema>;

⚠ Common Pitfalls

  • LLMs may occasionally return markdown blocks inside JSON strings; use a regex to strip ```json wrappers before parsing.
  • Schema complexity increases token usage and latency.
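The fence-stripping pitfall above can be handled with a small helper before validation. A minimal sketch (the regexes assume a single optional ```json wrapper; Zod's `safeParse` would run on the cleaned string):

```typescript
// Strip an optional ```json ... ``` wrapper the model sometimes adds,
// so the payload can be passed to JSON.parse and then ArticleSchema.safeParse.
function stripJsonFences(raw: string): string {
  return raw
    .replace(/^\s*```(?:json)?\s*\n?/i, '')
    .replace(/\n?\s*```\s*$/, '')
    .trim();
}
```

Running the cleaned string through `safeParse` rather than `parse` lets the pipeline retry generation on validation failure instead of crashing.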

Step 2: Inject Brand Voice and Style Constraints

Create a reusable system prompt that defines the persona, tone, and formatting rules. Instead of generic 'write a blog post' prompts, provide specific 'never-use' word lists and reading level targets.

prompts/config.ts
export const SYSTEM_PROMPT = `
Persona: Technical Architect for SaaS platforms.
Tone: Direct, authoritative, low-burstiness.
Constraints:
- Avoid superlatives like 'revolutionary' or 'game-changing'.
- Use active voice only.
- Target Flesch-Kincaid Grade 8.
- Format all technical terms in backticks.
- Use Oxford commas.
`;

⚠ Common Pitfalls

  • Over-constraining the model can lead to repetitive sentence structures.
  • Brand voice instructions are often ignored if the user prompt contradicts the system prompt.
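Because the model can ignore these instructions, a cheap post-generation lint catches violations before a draft moves downstream. A sketch, with a hypothetical banned-word list mirroring the prompt's 'never-use' terms:

```typescript
// Terms the system prompt forbids; extend from your brand style guide.
const BANNED = ['revolutionary', 'game-changing', 'cutting-edge'];

// Return the banned terms that actually appear, so the draft can be
// regenerated or flagged for editing rather than silently published.
function findBannedTerms(draft: string): string[] {
  const lower = draft.toLowerCase();
  return BANNED.filter(term => lower.includes(term));
}
```

A non-empty result can trigger a targeted regeneration of just the offending section instead of the whole article.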

Step 3: Implement a Multi-Stage Generation Pipeline

Single-shot generation of long-form content often leads to hallucinations or loss of detail. Break the process into three stages: research and outlining, section drafting, and final assembly.

lib/pipeline.ts
async function generateArticle(topic: string) {
  // Stage 1: generate a structured outline for the topic.
  const outline = await callLLM(OUTLINE_PROMPT, topic);
  const sections = [];

  // Stage 2: draft each section sequentially, passing the full outline
  // as context to reduce drift between sections.
  for (const section of outline.sections) {
    const content = await callLLM(SECTION_PROMPT, { section, context: outline });
    sections.push(content);
  }

  // Stage 3: stitch the drafted sections into the final article.
  return assemble(sections);
}

⚠ Common Pitfalls

  • Sequential API calls increase the risk of timeout; implement a retry strategy with exponential backoff.
  • Context drift between sections can cause contradictory statements within the same article.
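The retry strategy recommended above can be a small generic wrapper around each LLM call. A sketch; the attempt count and base delay are arbitrary defaults, and `fn` stands in for any call like `callLLM`:

```typescript
// Retry an async call with exponential backoff: delays of base, 2*base, 4*base, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Re-throw on the final attempt; otherwise back off and retry.
      if (attempt === maxAttempts - 1) throw err;
      await new Promise(res => setTimeout(res, baseDelayMs * 2 ** attempt));
    }
  }
  throw new Error('unreachable');
}
```

Each `callLLM` in the pipeline loop would then be wrapped as `withRetry(() => callLLM(...))`.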

Step 4: Verify Facts via RAG or Search

Integrate a verification step where the generated claims are checked against a trusted knowledge base or a real-time search API (like Perplexity or Tavily) to ensure accuracy.

lib/verify.ts
async function verifyClaims(content: string) {
  // Pull discrete factual claims out of the generated text.
  const claims = await extractClaims(content);
  const results = await Promise.all(claims.map(c => searchAPI.search(c)));

  // Reject the article if any claim cannot be confirmed with high confidence.
  if (results.some(r => r.confidence < 0.8)) {
    throw new Error('Factual inconsistency detected');
  }
}

⚠ Common Pitfalls

  • Automated verification can return false positives if the search query is poorly constructed.
  • API costs for search can scale rapidly in high-volume pipelines.
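One way to keep search spend bounded is to replace the single `Promise.all` with batched execution, so only a fixed number of lookups run at once. A minimal sketch; the batch size is an arbitrary assumption:

```typescript
// Map items through an async function in fixed-size batches, capping
// concurrent search calls (and cost spikes) in high-volume pipelines.
async function mapLimited<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += limit) {
    const batch = items.slice(i, i + limit);
    results.push(...(await Promise.all(batch.map(fn))));
  }
  return results;
}
```

In `verifyClaims`, `Promise.all(claims.map(...))` would become `mapLimited(claims, 5, c => searchAPI.search(c))`.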

Step 5: Automate Deployment to the Headless CMS

Programmatically push the validated and formatted content to your CMS. Use the CMS's management SDK to handle assets, environment tagging, and initial draft status for human review.

lib/cms.ts
import { createClient } from 'contentful-management';

async function publishToCMS(article: Article) {
  const client = createClient({ accessToken: process.env.CMS_TOKEN! });
  const space = await client.getSpace(process.env.SPACE_ID!);
  const env = await space.getEnvironment('master');

  // Entries are created as drafts; a human editor publishes after review.
  await env.createEntry('blogPost', {
    fields: {
      title: { 'en-US': article.title },
      content: { 'en-US': article.sections.map(s => s.body).join('\n\n') },
      slug: { 'en-US': article.slug }
    }
  });
}

⚠ Common Pitfalls

  • Rate limits on CMS Management APIs are often much tighter than Delivery APIs.
  • Failure to map AI-generated categories to existing CMS taxonomy will break site navigation.
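Mapping AI-generated tags onto the existing taxonomy, as the second pitfall warns, can be a whitelist lookup with a fallback. A sketch; the category names and the 'general' fallback are hypothetical:

```typescript
// Allowed CMS categories; anything the model invents falls back to 'general'
// instead of creating a taxonomy entry that breaks site navigation.
const CMS_CATEGORIES = new Map([
  ['ai', 'artificial-intelligence'],
  ['devops', 'devops'],
  ['frontend', 'frontend'],
]);

function mapToTaxonomy(aiTags: string[]): string[] {
  const mapped = aiTags.map(t => CMS_CATEGORIES.get(t.toLowerCase()) ?? 'general');
  // De-duplicate after normalization, preserving first-seen order.
  return Array.from(new Set(mapped));
}
```

Unmapped tags landing in the fallback bucket can also be logged, giving editors a queue of candidate categories to add to the CMS.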

What you built

By moving from single-shot prompts to a structured, multi-stage pipeline with Zod validation and factual checking, you create a production-ready content engine. This approach minimizes hallucinations and ensures that the output is ready for a human editor to perform a final review rather than a complete rewrite.