Building Content Platforms & CMS with open-source tools
Transitioning from monolithic legacy CMS platforms to a headless architecture requires moving from page-centric layouts to structured, reusable content models. This guide outlines the implementation of a migration pipeline and the setup of a type-safe content infrastructure using modern headless patterns.
Design the Normalized Content Schema
Deconstruct existing pages into atomic content types. Identify shared attributes across 'Post', 'Page', and 'Case Study'. Use reference fields for authors and categories rather than duplicating data. Define validation rules (e.g., regex for slugs, character limits for SEO titles) at the schema level to ensure data integrity.
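As a rough illustration of the idea above, the sketch below models shared attributes once, uses references for authors and categories, and applies validation rules before a document is accepted. Every name here (`BaseContent`, `validateContent`, the slug pattern, the 60-character limit) is an assumption made for illustration, not the schema API of any particular CMS.

```typescript
// Hypothetical content model: names and limits are illustrative only.
interface Reference { _ref: string } // points at another document by ID

// Attributes shared by Post, Page and Case Study live in one place.
interface BaseContent {
  title: string;
  slug: string;
  seoTitle: string;
  author: Reference;       // a reference, not an embedded copy of the author
  categories: Reference[];
}

interface Post extends BaseContent { _type: 'post'; body: unknown[] }
interface Page extends BaseContent { _type: 'page'; body: unknown[] }
interface CaseStudy extends BaseContent { _type: 'caseStudy'; client: string; body: unknown[] }

const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/; // assumed slug convention
const SEO_TITLE_MAX = 60;                          // assumed character limit

// Returns a list of human-readable validation errors (empty when valid).
function validateContent(doc: BaseContent): string[] {
  const errors: string[] = [];
  if (!SLUG_PATTERN.test(doc.slug)) {
    errors.push(`Invalid slug: "${doc.slug}"`);
  }
  if (doc.seoTitle.length > SEO_TITLE_MAX) {
    errors.push(`SEO title exceeds ${SEO_TITLE_MAX} characters`);
  }
  if (!doc.author._ref) {
    errors.push('Missing author reference');
  }
  return errors;
}
```

Most headless CMSs let you declare equivalent rules in their own schema format; running the same checks in the migration script as well catches invalid legacy data before it is written.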
⚠ Common Pitfalls
- Creating deeply nested objects that exceed GraphQL query depth limits
- Using generic rich-text fields where structured components like 'Call to Action' or 'Product Grid' are required for design consistency
Develop a Scripted Transformation Pipeline
Legacy content often arrives as raw HTML strings. Use a parser such as rehype to turn each string into a syntax tree (hast), then walk the tree and map HTML elements to the target CMS's structured block format (e.g., Portable Text). This preserves semantic meaning while stripping inline styles and non-standard attributes.
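// Current releases of unified and rehype-parse are ES modules, so use import rather than require
import { unified } from 'unified';
import rehypeParse from 'rehype-parse';

// Recursively collect the text content of a hast node
const textOf = (node) =>
  node.type === 'text' ? node.value : (node.children || []).map(textOf).join('');

async function convertToBlocks(html) {
  const tree = unified().use(rehypeParse, { fragment: true }).parse(html);
  // Map each top-level element to a CMS-specific block object;
  // whitespace-only text nodes between elements are skipped
  return tree.children
    .filter(node => node.type === 'element')
    .map(node => ({
      _type: 'block',
      children: [{ _type: 'span', text: textOf(node) }]
    }));
}
⚠ Common Pitfalls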
- Failing to handle <iframe> or <script> tags, which can break the frontend rendering
- Losing internal cross-links during the transformation process if not correctly mapped to new document IDs
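One possible way to guard against both pitfalls is to drop unsafe embeds and rewrite internal links while walking the tree. The sketch below uses a simplified, hand-written node type instead of the real hast typings, and a deliberately simplified span shape with a `ref` field (Portable Text itself represents links differently). `toSpans`, `LinkMap` and `BLOCKED_TAGS` are hypothetical names.

```typescript
// Minimal hast-like node shapes, declared locally so the sketch is self-contained.
type HNode =
  | { type: 'text'; value: string }
  | { type: 'element'; tagName: string; properties?: Record<string, unknown>; children: HNode[] };

// Simplified span: `ref` holds the new document ID of an internal link target.
interface Span { _type: 'span'; text: string; ref?: string }

// Hypothetical lookup from legacy URL paths to new document IDs.
type LinkMap = Map<string, string>;

const BLOCKED_TAGS = new Set(['script', 'iframe']);

function toSpans(node: HNode, links: LinkMap, ref?: string): Span[] {
  if (node.type === 'text') {
    const span: Span = { _type: 'span', text: node.value };
    if (ref) span.ref = ref;
    return [span];
  }
  // Drop unsafe embeds together with everything inside them.
  if (BLOCKED_TAGS.has(node.tagName)) return [];

  let nextRef = ref;
  if (node.tagName === 'a') {
    const href = node.properties?.href;
    const target = typeof href === 'string' ? links.get(href) : undefined;
    // Unmapped links keep their text but carry no reference;
    // a real pipeline would probably log them for review.
    if (target) nextRef = target;
  }

  const spans: Span[] = [];
  for (const child of node.children) {
    spans.push(...toSpans(child, links, nextRef));
  }
  return spans;
}
```

Whether to drop an `<iframe>` outright or convert it into a dedicated embed block depends on the target schema; dropping is simply the most conservative default for a sketch.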
Programmatic Asset Migration and Linking
Upload legacy images and media programmatically using the CMS Management API. Capture original filenames and ALT text from the source. Ensure that the new asset IDs are correctly referenced within the newly created content blocks to maintain the relationship between text and media.
⚠ Common Pitfalls
- Exceeding API rate limits during bulk asset uploads
- Uploading duplicate assets instead of comparing a content hash (e.g., MD5) against files that have already been uploaded
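A minimal sketch of how hash-based deduplication and simple throttling might fit together is shown below. The `Uploader` function type stands in for whatever Management API call your CMS provides, and `migrateAssets`, `LegacyAsset` and the 250 ms delay are assumptions for illustration, not part of any real SDK.

```typescript
import { createHash } from 'node:crypto';

interface LegacyAsset { filename: string; alt: string; data: Uint8Array }
interface UploadedAsset { id: string }

// Hypothetical uploader: wrap your CMS's Management API call in this shape.
type Uploader = (asset: LegacyAsset) => Promise<UploadedAsset>;

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Uploads each distinct file once and returns legacy filename -> new asset ID.
async function migrateAssets(
  assets: LegacyAsset[],
  upload: Uploader,
  delayMs = 250, // assumed pause between requests; tune to your API's rate limit
): Promise<Map<string, string>> {
  const idByHash = new Map<string, string>();     // content hash -> new asset ID
  const idByFilename = new Map<string, string>(); // legacy filename -> new asset ID

  for (const asset of assets) {
    // MD5 is used here only to detect identical bytes, not for security.
    const hash = createHash('md5').update(asset.data).digest('hex');
    let id = idByHash.get(hash);
    if (id === undefined) {
      id = (await upload(asset)).id;
      idByHash.set(hash, id);
      await sleep(delayMs); // crude throttle between real uploads
    }
    idByFilename.set(asset.filename, id);
  }
  return idByFilename;
}
```

The returned map is what later steps would use to point content blocks at the new asset IDs. A fixed delay is the simplest throttle; a production script would more likely back off based on the API's rate-limit responses.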
Execute Post-Migration Reference Resolution
After importing core entities, run a second pass to resolve relationships. Map legacy IDs to the new CMS document IDs. For example, if a legacy Post referenced an Author ID '123', update the Post in the headless CMS to point to the new internal Author document reference.
⚠ Common Pitfalls
- Circular references, where two documents point at each other and neither can be created first with its references already in place
- Dangling references where the source content points to a deleted or non-existent entity
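The second pass described above could look roughly like the following sketch. It assumes documents were first imported without references, each carrying its legacy author ID, and that an ID map was built during import. `ImportedDoc`, `resolveAuthorRefs` and the field names are hypothetical.

```typescript
interface ImportedDoc {
  id: string;              // new CMS document ID
  legacyId: string;        // ID in the source system
  legacyAuthorId?: string; // reference carried over from the source system
  author?: { _ref: string };
}

interface ResolutionResult {
  resolved: ImportedDoc[];
  dangling: string[]; // descriptions of references that could not be resolved
}

// Maps legacy author IDs to new document references; unresolved ones are reported.
function resolveAuthorRefs(
  docs: ImportedDoc[],
  authorIdMap: Map<string, string>, // legacy author ID -> new author document ID
): ResolutionResult {
  const resolved: ImportedDoc[] = [];
  const dangling: string[] = [];

  for (const doc of docs) {
    if (doc.legacyAuthorId === undefined) {
      resolved.push(doc);
      continue;
    }
    const newId = authorIdMap.get(doc.legacyAuthorId);
    if (newId === undefined) {
      // Source pointed at a deleted or never-migrated author.
      dangling.push(`${doc.legacyId} -> author ${doc.legacyAuthorId}`);
      resolved.push(doc);
    } else {
      resolved.push({ ...doc, author: { _ref: newId } });
    }
  }
  return { resolved, dangling };
}
```

Because every document already exists before any reference is written, circular relationships need no special ordering, and the `dangling` list gives editors a concrete report to review instead of a failed import.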
Configure Webhooks for Incremental Static Regeneration
Set up webhooks in the CMS to notify your frontend when content changes. Implement a route handler in Next.js or Astro that listens for these events and triggers a cache purge or revalidation for the specific modified paths, ensuring production content stays fresh without full site rebuilds.
import { revalidatePath } from 'next/cache';

export async function POST(req: Request) {
  const body = await req.json();
  const slug = body?.slug?.current;
  if (!slug) return new Response('Missing slug', { status: 400 });

  // Revalidate the specific blog post and the index page
  revalidatePath(`/blog/${slug}`);
  revalidatePath('/blog');
  return Response.json({ revalidated: true, now: Date.now() });
}
⚠ Common Pitfalls
- Leaving the webhook endpoint unauthenticated; always verify a secret token sent in a request header
- Triggering excessive rebuilds for minor draft updates if the webhook is not filtered for 'published' status
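Both pitfalls can be addressed with two small helpers that a route handler calls before revalidating anything. This is a sketch under stated assumptions: the `x-webhook-secret` header name, the `status` field in the payload, and the helper names `isAuthorized` and `pathsToRevalidate` are hypothetical, and your CMS may sign requests differently (for example with an HMAC signature).

```typescript
import { timingSafeEqual } from 'node:crypto';

// Assumed payload shape; adjust to what your CMS actually sends.
interface WebhookBody {
  slug?: { current?: string };
  status?: string;
}

// Compares the received token with the shared secret in constant time.
function isAuthorized(token: string | null, secret: string): boolean {
  const received = Buffer.from(token ?? '');
  const expected = Buffer.from(secret);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Returns the paths to revalidate, or an empty list for drafts and malformed payloads.
function pathsToRevalidate(body: WebhookBody): string[] {
  if (body.status !== 'published') return [];
  const slug = body.slug?.current;
  return slug ? [`/blog/${slug}`, '/blog'] : [];
}
```

Inside a handler you would typically pass `req.headers.get('x-webhook-secret')` and an environment-provided secret to `isAuthorized`, respond with 401 when it returns false, and call `revalidatePath` for each entry returned by `pathsToRevalidate`.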
What you built
A successful CMS migration is a data-engineering task that prioritizes structure over presentation. By automating the transformation of legacy HTML into structured blocks and resolving references programmatically, you create a scalable foundation for multi-channel content delivery.