Building Technical SEO for Web Apps with Open-Source Tools
This guide provides a technical implementation path for optimizing modern web applications for search engines, focusing on Next.js and React-based architectures. It covers the transition from client-side rendering to server-side rendering (SSR) or incremental static regeneration (ISR), ensuring that crawlers receive fully rendered HTML and structured metadata without waiting on JavaScript execution.
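For example, moving a route from pure client-side rendering to ISR can be as small as a single segment export. The sketch below is illustrative: the API endpoint is a placeholder and the 60-second revalidation window is an assumption, not a recommendation.

// app/products/[id]/page.tsx
// The page is rendered on the server and re-generated in the background
// at most once per minute, so crawlers always receive complete HTML.
export const revalidate = 60;

export default async function ProductPage({ params }: { params: { id: string } }) {
  const product = await fetch(`https://api.example.com/products/${params.id}`)
    .then((res) => res.json());
  return <h1>{product.name}</h1>;
}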
Implement Dynamic Metadata via generateMetadata
Replace static meta tags with the Next.js Metadata API to ensure unique titles, descriptions, and canonical URLs for every dynamic route. This prevents duplicate content issues and ensures the crawler receives the correct signals during the initial HTTP request.
import type { Metadata } from 'next';

type Props = { params: { id: string } };

export async function generateMetadata({ params }: Props): Promise<Metadata> {
  // Fetch the record backing this dynamic route so every product page
  // ships a unique title, description, and canonical URL.
  const product = await fetch(`https://api.example.com/products/${params.id}`).then((res) => res.json());

  return {
    title: product.name,
    description: product.description,
    alternates: {
      canonical: `https://example.com/products/${params.id}`,
    },
    openGraph: {
      images: [product.image],
    },
  };
}
⚠ Common Pitfalls
- Hardcoding absolute URLs instead of using environment variables for different stages (see the metadataBase sketch below)
- Forgetting to include a canonical tag, leading to URL parameter-based duplication
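The first pitfall is best solved with the Metadata API's metadataBase field, which resolves relative canonical and Open Graph URLs against a per-environment base. A minimal sketch, assuming an environment variable named NEXT_PUBLIC_SITE_URL (the variable name is our assumption, not a Next.js convention):

// app/layout.tsx
import type { Metadata } from 'next';

export const metadata: Metadata = {
  // Relative URLs in 'alternates' and 'openGraph' elsewhere in the app
  // are resolved against this base, so staging and production both emit
  // correct absolute URLs without a hardcoded hostname.
  metadataBase: new URL(process.env.NEXT_PUBLIC_SITE_URL ?? 'http://localhost:3000'),
};

With this in place, the canonical in generateMetadata above can be shortened to the relative path /products/[id].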
Inject JSON-LD Structured Data
Embed Schema.org markup directly into the page using a script tag with type 'application/ld+json'. This allows search engines to parse product details, reviews, or organizational info without executing complex JavaScript logic.
export default function ProductPage({ product }: { product: any }) {
  const jsonLd = {
    '@context': 'https://schema.org',
    '@type': 'Product',
    name: product.name,
    image: product.image,
    description: product.description,
    offers: {
      '@type': 'Offer',
      price: product.price,
      priceCurrency: 'USD',
      availability: 'https://schema.org/InStock',
    },
  };

  return (
    <section>
      {/* Escape '<' so user-supplied strings cannot break out of the script tag */}
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{
          __html: JSON.stringify(jsonLd).replace(/</g, '\\u003c'),
        }}
      />
      <h1>{product.name}</h1>
    </section>
  );
}
⚠ Common Pitfalls
- Inconsistent data between the visible UI and the JSON-LD payload, which can trigger search quality flags
- Invalid nesting of schema objects causing parsing errors in Google Search Console (a correct nesting sketch follows)
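Nesting errors typically come from attaching related entities as top-level siblings instead of properties of the parent node. As a point of reference, a rating belongs inside the Product object; the field names on the product record below are assumed:

const jsonLd = {
  '@context': 'https://schema.org',
  '@type': 'Product',
  name: product.name,
  // Correct: AggregateRating is a property of the Product node...
  aggregateRating: {
    '@type': 'AggregateRating',
    ratingValue: product.rating,       // assumed field
    reviewCount: product.reviewCount,  // assumed field
  },
  // ...not a second top-level object placed next to the Product.
};

Run the payload through Google's Rich Results Test before shipping; it reports the same nesting and type errors Search Console would otherwise surface only after a crawl.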
Optimize Cumulative Layout Shift (CLS) for Web Vitals
Core Web Vitals are ranking factors. Use the 'next/image' component to ensure images have pre-calculated aspect ratios, preventing layout shifts as images load. Set explicit dimensions or use 'fill' with a defined aspect-ratio container.
import Image from 'next/image';

export default function Hero() {
  return (
    <div className="aspect-video relative w-full">
      <Image
        src="/hero.jpg"
        alt="Product Hero"
        fill
        priority
        sizes="(max-width: 768px) 100vw, 50vw"
        className="object-cover"
      />
    </div>
  );
}
⚠ Common Pitfalls
- Using 'priority' on every image instead of only those above the fold
- Neglecting to set a fallback height for dynamic ad slots or third-party widgets (see the placeholder sketch below)
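For the second pitfall, reserve the slot's space before the third-party script runs. A sketch with assumed dimensions (match them to your actual ad unit):

// A wrapper that holds a 300x250 ad slot open while the script loads,
// so the surrounding content cannot shift when the creative arrives.
export function AdSlot() {
  return (
    <div id="ad-slot" style={{ minHeight: 250, width: 300 }}>
      {/* third-party script injects the creative here */}
    </div>
  );
}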
Automate Sitemap Generation for Dynamic Routes
Create a sitemap.ts file to dynamically generate your sitemap.xml. This ensures that new content is discoverable by crawlers immediately after creation without manual updates.
import { MetadataRoute } from 'next';

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const products = await fetch('https://api.example.com/products').then((res) => res.json());

  // Annotating the intermediate array keeps every generated entry
  // checked against the Sitemap type rather than falling back to 'any'.
  const productEntries: MetadataRoute.Sitemap = products.map((product: any) => ({
    url: `https://example.com/products/${product.id}`,
    lastModified: new Date(product.updatedAt),
    changeFrequency: 'daily',
    priority: 0.7,
  }));

  return [
    {
      url: 'https://example.com',
      lastModified: new Date(),
      changeFrequency: 'yearly',
      priority: 1,
    },
    ...productEntries,
  ];
}
⚠ Common Pitfalls
- Exceeding the 50,000 URL limit per sitemap file without implementing a sitemap index (a sharding sketch follows)
- Including private, noindex, or broken URLs in the sitemap
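For catalogs that exceed the limit, Next.js can shard a sitemap with generateSitemaps. A sketch, assuming a hypothetical paginated endpoint and a hardcoded shard count that would normally be derived from the product total:

// app/products/sitemap.ts
import { MetadataRoute } from 'next';

export async function generateSitemaps() {
  // One entry per sitemap shard (served as /products/sitemap/0.xml, etc.).
  return [{ id: 0 }, { id: 1 }];
}

export default async function sitemap({ id }: { id: number }): Promise<MetadataRoute.Sitemap> {
  // Hypothetical paginated endpoint; each shard stays under 50,000 URLs.
  const products = await fetch(`https://api.example.com/products?page=${id}&limit=50000`)
    .then((res) => res.json());
  return products.map((product: any) => ({
    url: `https://example.com/products/${product.id}`,
    lastModified: new Date(product.updatedAt),
  }));
}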
Configure robots.txt and Crawl Directives
Explicitly define which paths should be ignored by crawlers to save crawl budget. Use the robots.ts file to manage access to internal search pages, admin panels, or API endpoints.
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: ['/admin/', '/api/', '/search?'],
    },
    sitemap: 'https://example.com/sitemap.xml',
  };
}
⚠ Common Pitfalls
- Accidentally blocking CSS or JS assets required for rendering, leading to partial rendering issues
- Using robots.txt to try to remove a page from the index (use a 'noindex' directive instead; see the Metadata API sketch below)
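When the goal is de-indexing rather than crawl control, emit the directive through the Metadata API so the crawler can still fetch the page and see it. A minimal sketch (the route is illustrative):

// app/search/page.tsx
import type { Metadata } from 'next';

export const metadata: Metadata = {
  // Renders <meta name="robots" content="noindex, follow">; unlike a
  // robots.txt block, the page remains crawlable so the directive is seen.
  robots: {
    index: false,
    follow: true,
  },
};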
What you built
Following these steps gives your web application a clean, fast, and structured interface for search engine crawlers. Verify the results in Google Search Console's 'Indexing' and 'Experience' reports to confirm that all dynamic pages are discovered and meet the Core Web Vitals thresholds.