Guides

Building Server-Sent Events for LLM streaming with Vercel...

This guide provides a structured approach to streaming LLM responses in real-time applications. It focuses on server setup, client integration, and error resilience across frameworks.

1–2 hours · 6 steps
1. Set up server-side streaming endpoint

Create a route handler that configures CORS and content-type headers for streaming. Use Next.js Route Handlers or equivalent framework features to maintain open connections.

app/api/stream/route.ts
// Route Handlers export named HTTP-method functions, not a default export
export async function GET(req: Request) {
  const stream = new ReadableStream({
    start(controller) {
      // Enqueue SSE frames as tokens arrive, then call controller.close()
    },
  });
  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
  });
}
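Each chunk the endpoint enqueues must follow the SSE wire format: one or more data: lines terminated by a blank line. A minimal framing helper (the name formatSSE is illustrative, not part of any framework):

```javascript
// Format a payload as a single SSE event frame.
// An event is optional "event:" metadata plus one or more "data:" lines,
// terminated by a blank line.
function formatSSE(data, event) {
  const lines = [];
  if (event) lines.push(`event: ${event}`);
  // Multi-line payloads need one "data:" prefix per line.
  for (const line of String(data).split('\n')) {
    lines.push(`data: ${line}`);
  }
  return lines.join('\n') + '\n\n';
}
```

Pass the result through a TextEncoder before enqueueing it on the ReadableStream controller.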
2. Integrate LLM streaming API

Configure the LLM client to use server-sent events or chunked transfer encoding. Ensure the request body includes the stream=true parameter and that proper authorization headers are sent.

const response = await fetch('https://api.example.com/stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ stream: true })
});
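Most providers deliver the stream as SSE frames whose data: payloads are JSON fragments; OpenAI-style APIs additionally terminate the stream with a [DONE] sentinel (an assumption here — check your provider's format). A sketch of extracting payloads from raw SSE text:

```javascript
// Extract data payloads from a raw SSE chunk.
// Assumes an OpenAI-style stream that ends with a "[DONE]" sentinel.
function parseSSEChunk(raw) {
  const payloads = [];
  for (const line of raw.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6).trim();
    if (payload === '[DONE]') break; // end-of-stream sentinel
    payloads.push(payload);
  }
  return payloads;
}
```

Each returned payload can then be handed to the JSON buffering logic in step 4.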
3. Process incoming stream on client

Use ReadableStream API or EventSource to consume the response. Implement a parser that handles line-by-line decoding and JSON parsing for structured outputs.

const reader = response.body.getReader();
// A single decoder in streaming mode correctly handles multi-byte
// characters that are split across chunk boundaries.
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value, { stream: true });
  // Process stream chunks
}
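The read loop can be wrapped in an async generator so consumers iterate with for await...of; the name textChunks is illustrative. Using one TextDecoder in streaming mode matters because a multi-byte character can straddle two chunks:

```javascript
// Async generator yielding decoded text from a Response-style body.
// A single TextDecoder in streaming mode keeps multi-byte characters
// that straddle chunk boundaries intact.
async function* textChunks(body) {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const chunk = decoder.decode(value, { stream: true });
      if (chunk) yield chunk; // skip empty yields from partial characters
    }
    const tail = decoder.decode(); // flush any buffered bytes
    if (tail) yield tail;
  } finally {
    reader.releaseLock();
  }
}
```

Usage: `for await (const text of textChunks(response.body)) { /* ... */ }`.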
4. Handle partial JSON responses

Implement buffer management to accumulate incomplete JSON fragments. Use a parser that can detect complete objects and process them incrementally.

let buffer = '';
function processChunk(chunk) {
  buffer += chunk;
  const objects = [];
  // Assumes newline-delimited JSON: the last element of the split may be
  // an incomplete fragment, so it goes back into the buffer.
  const lines = buffer.split('\n');
  buffer = lines.pop();
  for (const line of lines) {
    if (!line.trim()) continue;
    try {
      objects.push(JSON.parse(line));
    } catch (e) {
      // Malformed fragment: skip it instead of terminating the stream
    }
  }
  return objects;
}

⚠ Common Pitfalls

  • Ignoring malformed JSON fragments causing stream termination
  • Not resetting buffer after successful parsing
5. Build incremental UI rendering

Implement a rendering strategy that updates the UI with each token. Use React state or virtual DOM diffing to minimize re-renders.

const [content, setContent] = useState('');
useEffect(() => {
  const handler = (token) => setContent(prev => prev + token);
  // Attach handler to the stream processor here, and return a cleanup
  // function that detaches it when the component unmounts.
}, []);
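When tokens arrive faster than the display refreshes, one setContent call per token forces one re-render per token. A common mitigation is to batch tokens and flush them in a single state update per frame or interval; a minimal sketch (createTokenBatcher is an illustrative name, not a library API):

```javascript
// Collects tokens and emits them as one batch, so the UI performs a
// single state update per flush instead of one per token.
function createTokenBatcher(onFlush) {
  let pending = '';
  return {
    push(token) { pending += token; },
    flush() {
      if (!pending) return; // nothing buffered: no-op
      const batch = pending;
      pending = '';
      onFlush(batch);
    },
  };
}
```

Drive flush() from setInterval or requestAnimationFrame, passing each batch into `setContent(prev => prev + batch)`.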
6. Implement error recovery logic

Add retry mechanisms for stream interruptions. Handle 503 errors with exponential backoff and implement reconnection strategies for long-lived connections.

function retryStream(attempt = 1, maxAttempts = 5) {
  if (attempt > maxAttempts) return; // give up after repeated failures
  setTimeout(() => {
    // Reinitialize the stream; on failure, call retryStream(attempt + 1)
  }, Math.min(1000 * Math.pow(2, attempt), 30000));
}

⚠ Common Pitfalls

  • Not handling 503 errors leading to permanent stream failure
  • Inadequate backoff strategy causing resource exhaustion
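The backoff can be isolated into a pure delay calculator, which also makes it testable. Adding random jitter spreads reconnect attempts from many clients so they do not retry in lockstep; the equal-jitter variant sketched here is one common choice, not the only one:

```javascript
// Delay (ms) before retry number `attempt`: exponential backoff capped
// at `cap`, with "equal jitter" (half fixed, half random) to avoid
// synchronized reconnect storms.
function backoffDelay(attempt, base = 1000, cap = 30000) {
  const exp = Math.min(base * Math.pow(2, attempt), cap);
  return exp / 2 + Math.random() * (exp / 2);
}
```

Use the result as the setTimeout delay inside the retry loop above.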

What you built

By following these steps, developers can implement reliable streaming workflows that balance real-time performance with error resilience. Focus on incremental parsing and UI updates to maintain perceived responsiveness.