Guides

Building Server-Sent Events for LLM streaming with Vercel...

This guide provides a structured approach to streaming LLM responses in real-time applications. It focuses on server setup, client integration, and error resilience across frameworks.

1–2 hours · 6 steps
1. Set up server-side streaming endpoint

Create a route handler that configures CORS and content-type headers for streaming. Use Next.js Route Handlers or equivalent framework features to maintain open connections.

app/api/stream/route.ts
// Route Handlers export named HTTP-method functions, not a default export
export async function GET(req: Request) {
  const stream = new ReadableStream({
    start(controller) {
      // Enqueue SSE frames as tokens arrive, then call controller.close()
    },
  });
  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
  });
}
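Each chunk the endpoint enqueues must follow the SSE wire format: one or more data: lines terminated by a blank line. A minimal framing helper (the name formatSSE is illustrative, not part of any framework):

```javascript
// Format a payload as a single SSE event frame.
// An event is optional "event:" metadata plus one or more "data:" lines,
// terminated by a blank line.
function formatSSE(data, event) {
  const lines = [];
  if (event) lines.push(`event: ${event}`);
  // Multi-line payloads need one "data:" prefix per line.
  for (const line of String(data).split('\n')) {
    lines.push(`data: ${line}`);
  }
  return lines.join('\n') + '\n\n';
}
```

Pass the result through a TextEncoder before enqueueing it on the ReadableStream controller.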
2. Integrate LLM streaming API

Configure the LLM client to use server-sent events or chunked transfer encoding. Ensure the request body includes the stream=true parameter and that proper authorization headers are sent.

const response = await fetch('https://api.example.com/stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ stream: true })
});
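Most providers deliver the stream as SSE frames whose data: payloads are JSON fragments; OpenAI-style APIs additionally terminate the stream with a [DONE] sentinel (an assumption here — check your provider's format). A sketch of extracting payloads from raw SSE text:

```javascript
// Extract data payloads from a raw SSE chunk.
// Assumes an OpenAI-style stream that ends with a "[DONE]" sentinel.
function parseSSEChunk(raw) {
  const payloads = [];
  for (const line of raw.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6).trim();
    if (payload === '[DONE]') break; // end-of-stream sentinel
    payloads.push(payload);
  }
  return payloads;
}
```

Each returned payload can then be handed to the JSON buffering logic in step 4.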
3. Process incoming stream on client

Use ReadableStream API or EventSource to consume the response. Implement a parser that handles line-by-line decoding and JSON parsing for structured outputs.

const reader = response.body.getReader();
// A single decoder in streaming mode correctly handles multi-byte
// characters that are split across chunk boundaries.
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value, { stream: true });
  // Process stream chunks
}
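The read loop can be wrapped in an async generator so consumers iterate with for await...of; the name textChunks is illustrative. Using one TextDecoder in streaming mode matters because a multi-byte character can straddle two chunks:

```javascript
// Async generator yielding decoded text from a Response-style body.
// A single TextDecoder in streaming mode keeps multi-byte characters
// that straddle chunk boundaries intact.
async function* textChunks(body) {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const chunk = decoder.decode(value, { stream: true });
      if (chunk) yield chunk; // skip empty yields from partial characters
    }
    const tail = decoder.decode(); // flush any buffered bytes
    if (tail) yield tail;
  } finally {
    reader.releaseLock();
  }
}
```

Usage: `for await (const text of textChunks(response.body)) { /* ... */ }`.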
4. Handle partial JSON responses

Implement buffer management to accumulate incomplete JSON fragments. Use a parser that can detect complete objects and process them incrementally.

let buffer = '';
function processChunk(chunk) {
  buffer += chunk;
  const objects = [];
  // Assumes newline-delimited JSON: the last element of the split may be
  // an incomplete fragment, so it goes back into the buffer.
  const lines = buffer.split('\n');
  buffer = lines.pop();
  for (const line of lines) {
    if (!line.trim()) continue;
    try {
      objects.push(JSON.parse(line));
    } catch (e) {
      // Malformed fragment: skip it instead of terminating the stream
    }
  }
  return objects;
}

⚠ Common Pitfalls

  • Ignoring malformed JSON fragments causing stream termination
  • Not resetting buffer after successful parsing
5. Build incremental UI rendering

Implement a rendering strategy that updates the UI with each token. Use React state or virtual DOM diffing to minimize re-renders.

const [content, setContent] = useState('');
useEffect(() => {
  const handler = (token) => setContent(prev => prev + token);
  // Attach handler to the stream processor here, and return a cleanup
  // function that detaches it when the component unmounts.
}, []);
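When tokens arrive faster than the display refreshes, one setContent call per token forces one re-render per token. A common mitigation is to batch tokens and flush them in a single state update per frame or interval; a minimal sketch (createTokenBatcher is an illustrative name, not a library API):

```javascript
// Collects tokens and emits them as one batch, so the UI performs a
// single state update per flush instead of one per token.
function createTokenBatcher(onFlush) {
  let pending = '';
  return {
    push(token) { pending += token; },
    flush() {
      if (!pending) return; // nothing buffered: no-op
      const batch = pending;
      pending = '';
      onFlush(batch);
    },
  };
}
```

Drive flush() from setInterval or requestAnimationFrame, passing each batch into `setContent(prev => prev + batch)`.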
6. Implement error recovery logic

Add retry mechanisms for stream interruptions. Handle 503 errors with exponential backoff and implement reconnection strategies for long-lived connections.

function retryStream(attempt = 1, maxAttempts = 5) {
  if (attempt > maxAttempts) return; // give up after repeated failures
  setTimeout(() => {
    // Reinitialize the stream; on failure, call retryStream(attempt + 1)
  }, Math.min(1000 * Math.pow(2, attempt), 30000));
}

⚠ Common Pitfalls

  • Not handling 503 errors leading to permanent stream failure
  • Inadequate backoff strategy causing resource exhaustion
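The backoff can be isolated into a pure delay calculator, which also makes it testable. Adding random jitter spreads reconnect attempts from many clients so they do not retry in lockstep; the equal-jitter variant sketched here is one common choice, not the only one:

```javascript
// Delay (ms) before retry number `attempt`: exponential backoff capped
// at `cap`, with "equal jitter" (half fixed, half random) to avoid
// synchronized reconnect storms.
function backoffDelay(attempt, base = 1000, cap = 30000) {
  const exp = Math.min(base * Math.pow(2, attempt), cap);
  return exp / 2 + Math.random() * (exp / 2);
}
```

Use the result as the setTimeout delay inside the retry loop above.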

What you built

By following these steps, developers can implement reliable streaming workflows that balance real-time performance with error resilience. Focus on incremental parsing and UI updates to maintain perceived responsiveness.