Building Server-Sent Events for LLM Streaming with Vercel
This guide provides a structured approach to implementing streaming LLM responses in real-time applications. It focuses on server setup, client integration, and error resilience across frameworks.
Set up server-side streaming endpoint
Create a route handler that configures CORS and content-type headers for streaming. Use Next.js Route Handlers or equivalent framework features to maintain open connections.
export async function GET(req: Request) {
  const stream = new ReadableStream({
    start(controller) {
      // Enqueue SSE-framed chunks, then close when generation finishes.
      controller.enqueue(new TextEncoder().encode('data: {"token":"Hi"}\n\n'));
      controller.close();
    },
  });
  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
  });
}
Integrate LLM streaming API
Configure the LLM client to use server-sent events or chunked transfer encoding. Ensure the API request includes stream=true parameter and proper authorization headers.
const response = await fetch('https://api.example.com/stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  // stream: true asks the API to send tokens as they are generated;
  // add the model and prompt fields your provider expects.
  body: JSON.stringify({ stream: true }),
});
Process incoming stream on client
Use ReadableStream API or EventSource to consume the response. Implement a parser that handles line-by-line decoding and JSON parsing for structured outputs.
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // { stream: true } handles multi-byte characters split across chunks.
  const text = decoder.decode(value, { stream: true });
  // Process stream chunks
}
Handle partial JSON responses
Implement buffer management to accumulate incomplete JSON fragments. Use a parser that can detect complete objects and process them incrementally.
let buffer = '';
function processChunk(chunk) {
  buffer += chunk;
  try {
    const json = JSON.parse(buffer);
    buffer = ''; // Reset only after a successful parse.
    return json;
  } catch (e) {
    // Incomplete JSON: keep the buffer and signal "not ready yet".
    return null;
  }
}
⚠ Common Pitfalls
- Ignoring malformed JSON fragments causing stream termination
- Not resetting buffer after successful parsing
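Parsing the whole buffer as a single object fails as soon as several events arrive in one chunk. A line-oriented sketch that avoids both pitfalls, assuming the API emits one JSON object per newline-delimited line (NDJSON); `createNdjsonParser` and the `token` field are illustrative names, not a library API:

```typescript
// Accumulates raw chunks and yields every complete newline-delimited
// JSON object; an incomplete trailing fragment stays in the buffer.
function createNdjsonParser() {
  let buffer = "";
  return function push(chunk: string): unknown[] {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // last element may be a partial line
    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed) continue;
      try {
        events.push(JSON.parse(trimmed));
      } catch {
        // Malformed line: skip it rather than terminate the stream.
      }
    }
    return events;
  };
}
```

Because the buffer is trimmed line by line, a fragment split across two chunks is reassembled automatically on the next push.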
Build incremental UI rendering
Implement a rendering strategy that updates the UI with each token. Use React state or virtual DOM diffing to minimize re-renders.
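One way to keep re-renders cheap is to coalesce tokens before committing them to state. A framework-agnostic sketch (`createTokenBatcher` and its `intervalMs` default are illustrative, not a library API):

```typescript
// Coalesces tokens and flushes them at most once per interval, so the
// UI re-renders once per batch instead of once per token.
function createTokenBatcher(
  onFlush: (text: string) => void,
  intervalMs = 50,
) {
  let pending = "";
  let timer: ReturnType<typeof setTimeout> | null = null;
  return {
    push(token: string) {
      pending += token;
      if (timer === null) {
        timer = setTimeout(() => {
          timer = null;
          const text = pending;
          pending = "";
          onFlush(text);
        }, intervalMs);
      }
    },
    // Force out anything still buffered, e.g. when the stream ends.
    flush() {
      if (timer !== null) { clearTimeout(timer); timer = null; }
      if (pending) { const text = pending; pending = ""; onFlush(text); }
    },
  };
}
```

Wire `push` into the stream processor and call `flush` when the stream closes; `onFlush` can then be a React state update like the one sketched below.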
const [content, setContent] = useState('');
useEffect(() => {
  // Functional update appends tokens without stale-closure issues.
  const handler = (token) => setContent((prev) => prev + token);
  // Attach `handler` to the stream processor here;
  // detach it in the effect's cleanup function.
}, []);
Implement error recovery logic
Add retry mechanisms for stream interruptions. Handle 503 errors with exponential backoff and implement reconnection strategies for long-lived connections.
function retryStream(attempt = 1) {
  setTimeout(() => {
    // Reinitialize the stream; on failure, call retryStream(attempt + 1).
  }, Math.min(1000 * Math.pow(2, attempt), 30000));
}
⚠ Common Pitfalls
- Not handling 503 errors leading to permanent stream failure
- Inadequate backoff strategy causing resource exhaustion
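The backoff sketched above can be wrapped into a reusable helper that retries any async function which starts (or restarts) the stream. A sketch with capped exponential backoff plus jitter (`withRetry` and its defaults are illustrative):

```typescript
// Retries an async stream starter with capped exponential backoff.
// Jitter spreads reconnect attempts so clients don't stampede the server.
async function withRetry<T>(
  start: () => Promise<T>,
  maxAttempts = 5,
  baseMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await start();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // give up after the cap
      const delay = Math.min(baseMs * 2 ** attempt, 30000);
      const jitter = Math.random() * delay * 0.2;
      await new Promise((resolve) => setTimeout(resolve, delay + jitter));
    }
  }
}
```

A 503 surfaces as a rejection from `start`, so it triggers a backed-off retry instead of a permanent failure; after `maxAttempts` the last error is rethrown for the caller to report.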
What you built
By following these steps, developers can implement reliable streaming workflows that balance real-time performance with error resilience. Focus on incremental parsing and UI updates to maintain perceived responsiveness.
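Putting the decoding, buffering, and parsing steps together, a self-contained consumer might look like the sketch below. The NDJSON framing and the `token` payload field are assumptions about the API; in the browser, the byte source can be adapted from `response.body.getReader()`.

```typescript
// Consumes a byte stream of newline-delimited JSON events and forwards
// each event's `token` field to a callback.
async function consumeTokenStream(
  chunks: AsyncIterable<Uint8Array>,
  onToken: (token: string) => void,
): Promise<void> {
  const decoder = new TextDecoder();
  let buffer = "";
  for await (const value of chunks) {
    // { stream: true } keeps multi-byte characters intact across chunks.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial trailing line
    for (const line of lines) {
      if (!line.trim()) continue;
      try {
        const event = JSON.parse(line) as { token?: string };
        if (typeof event.token === "string") onToken(event.token);
      } catch {
        // Skip malformed lines rather than aborting the whole stream.
      }
    }
  }
}
```

The callback is the seam where the UI layer plugs in: pass the batched state updater from the rendering step, and wrap the whole call in the retry helper for resilience.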