
Streaming LLM Responses: Patterns for Real-Time User Experience

Users expect immediate feedback when interacting with AI. Streaming responses dramatically improve perceived performance, showing results as they’re generated rather than waiting for complete responses. Here’s how to implement streaming across your full stack.

Server-Side Streaming with Node.js

Configure your API to stream tokens as they arrive:

import { OpenAI } from 'openai';
import { Request, Response } from 'express';

const openai = new OpenAI();

export async function streamCompletion(req: Request, res: Response) {
  const { messages, model = 'gpt-4o' } = req.body;

  // Set headers for Server-Sent Events (SSE)
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.flushHeaders(); // send headers immediately so the client can start listening

  try {
    const stream = await openai.chat.completions.create({
      model,
      messages,
      stream: true,
      max_tokens: 2000,
    });

    for await (const chunk of stream) {
      // Each chunk carries an incremental delta of the assistant's reply
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        res.write(`data: ${JSON.stringify({ content })}\n\n`);
      }

      // Signal completion so the client knows the stream ended cleanly
      if (chunk.choices[0]?.finish_reason === 'stop') {
        res.write(`data: ${JSON.stringify({ done: true })}\n\n`);
      }
    }
  } catch (error) {
    // `error` is typed `unknown` in TypeScript, so narrow before reading .message
    const message = error instanceof Error ? error.message : 'Stream failed';
    res.write(`data: ${JSON.stringify({ error: message })}\n\n`);
  } finally {
    res.end();
  }
}
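
Wiring the handler into an app is one line. This is a minimal sketch assuming a standard Express setup; the /api/chat/stream path matches what the client below fetches, and the port is arbitrary:

import express from 'express';

const app = express();
app.use(express.json()); // parse the JSON body carrying { messages }

// The React hook below POSTs to this path
app.post('/api/chat/stream', streamCompletion);

app.listen(3000, () => console.log('API listening on :3000'));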

React Client Implementation

Build a responsive UI that displays streaming content:

import { useState, useCallback } from 'react';

// Mirrors the chat message shape the server forwards to OpenAI
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface StreamState {
  content: string;
  isStreaming: boolean;
  error: string | null;
}

export function useStreamingChat() {
  const [state, setState] = useState<StreamState>({
    content: '',
    isStreaming: false,
    error: null,
  });

  const sendMessage = useCallback(async (messages: Message[]) => {
    setState({ content: '', isStreaming: true, error: null });

    try {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ messages }),
      });

      if (!response.ok || !response.body) {
        throw new Error(`Request failed with status ${response.status}`);
      }

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = ''; // holds any partial SSE line split across network chunks

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // stream: true keeps multi-byte characters intact across chunks
        buffer += decoder.decode(value, { stream: true });

        // Process only complete lines; keep the trailing partial in the buffer
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? '';

        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;
          const data = JSON.parse(line.slice(6));
          if (data.content) {
            setState(prev => ({
              ...prev,
              content: prev.content + data.content,
            }));
          }
        }
      }
    } catch (error) {
      // fetch errors are typed `unknown`, so narrow before reading .message
      const message = error instanceof Error ? error.message : 'Stream failed';
      setState(prev => ({ ...prev, error: message }));
    } finally {
      // Clear the flag even if the server never sent its done event
      setState(prev => ({ ...prev, isStreaming: false }));
    }
  }, []);

  return { ...state, sendMessage };
}
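
Using the hook from a component is straightforward. Here's a minimal sketch; the prompt and markup are placeholders:

export function Chat() {
  const { content, isStreaming, error, sendMessage } = useStreamingChat();

  return (
    <div>
      <button
        disabled={isStreaming}
        onClick={() => sendMessage([{ role: 'user', content: 'Explain SSE briefly.' }])}
      >
        {isStreaming ? 'Generating…' : 'Ask'}
      </button>
      {error && <p role="alert">{error}</p>}
      <p>{content}</p>
    </div>
  );
}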

Visual Feedback

Add a blinking cursor while tokens are arriving to signal ongoing generation. Total generation time is unchanged, but perceived latency drops sharply: users start reading the first tokens almost immediately instead of waiting for the full response to arrive.
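
A minimal cursor sketch (the component name, class name, keyframes, and glyph are all illustrative):

// StreamingCursor.tsx – a blinking block cursor shown while tokens arrive
export function StreamingCursor({ visible }: { visible: boolean }) {
  if (!visible) return null;
  return (
    <>
      <style>{`
        .stream-cursor { animation: blink 1s step-end infinite; }
        @keyframes blink { 50% { opacity: 0; } }
      `}</style>
      <span className="stream-cursor">▋</span>
    </>
  );
}

Render it inline after the streamed text: <p>{content}<StreamingCursor visible={isStreaming} /></p>.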

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.