
Streaming LLM Responses: Patterns for Real-Time User Experience

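This post shares practical, production-minded patterns for streaming LLM responses: a Node.js endpoint that relays tokens over Server-Sent Events (SSE), a React hook that renders them as they arrive, and a visual cue that tells users generation is still in progress.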

Server-Side Streaming with Node.js

Configure your API to stream tokens as they arrive:

import { OpenAI } from 'openai';
import { Request, Response } from 'express';

const openai = new OpenAI();

export async function streamCompletion(req: Request, res: Response) {
  const { messages, model = 'gpt-4o' } = req.body;

  // Set headers for SSE
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  try {
    const stream = await openai.chat.completions.create({
      model,
      messages,
      stream: true,
      max_tokens: 2000,
    });

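    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        res.write(`data: ${JSON.stringify({ content })}\n\n`);
      }
    }

    // Signal completion once the stream ends, whatever the finish_reason
    // ('stop', 'length', 'content_filter', ...), so the client never hangs
    res.write(`data: ${JSON.stringify({ done: true })}\n\n`);
  } catch (error) {
    // A caught value is `unknown` in strict TypeScript; narrow before reading it
    const message = error instanceof Error ? error.message : String(error);
    res.write(`data: ${JSON.stringify({ error: message })}\n\n`);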
  } finally {
    res.end();
  }
}
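
The handler above writes three kinds of payload (`content`, `done`, `error`), each as one SSE frame. As a minimal sketch, the framing can be pulled into a single helper so every write uses the same format. The `StreamEvent` type and `formatSseEvent` name are illustrative assumptions, not part of the original code or of any library:

```typescript
// Hypothetical event shapes mirroring the payloads written by streamCompletion
type StreamEvent =
  | { content: string }
  | { done: true }
  | { error: string };

// Serialize one event as an SSE frame: a single "data:" line followed by a
// blank line. JSON.stringify escapes embedded newlines, so a payload never
// spills onto a second line and breaks the frame.
function formatSseEvent(event: StreamEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Example frames (wrapped in JSON.stringify so the newlines are visible)
console.log(JSON.stringify(formatSseEvent({ content: 'Hello\nworld' })));
console.log(JSON.stringify(formatSseEvent({ done: true })));
```

With a helper like this, each `res.write(...)` call in the handler could become `res.write(formatSseEvent({ content }))`, which keeps the wire format in one place if the event shape changes later.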

React Client Implementation

Build a responsive UI that displays streaming content:

import { useState, useCallback } from 'react';

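// Minimal chat message shape; adjust to match what your API expects
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface StreamState {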
  content: string;
  isStreaming: boolean;
  error: string | null;
}

export function useStreamingChat() {
  const [state, setState] = useState<StreamState>({
    content: '',
    isStreaming: false,
    error: null,
  });

  const sendMessage = useCallback(async (messages: Message[]) => {
    setState({ content: '', isStreaming: true, error: null });

    try {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ messages }),
      });

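      if (!response.ok || !response.body) {
        throw new Error(`Request failed with status ${response.status}`);
      }

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // stream: true keeps multi-byte characters split across reads intact
        buffer += decoder.decode(value, { stream: true });

        // An SSE line can span two reads, so only parse complete lines
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? '';

        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;
          const data = JSON.parse(line.slice(6));
          if (data.error) throw new Error(data.error);
          if (data.content) {
            setState(prev => ({
              ...prev,
              content: prev.content + data.content,
            }));
          }
        }
      }

      // The stream has closed, with or without an explicit done event
      setState(prev => ({ ...prev, isStreaming: false }));
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      setState(prev => ({ ...prev, isStreaming: false, error: message }));
    }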
  }, []);

  return { ...state, sendMessage };
}
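
Network reads do not line up with SSE frames, so a single `data:` line may arrive in two pieces. One way to handle this, sketched below, is to move the parsing into a small stateful function that buffers incomplete lines and can be unit tested without a browser. The `createSseParser` name and `StreamEvent` type are hypothetical, and the sketch assumes the server emits only single-line `data: <json>` frames like the handler earlier in this post:

```typescript
// Hypothetical payload shape matching the server's content/done/error events
type StreamEvent = { content?: string; done?: boolean; error?: string };

// Returns a function that accepts decoded text and yields every event
// completed by that text, holding back any trailing partial line.
function createSseParser(): (text: string) => StreamEvent[] {
  let buffer = '';

  return (text: string): StreamEvent[] => {
    buffer += text;
    const lines = buffer.split('\n');
    // The last element is an incomplete line (or ''); keep it for the next call
    buffer = lines.pop() ?? '';

    return lines
      .filter(line => line.startsWith('data: '))
      .map(line => JSON.parse(line.slice(6)) as StreamEvent);
  };
}

// A frame split across two network reads is parsed exactly once
const feed = createSseParser();
const first = feed('data: {"content":"Hel');
const second = feed('lo"}\n\ndata: {"done":true}\n\n');
console.log(first);  // []
console.log(second); // [ { content: 'Hello' }, { done: true } ]
```

Inside the hook, the read loop would then pass each decoded chunk to `feed` and apply the returned events to state, leaving the hook itself free of string handling.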

Visual Feedback

Add a blinking cursor while streaming to indicate ongoing generation. Because the first tokens appear almost immediately, users often perceive streamed responses as 3-5x faster than equivalent non-streaming ones.
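
The blinking cursor can be sketched as a small pure helper plus a CSS animation. The cursor glyph, the `streaming-cursor` class name, and the `withCursor` function are all assumptions made for illustration. The helper keeps the cursor separate from the text so the UI can wrap it in its own animated element:

```typescript
// Assumed cursor glyph; any block or bar character works
const CURSOR = '▍';

// CSS for the blink, meant for a stylesheet. The class name is illustrative.
// step-end keeps the cursor fully visible for half the cycle, then hidden.
const cursorCss = `
.streaming-cursor { animation: blink 1s step-end infinite; }
@keyframes blink { 50% { opacity: 0; } }
`;

// Split the display into text and an optional cursor. The cursor is only
// present while tokens are still arriving.
function withCursor(content: string, isStreaming: boolean) {
  return { text: content, cursor: isStreaming ? CURSOR : '' };
}

const view = withCursor('Streaming', true);
console.log(view.text + view.cursor);
console.log(cursorCss.trim());
```

In a React component, `text` would be rendered as usual and `cursor` placed in a `<span className="streaming-cursor">` driven by the hook's `isStreaming` flag, so the cursor disappears as soon as the stream finishes.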

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.