
Fine-Tuning GPT Models: When and How to Customize

Fine-tuning adapts foundation models to your specific domain and style. While powerful, it is not always the right choice. Understanding when to fine-tune versus when to use prompting is crucial for efficient AI development.

When to Fine-Tune

Fine-tuning makes sense when you need consistent formatting, domain-specific terminology, or significant behavior changes that prompting cannot achieve reliably.

Preparing Training Data

import json
from typing import List, Dict
from dataclasses import dataclass
import tiktoken

@dataclass
class TrainingExample:
    system: str
    user: str
    assistant: str

def prepare_training_file(examples: List[TrainingExample],
                          output_path: str) -> Dict[str, object]:
    """Prepare a JSONL file for fine-tuning, validating each example."""

    encoding = tiktoken.encoding_for_model("gpt-4o")
    # "errors" holds strings, so the stats dict is not uniformly int-valued
    stats = {"total_examples": 0, "total_tokens": 0, "errors": []}

    with open(output_path, 'w') as f:
        for i, example in enumerate(examples):
            # Validate example
            if not example.user or not example.assistant:
                stats["errors"].append(f"Example {i}: Missing required fields")
                continue

            message = {
                "messages": [
                    {"role": "system", "content": example.system},
                    {"role": "user", "content": example.user},
                    {"role": "assistant", "content": example.assistant}
                ]
            }

            # Approximate token count: encodes the serialized JSON,
            # so it includes message structure, not just content
            text = json.dumps(message)
            tokens = len(encoding.encode(text))

            if tokens > 16000:
                stats["errors"].append(f"Example {i}: Exceeds token limit ({tokens})")
                continue

            f.write(json.dumps(message) + '\n')
            stats["total_examples"] += 1
            stats["total_tokens"] += tokens

    return stats

# Example usage
examples = [
    TrainingExample(
        system="You are a legal document analyst. Extract key terms precisely.",
        user="Analyze this contract clause: 'The indemnifying party shall...'",
        assistant="**Key Terms Extracted:**\n- Indemnifying Party: [Party A]\n- Obligation: Hold harmless..."
    ),
    # Add hundreds more examples...
]

stats = prepare_training_file(examples, "training_data.jsonl")
print(f"Prepared {stats['total_examples']} examples, {stats['total_tokens']} total tokens")
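Before uploading, it is worth carving a validation set off the training data. A minimal sketch; the `split_dataset` helper and its `val_fraction` default are illustrative, not part of any SDK:

```python
import random
from typing import List, Tuple


def split_dataset(examples: List, val_fraction: float = 0.1,
                  seed: int = 42) -> Tuple[List, List]:
    """Shuffle examples deterministically and split into (train, validation)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))  # always hold out at least one
    return shuffled[n_val:], shuffled[:n_val]
```

You would then call `prepare_training_file` twice, once per split, and upload the validation JSONL alongside the training file.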

Launching Fine-Tuning in Azure

import os
import time

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-06-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"]
)

# Upload training file
with open("training_data.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")

# Create fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini",
    hyperparameters={
        "n_epochs": 3,
        "learning_rate_multiplier": 0.1
    }
)

print(f"Fine-tuning job created: {job.id}")
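The job runs asynchronously, so you typically poll its status until it reaches a terminal state. A hedged sketch: `wait_for_job` and its injected `fetch_status` callable are hypothetical helpers, written so the polling logic is testable without credentials. With the client above, `fetch_status` could be `lambda: client.fine_tuning.jobs.retrieve(job.id).status`.

```python
import time
from typing import Callable

# Terminal states reported by the fine-tuning jobs API
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(fetch_status: Callable[[], str],
                 poll_seconds: float = 30,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job reaches a terminal state; return the final status."""
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
```

Injecting `sleep` keeps the helper unit-testable; in production you would simply use the default and a sensible polling interval.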

Evaluation is Critical

Always hold out a test set. Compare fine-tuned model performance against the base model with good prompts. Sometimes prompt engineering achieves 90% of the benefit at 10% of the cost.
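One simple, model-agnostic way to score both models on a held-out set is a normalized exact-match rate. The `match_rate` helper below is a sketch; real evaluations often need task-specific metrics or an LLM judge:

```python
from typing import List


def match_rate(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions that equal the reference after whitespace/case normalization."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())

    if not references:
        return 0.0
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)
```

Run the same held-out prompts through the base model (with your best prompt) and the fine-tuned model, then compare the two scores before committing to the fine-tuned deployment.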

Fine-tuning is a powerful tool, but use it judiciously. Start with prompting, and only fine-tune when you have clear evidence it will provide meaningful improvement.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.