Prompt optimization

Improving the performance of your LLM applications and agents often requires improving the quality of the prompts you use.

While you can make changes and review them one by one, we recommend taking a more structured approach.

If you would like to chat more about prompt optimization, feel free to book a chat with an Opik core contributor: Calendar link

Techniques for prompt optimization

Before we cover techniques to improve prompts, it’s worth considering how we will know that one prompt is better than another. For this we recommend using an LLM evaluation framework: by defining a dataset and a set of metrics to evaluate your prompts, you will be able to tell when you’ve actually improved a prompt.
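As a minimal illustration of the idea, assuming you call an OpenAI model directly, a prompt comparison loop might look like the sketch below. The dataset, metric, and prompt variants are hypothetical placeholders; in practice a dedicated evaluation framework gives you much richer metrics and experiment tracking.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical evaluation dataset: inputs paired with expected outputs.
dataset = [
    {"input": "I love this product!", "expected": "positive"},
    {"input": "The app keeps crashing.", "expected": "negative"},
]

# A simple exact-match metric; real projects typically use richer metrics.
def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip().lower() == expected.lower() else 0.0

def evaluate_prompt(system_prompt: str) -> float:
    scores = []
    for item in dataset:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": item["input"]},
            ],
        )
        scores.append(exact_match(response.choices[0].message.content, item["expected"]))
    return sum(scores) / len(scores)

# Compare two candidate prompts on the same dataset and metric.
prompt_v1 = "Classify the sentiment of the message."
prompt_v2 = "Classify the sentiment of the message as exactly one word: positive or negative."
print(evaluate_prompt(prompt_v1), evaluate_prompt(prompt_v2))
```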

Once you have an evaluation framework in place, you can start improving your prompt. There are four main ways to optimize a prompt:

  1. Following prompt best practices
  2. Utilizing response schemas
  3. Automated improvements with Meta-prompts
  4. Advanced techniques like DSPy

Prompt best-practices

Writing prompts is a bit of an art form, and it keeps evolving as new models are released. There are, however, a few best practices worth following:

  1. Provide a “persona” or “voice” in the prompt. For example:

You are an expert developer with 10 years of experience in building real-time observability tools.

You are a product manager for an open-source platform and have a deep understanding of what it takes to build LLM products.

  2. Be specific about your task. For example:

Review this React component and identify any performance issues or anti-patterns that could lead to memory leaks.

Analyze this dataset and create a visualization that highlights the correlation between customer age and purchase frequency.

  3. Provide additional context: LLMs only know what they have been trained on. By adding examples of the response you expect, you can help the LLM generate the response you want. This is often referred to as “few-shot examples” (see the sketch after the example below).

Example:

Question: “Opik is an impressive open-source platform built by Comet that revolutionizes LLM application development and evaluation.”
Answer: Positive sentiment
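A common way to provide few-shot examples is to include them as earlier turns in the conversation. The sketch below assumes a direct call to the OpenAI chat API; the example sentences and model name are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# Few-shot examples are passed as earlier user/assistant turns,
# so the model can infer the expected output format from them.
messages = [
    {"role": "system", "content": "You classify the sentiment of a sentence as Positive, Negative or Neutral."},
    {"role": "user", "content": "Opik is an impressive open-source platform built by Comet that revolutionizes LLM application development and evaluation."},
    {"role": "assistant", "content": "Positive sentiment"},
    {"role": "user", "content": "The documentation was confusing and the setup took hours."},
    {"role": "assistant", "content": "Negative sentiment"},
    # The new input we actually want classified:
    {"role": "user", "content": "The dashboard loads quickly and the traces are easy to read."},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```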

For more tips on prompt engineering, Anthropic’s guide is very insightful: Anthropic Prompt Engineering. You can also find OpenAI’s prompt engineering guide here.

Utilize Response Schemas

Response schemas allow you to “force” the model to return a JSON object following a specific format. This makes it much easier to parse the data in downstream tasks without running into parsing errors.

If you are using OpenAI models, you can use the response_format parameter to define a response schema. If you are using other models, we recommend using the Instructor library that provides a nice abstraction layer across multiple models:

First, install the dependencies:

```bash
pip install -U instructor openai
```

Then use the Instructor library to define and enforce the response schema:

```python
import instructor
from pydantic import BaseModel
from openai import OpenAI

# Define your desired output structure
class UserInfo(BaseModel):
    name: str
    age: int

# Patch the OpenAI client
client = instructor.from_openai(OpenAI())

# Extract structured data from natural language
user_info = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)

print(user_info.name)
print(user_info.age)
```
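If you are calling OpenAI directly instead, a rough sketch of the equivalent call using the structured outputs helper looks like the following; the exact helper and arguments may change between SDK versions, so check the current OpenAI documentation.

```python
from pydantic import BaseModel
from openai import OpenAI

class UserInfo(BaseModel):
    name: str
    age: int

client = OpenAI()

# Ask the model to return data matching the UserInfo schema.
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
    response_format=UserInfo,
)

user_info = completion.choices[0].message.parsed
print(user_info.name, user_info.age)
```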

Automated improvements with Meta-prompts

Meta-prompt optimization refers to using an LLM, guided by a detailed “meta-prompt”, to improve the quality of your prompt. This technique is used by both OpenAI and Anthropic as part of the Optimize features available in their respective playgrounds.

You can find the full Anthropic meta-prompt available here: Anthropic prompt generator Notebook.

While these meta-prompts can be tedious to get up and running since you have to run the demo notebook, they provide a good way to ensure your prompt follows the best practices for specific model providers.
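To make the idea concrete, here is a minimal, hypothetical sketch of meta-prompt optimization. The meta-prompt text below is a simplified placeholder and is not the actual Anthropic or OpenAI meta-prompt.

```python
from openai import OpenAI

client = OpenAI()

# A deliberately simplified meta-prompt; real meta-prompts are far more detailed.
META_PROMPT = """You are an expert prompt engineer.
Rewrite the prompt below so that it follows prompt-engineering best practices:
give the model a clear persona, be specific about the task, and describe the
expected output format. Return only the improved prompt.

Prompt to improve:
{prompt}
"""

original_prompt = "Summarize this support ticket."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": META_PROMPT.format(prompt=original_prompt)}],
)

print(response.choices[0].message.content)
```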

Automated prompt optimization

If you want to go one step further than meta-prompt optimization, you can use frameworks like DSPy that automatically improve your prompts based on a dataset and a set of metrics.

The main algorithm that powers DSPy is called MIPRO and combines two key concepts:

  1. Grounding: When optimizing a prompt or agent, the optimizer should be aware both of what the agent is trying to achieve and what the dataset looks like.
  2. Bayesian optimization: Once you have generated some candidate “instructions” to improve the prompt, you can use Bayesian optimization techniques to identify which combination of instructions provides the best performance.

To learn more about MIPRO, we recommend reading the paper, which proposes a set of really interesting ideas on the topic of prompt and agent optimization.
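As a rough sketch of what this looks like in practice, the snippet below runs DSPy’s MIPROv2 optimizer on a toy question-answering program. The dataset, metric, and model name are placeholders, and class names and arguments may differ between DSPy versions, so treat it as illustrative rather than canonical.

```python
import dspy

# Configure the language model DSPy should use (model name is a placeholder).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A tiny, hypothetical training set; real optimization runs need more examples.
trainset = [
    dspy.Example(question="What does Opik help you evaluate?", answer="LLM applications").with_inputs("question"),
    dspy.Example(question="Who builds Opik?", answer="Comet").with_inputs("question"),
]

# The program whose prompts MIPRO will optimize.
program = dspy.ChainOfThought("question -> answer")

# A simple metric: does the predicted answer contain the expected answer?
def answer_match(example, prediction, trace=None):
    return example.answer.lower() in prediction.answer.lower()

# MIPROv2 generates candidate instructions (grounded in the task and data)
# and searches over their combinations with Bayesian optimization.
optimizer = dspy.MIPROv2(metric=answer_match, auto="light")
optimized_program = optimizer.compile(program, trainset=trainset)

print(optimized_program(question="What does Opik help you evaluate?").answer)
```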

If you would like to chat more about prompt optimization, feel free to book a chat with an Opik core contributor: Calendar link