In the drive to remain competitive, businesses today are turning to AI to help them reduce cost and maximize efficiency. It’s incumbent on them to find the most suitable AI model, the one that will help them achieve more while spending less. For many businesses, the migration from OpenAI’s model family to Amazon Nova represents not only a shift in model but a strategic move toward scalability, efficiency, and broader multimodal capabilities.
In this blog post, we discuss how to optimize prompting in Amazon Nova for the best price-performance.
Why migrate from OpenAI to Amazon Nova?
OpenAI’s models remain powerful, but their operational costs can be prohibitive when scaled. Consider these figures from Artificial Analysis:
Model | Input Token Cost (per Million Tokens) | Output Token Cost (per Million Tokens) | Context Window | Output Speed (Tokens per Second) | Latency (Seconds to First Token) |
GPT-4o | ~$2.50 | ~$10.00 | Up to 128K tokens | ~63 | ~0.49 |
GPT-4o Mini | ~$0.15 | ~$0.60 | Up to 128K tokens | ~90 | ~0.43 |
Nova Micro | ~$0.035 | ~$0.14 | Up to 128K tokens | ~195 | ~0.29 |
Nova Lite | ~$0.06 | ~$0.24 | Up to 300K tokens | ~146 | ~0.29 |
Nova Pro | ~$0.80 | ~$3.20 | Up to 300K tokens | ~90 | ~0.34 |
For high-volume applications, such as global customer support or large-scale document analysis, these cost differences are disruptive. Not only does Amazon Nova Pro offer over three times the cost-efficiency of GPT-4o, its longer context window also enables it to handle more extensive and complex inputs.
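To make the price gap concrete, the following minimal sketch estimates a monthly bill from the approximate per-million-token prices in the table. The `monthly_cost` helper and the workload of 500 million input tokens and 100 million output tokens are hypothetical, chosen only for illustration.

```python
# Approximate prices in USD per million tokens (input, output), from the table.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o Mini": (0.15, 0.60),
    "Nova Micro": (0.035, 0.14),
    "Nova Lite": (0.06, 0.24),
    "Nova Pro": (0.80, 3.20),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token volumes."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 500M input tokens and 100M output tokens per month.
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 500_000_000, 100_000_000):,.2f}")
```

For this workload, the GPT-4o estimate is about 3.1 times the Nova Pro estimate, which matches the ratio of the listed prices.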
Breaking down the Amazon Nova suite
Amazon Nova isn’t a single model; it’s a suite designed for various needs:
- Amazon Nova Pro – A robust multimodal model that can process text, images, and video. It excels at tasks such as document analysis and deep data visualization. Benchmark comparisons show Amazon Nova Pro matching or even surpassing GPT-4o on complex reasoning tasks, according to section 2.1.1 of the Nova technical report and model card.
- Amazon Nova Lite – Offers a balanced mix of multimodal processing and speed. Amazon Nova Lite is ideal for applications such as document summarization, translation, and even basic visual search, delivering quality outputs at lower latency and cost compared to GPT-4o Mini. You can find these benchmark results in section 2.1.2 of the Nova Technical Report and Model Card.
- Amazon Nova Micro – A text-only model engineered for ultra-low latency. With output speed of up to 195 tokens per second, Amazon Nova Micro is ideal for real-time applications such as chat-based assistants and automated FAQs. Its token costs are dramatically lower than those of GPT-4o Mini, roughly 4.3 times cheaper on a per-token basis.
The lower per-token costs and higher output speed of Amazon Nova give you the flexibility to simplify prompts for real-time applications so you can balance quality, speed, and cost for your use case.
Understanding the foundations
To make the best decision about which model family fits your needs, it’s important to understand the differences in prompt engineering best practices for OpenAI and Amazon Nova. Each model family has its own set of strengths, but some principles apply to both. Across both model families, quality and accuracy are achieved through clarity of instructions, structured prompts, and iterative refinement. Whether you’re using strong output directives or clearly defined use cases, the goal is to reduce ambiguity and improve response quality.
The OpenAI approach
OpenAI uses a layered messaging system for prompt engineering, where system, developer, and user prompts work in harmony to control tone, safety, and output format. Their approach emphasizes:
- Hierarchical message roles – Setting the model’s role and behavior using system messages makes sure that the overarching safety and style guidelines (set in system prompts) are preserved
- Instruction placement and delimiters – Directives are placed at the beginning, with clear separation between context, examples, and queries
- Selective chain-of-thought – Detailed, step-by-step reasoning is used when it benefits complex tasks
- Formatting and structure – Using strong directives such as DO, MUST, and DO NOT to provide consistent outputs (for example, in JSON)
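The layering described in this list can be sketched as a small helper that keeps the overarching rules in the system message and separates context from the query with delimiters. The function name, the triple-quote delimiter, and the sample strings are hypothetical choices for illustration, not a prescribed format.

```python
def build_openai_messages(system_rules: str, context: str, question: str) -> list[dict]:
    """Place rules in the system message and delimit context from the query."""
    user_content = (
        "Answer the question using only the context.\n\n"  # directive comes first
        f'Context:\n"""\n{context}\n"""\n\n'                # delimited context
        f"Question: {question}"
    )
    return [
        {"role": "system", "content": system_rules},
        {"role": "user", "content": user_content},
    ]


messages = build_openai_messages(
    system_rules="You are a concise assistant. You MUST reply in JSON. DO NOT add commentary.",
    context="The meeting moved to Thursday.",
    question="When is the meeting?",
)
print(messages[1]["content"])
```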
The Amazon Nova approach
- Define the prompt use case
  - Task – What exactly the model should do
  - Role – Which role the model should assume
  - Response style – The structure or tone of the output
  - Instructions – Guidelines the model must follow
- Chain-of-thought
  - Explicit state response – Provide clear and strong instructions to limit the model’s response
  - Structural thinking – Thinking step-by-step encourages structural thinking
- Formatting and structure
  - Use delimiters to section your prompts, for example, ##Task##, ##Context##, or ##Example##
  - Specify the output format, for example, JSON, YAML, or Markdown
  - Use strong instructions and caps, such as DO, DO NOT, or MUST
  - Prefill the responses to guide the model, for example, start with “{” or “json…”
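As a minimal sketch of these practices, the helper below assembles the four use-case elements into a sectioned prompt using the `##Section##` delimiter style from the list. The function name, the section order, and the sample values are assumptions made for illustration.

```python
def build_nova_system_prompt(task: str, role: str, response_style: str,
                             instructions: list[str]) -> str:
    """Assemble a sectioned system prompt from the four use-case elements."""
    rules = "\n".join(f"- {rule}" for rule in instructions)
    return (
        f"##Role##\n{role}\n\n"
        f"##Task##\n{task}\n\n"
        f"##Response style##\n{response_style}\n\n"
        f"##Instructions##\n{rules}"
    )


prompt = build_nova_system_prompt(
    task="Summarize the meeting notes and list the action items.",
    role="You are an experienced executive assistant.",
    response_style="Markdown with a short summary followed by a bulleted list.",
    instructions=[
        "You MUST name an owner for every action item.",
        "DO NOT include information that is not in the notes.",
    ],
)
print(prompt)
```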
Evolving your prompt strategy: Migrating from OpenAI to Amazon Nova
Transitioning to Amazon Nova isn’t merely a change in API endpoints; it requires retooling your prompt engineering to align with the strengths of Amazon Nova. You need to reframe your use case definition. Begin by breaking down your current GPT-4o or GPT-4o Mini prompt into its core elements: task, role, response style, instructions, and success criteria. Make sure to structure these elements clearly to provide a blueprint for the model.
To understand how to migrate an existing OpenAI prompt to work optimally for Amazon Nova Pro, consider the following example using the meeting notes summarizer. Here is the GPT-4o system prompt:
The user prompt is the meeting notes that need to be summarized:
GPT produces this helpful response:
To meet or exceed the quality of the response from GPT-4o, here is what an Amazon Nova Pro prompt might look like. The prompt uses the same best practices discussed in this post, starting with the system prompt. We used a temperature of 0.2 and a topP of 0.9 here:
Here’s the user prompt, using prefilled responses:
The following example shows that the Amazon Nova response meets and exceeds the accuracy of the OpenAI example, formats the output in Markdown, and has found clear owners for each action item:
A few updates to the prompt can achieve comparable or better results from Amazon Nova Pro at a much lower cost of inference.
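The request settings mentioned in this walkthrough (a system prompt, a temperature of 0.2, a topP of 0.9, and a prefilled response) could be wired together roughly as follows. This is a sketch under stated assumptions: the helper name, the placeholder prompt text, and the Nova Pro model ID (written by analogy with the Nova Lite ID used elsewhere in this post) are illustrative and are not the prompts from the original example.

```python
def build_converse_request(system_prompt: str, user_prompt: str, prefill: str) -> dict:
    """Build keyword arguments for a Bedrock Runtime converse call."""
    return {
        "modelId": "us.amazon.nova-pro-v1:0",  # assumed ID, by analogy with Nova Lite
        "system": [{"text": system_prompt}],
        "messages": [
            {"role": "user", "content": [{"text": user_prompt}]},
            # Prefill: the model continues from this partial assistant turn.
            {"role": "assistant", "content": [{"text": prefill}]},
        ],
        "inferenceConfig": {"temperature": 0.2, "topP": 0.9},
    }


request = build_converse_request(
    system_prompt="##Task##\nSummarize the meeting notes and list action items with owners.",
    user_prompt="<meeting notes go here>",  # placeholder, not real notes
    prefill="## Summary",
)

# With AWS credentials configured, the request could be sent as:
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**request)
```

Keeping the request as a plain dictionary makes it easy to inspect or log the exact prompt before any tokens are billed.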
Make use of Amazon Nova extended context
Amazon Nova Lite and Amazon Nova Pro can support up to 300,000 input tokens, which means that you can include more context in your prompt if needed. Expand your background data and detailed instructions accordingly. If your original OpenAI prompt was optimized for 128,000 tokens, adjust it to use the Amazon Nova extended window.
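When expanding the background data, a rough budget check can help decide how many documents fit in the window. The sketch below is only an approximation: the four-characters-per-token ratio is a common rule of thumb and an assumption here, not the actual tokenizer of either model family, and the helper name is hypothetical.

```python
def select_documents(docs: list[str], budget_tokens: int,
                     chars_per_token: int = 4) -> list[str]:
    """Keep documents, in order, while the rough token estimate fits the budget."""
    selected, used = [], 0
    for doc in docs:
        # Crude estimate; a real tokenizer would give a different count.
        estimate = len(doc) // chars_per_token + 1
        if used + estimate > budget_tokens:
            break
        selected.append(doc)
        used += estimate
    return selected


docs = ["a" * 400, "b" * 400, "c" * 400]
# Same documents, two hypothetical budgets: a small one and a larger one.
print(len(select_documents(docs, budget_tokens=250)))
print(len(select_documents(docs, budget_tokens=300_000)))
```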
Tailor output constraints
If your GPT prompt required strict formatting (for example, “Respond in JSON only”), make sure that your Amazon Nova prompt includes these directives. Additionally, if your task involves multimodal inputs, specify when to include images or video references.
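A JSON-only directive is easier to rely on when the reply is also validated on the client side. The following sketch assumes the response was prefilled with “{” and that the returned text contains only the continuation, so the prefill has to be rejoined before parsing. Both the helper name and that behavior are assumptions to verify against your own responses.

```python
import json


def parse_prefilled_json(prefill: str, completion: str) -> dict:
    """Rejoin the prefilled text with the model's continuation and parse it."""
    payload = json.loads(prefill + completion)  # raises ValueError on invalid JSON
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


# Hand-written continuation standing in for a model reply.
result = parse_prefilled_json("{", '"summary": "Launch moved to May", "owners": ["Ana"]}')
print(result["owners"])
```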
Function calling
The rise of generative AI agents has made function calling, or tool calling, one of the most important abilities of a given large language model (LLM). A model’s ability to correctly pick the right tool for the job, in a low-latency manner, is often the difference between success and failure of an agentic system.
Both OpenAI and Amazon Nova models share similarities in function calling, in particular their support for structured API calls. Both model families support tool selection through defined tool schemas, which we discuss later in this post. They also both provide a mechanism to decide when to invoke these tools or not.
OpenAI’s function calling uses flexible JSON schemas to define and structure API interactions. The models support a wide range of schema configurations, which give developers the ability to quickly implement external function calls through simple JSON definitions tied to their API endpoints.
Here is an example of a function:
from openai import OpenAI

client = OpenAI()  # Reads the API key from the OPENAI_API_KEY environment variable

# Tool definition: a JSON schema describing the function the model may call
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Montevideo, Uruguay"
                }
            },
            "required": [
                "location"
            ],
            "additionalProperties": False
        },
        "strict": True
    }
}]

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the weather like in Punta del Este today?"}],
    tools=tools
)
Similar to OpenAI’s approach, Amazon Nova can call tools when passed a configuration schema as shown in the following code example. Amazon Nova has made heavy use of greedy decoding when calling tools, and it’s advised to set temperature, topP, and topK to 1. This makes sure that the model has the highest accuracy in tool selection. These greedy decoding parameters and other examples of tool use are covered in detail in Tool use (function calling) with Amazon Nova.
The following is an example of function calling without using additionalModelRequestFields:
import boto3

client = boto3.client("bedrock-runtime")  # Uses your configured AWS credentials and Region

# Tool definition: one tool whose input is a structured recipe object
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "get_recipe",
            "description": "Structured recipe generation system",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "recipe": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "ingredients": {
                                    "type": "array",
                                    "items": {
                                        "type": "object",
                                        "properties": {
                                            "item": {"type": "string"},
                                            "amount": {"type": "number"},
                                            "unit": {"type": "string"}
                                        }
                                    }
                                },
                                "instructions": {
                                    "type": "array",
                                    "items": {"type": "string"}
                                }
                            },
                            "required": ["name", "ingredients", "instructions"]
                        }
                    }
                }
            }
        }
    }]
}

# Base configuration without topK=1
input_text = "I need a recipe for chocolate lava cake"
messages = [{
    "role": "user",
    "content": [{"text": input_text}]
}]

# Inference parameters
inf_params = {"topP": 1, "temperature": 1}

response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params
)
# Typically produces less structured or incomplete output
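Comparing the two schema shapes shown so far, an OpenAI tool definition nests its JSON Schema under `function.parameters`, while the Amazon Nova configuration nests it under `toolSpec.inputSchema.json`. A migration helper could therefore look roughly like the sketch below. The function names are hypothetical, and the sketch assumes the JSON Schema itself carries over unchanged, so any schema keywords that Amazon Nova does not accept would need to be removed separately.

```python
def openai_tool_to_nova(tool: dict) -> dict:
    """Convert one OpenAI-style function tool into a Converse toolSpec entry."""
    function = tool["function"]
    return {
        "toolSpec": {
            "name": function["name"],
            "description": function.get("description", ""),
            # The OpenAI-specific "strict" flag has no place here and is dropped.
            "inputSchema": {"json": function["parameters"]},
        }
    }


def openai_tools_to_tool_config(tools: list[dict]) -> dict:
    """Wrap the converted tools in the toolConfig structure used with converse."""
    return {"tools": [openai_tool_to_nova(tool) for tool in tools]}


openai_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
        "strict": True,
    },
}]
tool_config_from_openai = openai_tools_to_tool_config(openai_tools)
print(tool_config_from_openai["tools"][0]["toolSpec"]["name"])
```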
The following example shows how function calling accuracy can be improved by using additionalModelRequestFields:
# Enhanced configuration with topK=1
response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params,
    additionalModelRequestFields={"inferenceConfig": {"topK": 1}}
)
# Produces a more accurate and structured function call
To maximize Amazon Nova function calling potential and improve accuracy, always use additionalModelRequestFields with topK=1. This forces the model to select the single most probable token and prevents random token selection. This increases deterministic output generation and improves function call precision by about 30–40%.
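To see why topK=1 removes randomness, consider a toy top-k sampler. This is an illustration of the general sampling technique, not Amazon Nova’s internal implementation, and the token probabilities are made up for the example.

```python
import random


def sample_top_k(probs: dict[str, float], k: int, rng: random.Random) -> str:
    """Sample one token from the k most probable candidates."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [token for token, _ in top]
    weights = [weight for _, weight in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token probabilities at a tool-selection step.
probs = {"get_recipe": 0.6, "get_weather": 0.3, "none": 0.1}
rng = random.Random(0)

# With k=1 only the most probable token is a candidate, so every draw agrees.
greedy_draws = {sample_top_k(probs, 1, rng) for _ in range(100)}
print(greedy_draws)  # {'get_recipe'}

# With k=3 all tokens remain candidates, so repeated draws can differ.
mixed_draws = {sample_top_k(probs, 3, rng) for _ in range(100)}
print(len(mixed_draws))
```

With k set to 1 the candidate list has a single entry, so temperature and topP no longer influence which token is chosen.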
The following code examples further explain how to conduct tool calling successfully. The first scenario shows recipe generation without an explicit tool. The example doesn’t use topK, which typically results in responses that are less structured:
input_text = """
I'm looking for a decadent chocolate dessert that's quick to prepare.
Something that looks fancy but isn't complicated to make.
"""
messages = [{
    "role": "user",
    "content": [{"text": input_text}]
}]
response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    inferenceConfig={"topP": 1, "temperature": 1}
)
# Generates a conversational recipe description
# Less structured, more narrative-driven response
In this example, the scenario shows recipe generation with a structured tool. We add topK set to 1, which produces a more structured output:
response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig={"topP": 1, "temperature": 1},
    additionalModelRequestFields={"inferenceConfig": {"topK": 1}}
)
# Generates a highly structured, JSON-compliant recipe
# Includes precise ingredient measurements
# Provides step-by-step instructions
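Once a tool-enabled call returns, the structured arguments need to be pulled out of the response. The sketch below shows one way to do that, assuming the response follows the Converse shape in which tool calls appear as `toolUse` blocks inside the output message content. The helper name is hypothetical, and the sample response is hand-written for illustration, not real model output.

```python
def extract_tool_calls(response: dict) -> list[dict]:
    """Return the name and input of every toolUse block in a Converse response."""
    content = response.get("output", {}).get("message", {}).get("content", [])
    return [
        {"name": block["toolUse"]["name"], "input": block["toolUse"]["input"]}
        for block in content
        if "toolUse" in block
    ]


# Hand-written sample shaped like a Converse response (not real model output).
sample_response = {
    "stopReason": "tool_use",
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"text": "Here is a recipe."},
                {
                    "toolUse": {
                        "toolUseId": "example-id",
                        "name": "get_recipe",
                        "input": {
                            "recipe": {
                                "name": "Chocolate lava cake",
                                "ingredients": [],
                                "instructions": [],
                            }
                        },
                    }
                },
            ],
        }
    },
}

calls = extract_tool_calls(sample_response)
print(calls[0]["name"])
```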
Overall, OpenAI offers more flexible, broader schema support. Amazon Nova provides more precise, controlled output generation and is the best choice when working with high-stakes, structured data scenarios, as demonstrated in Amazon Nova’s performance on the IFEval benchmark discussed in section 2.1.1 of the technical report and model card. We recommend using Amazon Nova for applications requiring predictable, structured responses because its function calling methodology provides superior control and accuracy.
Conclusion
The evolution from OpenAI’s models to Amazon Nova represents a significant shift in using AI. It shows a transition toward models that deliver comparable or superior performance at a fraction of the cost, with expanded capabilities in multimodal processing and extended context handling.
Whether you’re using the robust, enterprise-ready Amazon Nova Pro, the agile and economical Amazon Nova Lite, or the versatile Amazon Nova Micro, the benefits are clear:
- Cost savings – With token costs up to four times lower, businesses can scale applications more economically
- Enhanced response performance – Faster output speeds (up to about 195 tokens per second) make real-time applications more viable
- Expanded capabilities – A larger context window and multimodal support unlock new applications, from detailed document analysis to integrated visual content
By evolving your prompt strategy (redefining use cases, exploiting the extended context, and iteratively refining instructions), you can smoothly migrate your existing workflows from OpenAI’s GPT-4o and GPT-4o Mini models to the innovative world of Amazon Nova.
About the Authors
Pat Reilly is a Sr. Specialist Solutions Architect on the Amazon Bedrock Go-to-Market team. Pat has spent the last 15 years in analytics and machine learning as a consultant. When he’s not building on AWS, you can find him fumbling around with wood projects.