Saturday, April 26, 2025

Prompting for the best price-performance

In the drive to remain competitive, businesses today are turning to AI to help them minimize cost and maximize efficiency. It's incumbent on them to find the most suitable AI model: the one that will help them achieve more while spending less. For many businesses, the migration from OpenAI's model family to Amazon Nova represents not only a shift in model but a strategic move toward scalability, efficiency, and broader multimodal capabilities.

In this blog post, we discuss how to optimize prompting in Amazon Nova for the best price-performance.

Why migrate from OpenAI to Amazon Nova?

OpenAI's models remain powerful, but their operational costs can be prohibitive when scaled. Consider these figures from Artificial Analysis:

Model | Input Token Cost (per Million Tokens) | Output Token Cost (per Million Tokens) | Context Window | Output Speed (Tokens per Second) | Latency (Seconds to First Token)
GPT-4o | ~$2.50 | ~$10.00 | Up to 128K tokens | ~63 | ~0.49
GPT-4o Mini | ~$0.15 | ~$0.60 | Up to 128K tokens | ~90 | ~0.43
Nova Micro | ~$0.035 | ~$0.14 | Up to 128K tokens | ~195 | ~0.29
Nova Lite | ~$0.06 | ~$0.24 | Up to 300K tokens | ~146 | ~0.29
Nova Pro | ~$0.80 | ~$3.20 | Up to 300K tokens | ~90 | ~0.34

For high-volume applications, like global customer support or large-scale document analysis, these cost differences are disruptive. Not only does Amazon Nova Pro offer over three times the cost-efficiency of GPT-4o, its longer context window also enables it to handle more extensive and complex inputs.
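To make the price gap concrete, the following minimal sketch turns the approximate per-million-token prices in the table into a cost estimate. The workload of 500 million input tokens and 100 million output tokens is a hypothetical figure chosen for illustration, and the helper name `workload_cost` is our own.

```python
# Approximate prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o Mini": (0.15, 0.60),
    "Nova Micro": (0.035, 0.14),
    "Nova Lite": (0.06, 0.24),
    "Nova Pro": (0.80, 3.20),
}


def workload_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for a given token volume."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload, not a measured one.
INPUT_TOKENS = 500_000_000
OUTPUT_TOKENS = 100_000_000

if __name__ == "__main__":
    for model in PRICES:
        cost = workload_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
        print(f"{model:12s} ${cost:,.2f}")
```

Because both providers in the table price output tokens at four times the input rate, the ratio between two models stays the same whatever the input/output mix. Only the absolute cost changes.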

Breaking down the Amazon Nova suite

Amazon Nova isn't a single model. It's a suite designed for various needs:

  • Amazon Nova Pro – A robust multimodal model that can process text, images, and video. It excels at tasks such as document analysis and deep data visualization. Benchmark comparisons show Amazon Nova Pro matching or even surpassing GPT-4o on complex reasoning tasks, according to section 2.1.1 of the Nova technical report and model card.
  • Amazon Nova Lite – Offers a balanced mix of multimodal processing and speed. Amazon Nova Lite is ideal for applications such as document summarization, translation, and even basic visual search, delivering quality outputs at lower latency and cost compared to GPT-4o Mini. You can find these benchmark results in section 2.1.2 of the Nova technical report and model card.
  • Amazon Nova Micro – A text-only model engineered for ultra-low latency. With output speed of up to 195 tokens per second, Amazon Nova Micro is ideal for real-time applications such as chat-based assistants and automated FAQs. Its token costs are dramatically lower than those of GPT-4o Mini, roughly 4.3 times cheaper on a per-token basis.

The lower per-token costs and higher output per second of Amazon Nova give you the flexibility to simplify prompts for real-time applications so you can balance quality, speed, and cost for your use case.

Understanding the foundations

To make the best decision about which model family fits your needs, it's important to understand the differences in prompt engineering best practices in both OpenAI and Amazon Nova. Each model family has its own set of strengths, but there are some things that apply to both families. Across both model families, quality and accuracy are achieved through clarity of instructions, structured prompts, and iterative refinement. Whether you're using strong output directives or clearly defined use cases, the goal is to reduce ambiguity and improve response quality.

The OpenAI approach

OpenAI uses a layered messaging system for prompt engineering, where system, developer, and user prompts work in harmony to control tone, safety, and output format. Their approach emphasizes:

  • Hierarchical message roles – Setting the model's role and behavior using system messages makes sure that the overarching safety and style guidelines (set in system prompts) are preserved
  • Instruction placement and delimiters – Directives are placed at the beginning, with clear separation between context, examples, and queries
  • Selective chain-of-thought – Detailed, step-by-step reasoning is used when it benefits complex tasks
  • Formatting and structure – Using strong directives such as DO, MUST, and DO NOT to provide consistent outputs (for example, in JSON)

The Amazon Nova approach

  • Define the prompt use case
    • Task – What exactly the model should do
    • Role – Which role the model should assume
    • Response style – The structure or tone of the output
    • Instructions – Guidelines the model must follow
  • Chain-of-thought
    • Explicit state response – Provide clear and strong instructions to limit the model's response
    • Structural thinking – Thinking step-by-step encourages structural thinking
  • Formatting and structure
    • Use delimiters to section your prompts, for example, ##Task##, ##Context##, or ##Example##
    • Specify the output format, for example, JSON, YAML, or Markdown
    • Use strong instructions and caps, such as DO, DO NOT, or MUST
    • Prefill the responses to guide the model, for example, start with "{" or "json…"
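The checklist above can be captured in a small prompt builder. This is a minimal sketch rather than an official template: the function name `build_nova_prompt`, its parameters, and the exact section labels are our own assumptions, chosen to mirror the ##Task## and ##Context## delimiter style described in the list.

```python
def build_nova_prompt(role, task, instructions, response_style, context=""):
    """Assemble a prompt whose parts are separated by ##SECTION## delimiters.

    All names here are illustrative; adapt the section labels to your use case.
    """
    # Number the instructions so the model can follow them step by step.
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(instructions, 1))
    sections = [
        role,
        f"##TASK##\n{task}",
        f"##INSTRUCTIONS##\n{numbered}",
        f"##RESPONSE STYLE##\n{response_style}",
    ]
    if context:
        sections.append(f"##CONTEXT##\n{context}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_nova_prompt(
        role="You are an experienced executive assistant.",
        task="Summarize the meeting notes.",
        instructions=[
            "Read the notes found in ##CONTEXT##.",
            "You MUST answer in Markdown.",
            "DO NOT invent action items.",
        ],
        response_style="Concise, five sentences or less.",
        context="(meeting notes go here)",
    ))
```

Keeping the role, task, instructions, and response style as separate arguments makes it easier to refine one element at a time during iterative testing.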

Evolving your prompt strategy: Migrating from OpenAI to Amazon Nova

Transitioning to Amazon Nova isn't merely a change in API endpoints. It requires retooling your prompt engineering to align with the strengths of Amazon Nova. You need to reframe your use case definition. Begin by breaking down your current GPT-4o or GPT-4o Mini prompt into its core elements: task, role, response style, instructions, and success criteria. Make sure to structure these elements clearly to provide a blueprint for the model.

To understand how to migrate an existing OpenAI prompt to work optimally for Amazon Nova Pro, consider the following example using the meeting notes summarizer. Here is the GPT-4o system prompt:

You will be provided with meeting notes, and your task is to summarize the meeting 
as follows: 
-Overall summary of discussion 
-Action items (what needs to be done and who is doing it) 
-If applicable, a list of topics that need to be discussed more fully in the next meeting.

The user prompt is the meeting notes that need to be summarized:

Meeting Date: March 5th, 2050
Meeting Time: 2:00 PM
Location: Conference Room 3B, Intergalactic Headquarters...

GPT-4o produces this helpful response:

Overall Summary of Discussion: The team discussed the recent mission to Planet Zog, 
the space pirate issue in Sector 7, the annual Intergalactic Bake-Off, planning for 
the upcoming charity fundraiser, and a proposed team-building retreat. The mission to 
Planet Zog was successful, but communication with the Zogians was challenging. The team 
also discussed the need for a better strategy to deal with space pirates. The team placed 
second in the Intergalactic Bake-Off and is planning for the upcoming charity fundraiser. 
A team-building retreat to the Moon Resort and Spa was proposed.

Action Items:
1. Dr. Quasar to start working on a Zogian-English dictionary.
2. Captain Stardust to speak with Admiral Starbeam about increasing patrols in Sector 7.
3. Dr. Quasar to finalize the prototype of the new cloaking technology.
4. All team members to start gathering supplies and preparing games for the Intergalactic 
Charity Bazaar.
5. Captain Stardust to check the budget for the proposed team-building retreat to the Moon 
Resort and Spa.

Topics for Next Meeting:
1. Update on the Zogian-English dictionary.
2. Progress report on the cloaking technology.
3. Results of increased patrols in Sector 7.
4. Final preparations for the Intergalactic Charity Bazaar.

To meet or exceed the quality of the response from GPT-4o, here is what an Amazon Nova Pro prompt might look like. The prompt uses the same best practices discussed in this post, starting with the system prompt. We used a temperature of 0.2 and a topP of 0.9 here:

You are an experienced executive assistant skilled in meeting note analysis and 
summarization. Your primary responsibilities include distilling complex discussions
into clear, actionable summaries.
Follow these instructions:

##INSTRUCTIONS##
1. Read and understand the meeting notes found in ##NOTES##
2. Put all of your outputs in a section called ##OUTPUTS## in markdown formatting
3. Summarize the meeting notes in 5 sentences or less. Put this in a section called 
"Overall Summary".
4. Numerically list any action items for specific people and what needs to be completed. 
Put this list in a section called "Action Items".
5. If applicable, list the topics that need to be discussed more fully in the next meeting. 
Put this in a section called "Topics for Next Meeting".

Here's the user prompt, using prefilled responses:

##NOTES##
Meeting Date: March 5th, 2050
Meeting Time: 2:00 PM
Location: Conference Room 3B, Intergalactic Headquarters
Attendees:
- Captain Stardust
- Dr. Quasar
- Lady Nebula
- Sir Supernova
- Ms. Comet
Meeting called to order by Captain Stardust at 2:05 PM
1. Introductions and welcome to our newest team member, Ms. Comet
2. Discussion of our recent mission to Planet Zog
- Captain Stardust: "Overall, a success, but communication with the Zogians was difficult. 
We need to improve our language skills."
- Dr. Quasar: "Agreed. I'll start working on a Zogian-English dictionary right away."
- Lady Nebula: "The Zogian food was out of this world, literally! We should consider having 
a Zogian food night on the ship."
3. Addressing the space pirate issue in Sector 7
- Sir Supernova: "We need a better strategy for dealing with these pirates. They've already 
plundered three cargo ships this month."
- Captain Stardust: "I'll speak with Admiral Starbeam about increasing patrols in that area."
- Dr. Quasar: "I've been working on a new cloaking technology that could help our ships avoid 
detection by the pirates. I'll need a few more weeks to finalize the prototype."
4. Review of the annual Intergalactic Bake-Off
- Lady Nebula: "I'm happy to report that our team placed second in the competition! Our Martian Mud 
Pie was a big hit!"
- Ms. Comet: "Let's aim for first place next year. I have a secret recipe for Jupiter Jello that I 
think could be a winner."
5. Planning for the upcoming charity fundraiser
- Captain Stardust: "We need some creative ideas for our booth at the Intergalactic Charity Bazaar."
- Sir Supernova: "How about a 'Dunk the Alien' game? We can have people throw water balloons at a 
volunteer dressed as an alien."
- Dr. Quasar: "I can set up a 'Name That Star' trivia game with prizes for the winners."
- Lady Nebula: "Great ideas, everyone. Let's start gathering the supplies and preparing the games."
6. Upcoming team-building retreat
- Ms. Comet: "I would like to propose a team-building retreat to the Moon Resort and Spa. It's a great 
opportunity to bond and relax after our recent missions."
- Captain Stardust: "Sounds like a fantastic idea. I'll check the budget and see if we can make it happen."
7. Next meeting agenda items
- Update on the Zogian-English dictionary (Dr. Quasar)
- Progress report on the cloaking technology (Dr. Quasar)
- Results of increased patrols in Sector 7 (Captain Stardust)
- Final preparations for the Intergalactic Charity Bazaar (All)
Meeting adjourned at 3:15 PM. Next meeting scheduled for March 19th, 2050 at 2:00 PM in Conference Room 3B, 
Intergalactic Headquarters.

##OUTPUTS##

The following example shows that the Amazon Nova response meets and exceeds the accuracy of the OpenAI example, formats the output in Markdown, and has found clear owners for each action item:

### Overall Summary
The meeting covered the successful mission to Planet Zog, the ongoing issue with 
space pirates in Sector 7, the results of the annual Intergalactic Bake-Off, plans 
for the upcoming charity fundraiser, and a proposed team-building retreat. Action items 
were assigned, and the next meeting's agenda was set.

### Action Items
1. **Dr. Quasar**: Begin working on a Zogian-English dictionary.
2. **Captain Stardust**: Speak with Admiral Starbeam about increasing patrols in Sector 7.
3. **Dr. Quasar**: Finalize the prototype for the new cloaking technology within a few weeks.
4. **Team**: Start gathering supplies and preparing games for the Intergalactic Charity Bazaar.
5. **Captain Stardust**: Check the budget for the proposed team-building retreat to the Moon Resort and Spa.

### Topics for Next Meeting
1. Update on the Zogian-English dictionary (Dr. Quasar)
2. Progress report on the cloaking technology (Dr. Quasar)
3. Results of increased patrols in Sector 7 (Captain Stardust)
4. Final preparations for the Intergalactic Charity Bazaar (All)

A few updates to the prompt can achieve comparable or better results from Amazon Nova Pro while enjoying a much less expensive cost of inference.
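For readers who want to try the migrated prompt, here is a minimal sketch of how the system prompt, the ##NOTES## and ##OUTPUTS## delimiters, and the inference settings quoted earlier (temperature 0.2, topP 0.9) could be sent through the Amazon Bedrock Converse API. The helper names are our own, and the model ID `us.amazon.nova-pro-v1:0` is an assumption modeled on the Nova Lite ID used elsewhere in this post. Check which model IDs are enabled in your account and Region.

```python
def build_summary_request(system_prompt, notes, model_id="us.amazon.nova-pro-v1:0"):
    """Build keyword arguments for a Converse call (model_id default is an assumption)."""
    # The trailing ##OUTPUTS## marker nudges the model to begin its answer there.
    user_text = f"##NOTES##\n{notes}\n\n##OUTPUTS##\n"
    return {
        "modelId": model_id,
        "system": [{"text": system_prompt}],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"temperature": 0.2, "topP": 0.9},
    }


def summarize(system_prompt, notes):
    """Send the request to Amazon Bedrock; requires AWS credentials and model access."""
    import boto3  # imported lazily so the request builder can be used without AWS

    client = boto3.client("bedrock-runtime")
    response = client.converse(**build_summary_request(system_prompt, notes))
    return response["output"]["message"]["content"][0]["text"]
```

Separating the request builder from the network call lets you inspect or unit test the payload before spending any tokens.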

Make use of Amazon Nova extended context

Amazon Nova Lite and Amazon Nova Pro can support up to 300,000 input tokens, which means that you can include more context in your prompt if needed. Expand your background data and detailed instructions accordingly. If your original OpenAI prompt was optimized for 128,000 tokens, adjust it to use the Amazon Nova extended window.

Tailor output constraints

If your GPT prompt required strict formatting (for example, "Respond in JSON only"), make sure that your Amazon Nova prompt includes these directives. Additionally, if your task involves multimodal inputs, specify when to include images or video references.
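One way to enforce a JSON-only response is to combine the directive with the prefill technique listed earlier. The following is a minimal sketch under two assumptions: that a trailing assistant message is treated as the start of the model's answer, and that the returned text continues from the prefill rather than repeating it. The helper names are hypothetical.

```python
import json


def with_json_prefill(user_text, prefill="{"):
    """Return Converse-style messages that end with a prefilled assistant turn."""
    return [
        {"role": "user", "content": [{"text": user_text}]},
        {"role": "assistant", "content": [{"text": prefill}]},
    ]


def parse_prefilled_json(prefill, completion):
    """Re-attach the prefill to the model's continuation and parse the result."""
    return json.loads(prefill + completion)


if __name__ == "__main__":
    messages = with_json_prefill("Respond in JSON only. Which city is the capital of Uruguay?")
    # Hand-written stand-in for a model continuation, not real model output.
    print(parse_prefilled_json("{", '"city": "Montevideo"}'))
```

If parsing fails, `json.loads` raises `json.JSONDecodeError`, which is a convenient signal to retry or to tighten the prompt.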

Function calling

The rise of generative AI agents has made function calling, or tool calling, one of the most important abilities of a given large language model (LLM). A model's ability to correctly pick the right tool for the job, in a low-latency manner, is often the difference between success and failure of an agentic system.

Both OpenAI and Amazon Nova models share similarities in function calling, in particular their support for structured API calls. Both model families support tool selection through defined tool schemas, which we discuss later in this post. They also both provide a mechanism to decide when to invoke these tools or not.

OpenAI's function calling uses flexible JSON schemas to define and structure API interactions. The models support a wide range of schema configurations, which give developers the ability to quickly implement external function calls through straightforward JSON definitions tied to their API endpoints.

Here is an example of a function:

from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

# Tool schema describing the function the model is allowed to call
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Montevideo, Uruguay"
                }
            },
            "required": ["location"],
            "additionalProperties": False
        },
        "strict": True
    }
}]

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the weather like in Punta del Este today?"}],
    tools=tools
)

Similar to OpenAI's approach, Amazon Nova can call tools when passed a configuration schema as shown in the following code example. Amazon Nova has made heavy use of greedy decoding when calling tools, and it's advised to set temperature, topP, and topK to 1. This makes sure that the model has the highest accuracy in tool selection. These greedy decoding parameters and other examples of tool use are covered in detail in Tool use (function calling) with Amazon Nova.

The following is an example of function calling without using additionalModelRequestFields:

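import boto3

# Amazon Bedrock runtime client; assumes AWS credentials and a Region are configured
client = boto3.client("bedrock-runtime")

tool_config = {
    "tools": [{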
        "toolSpec": {
            "name": "get_recipe",
            "description": "Structured recipe generation system",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "recipe": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "ingredients": {
                                    "type": "array",
                                    "items": {
                                        "type": "object",
                                        "properties": {
                                            "item": {"type": "string"},
                                            "amount": {"type": "number"},
                                            "unit": {"type": "string"}
                                        }
                                    }
                                },
                                "instructions": {
                                    "type": "array",
                                    "items": {"type": "string"}
                                }
                            },
                            "required": ["name", "ingredients", "instructions"]
                        }
                    }
                }
            }
        }
    }]
}

# Base configuration without topK=1
input_text = "I need a recipe for chocolate lava cake"
messages = [{
    "role": "user",
    "content": [{"text": input_text}]
}]

# Inference parameters
inf_params = {"topP": 1, "temperature": 1}

response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params
)
# Typically produces less structured or incomplete output

The following example shows how function calling accuracy can be improved by using additionalModelRequestFields:

# Enhanced configuration with topK=1
response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params,
    additionalModelRequestFields={"inferenceConfig": {"topK": 1}}
)
# Produces a more accurate and structured function call

To maximize Amazon Nova function calling potential and improve accuracy, always use additionalModelRequestFields with topK=1. This forces the model to select the single most probable token and prevents random token selection. This increases deterministic output generation and improves function call precision by about 30–40%.
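To see why topK=1 removes randomness, consider this toy sketch of top-k sampling over a single next-token distribution. It illustrates the general technique only and is not how Amazon Nova is implemented. The token names and probabilities are invented for the example.

```python
import random


def sample_top_k(probs, k, rng):
    """Sample one token from the k most probable candidates, weighted by probability."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [token for token, _ in top]
    weights = [weight for _, weight in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution while the model is choosing a tool name.
NEXT_TOKEN_PROBS = {"get_recipe": 0.55, "get_weather": 0.30, "I": 0.15}

if __name__ == "__main__":
    rng = random.Random()
    # With k=1 only the most probable token survives, so every draw is the same.
    print({sample_top_k(NEXT_TOKEN_PROBS, 1, rng) for _ in range(100)})
    # With k=3 lower-probability tokens can be drawn, so results vary between runs.
    print({sample_top_k(NEXT_TOKEN_PROBS, 3, rng) for _ in range(100)})
```

With k=1 the candidate list has a single entry, so the draw always returns the most probable token regardless of the random seed. That is the greedy decoding behavior described above.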

The following code examples further explain how to conduct tool calling successfully. The first scenario shows recipe generation without an explicit tool. The example doesn't use topK, which typically results in responses that are less structured:

input_text = """
I'm looking for a decadent chocolate dessert that's quick to prepare. 
Something that looks fancy but isn't complicated to make.
"""

messages = [{
    "role": "user",
    "content": [{"text": input_text}]
}]

response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    inferenceConfig={"topP": 1, "temperature": 1}
)
# Generates a conversational recipe description
# Less structured, more narrative-driven response

In this example, the scenario shows recipe generation with a structured tool. We add topK set to 1, which produces a more structured output:

response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig={"topP": 1, "temperature": 1},
    additionalModelRequestFields={"inferenceConfig": {"topK": 1}}
)
# Generates a highly structured, JSON-compliant recipe
# Includes precise ingredient measurements
# Provides step-by-step instructions

Overall, OpenAI offers more flexible, broader schema support. Amazon Nova provides more precise, controlled output generation and is the best choice when working with high-stakes, structured data scenarios, as demonstrated in Amazon Nova's performance on the IFEval benchmark discussed in section 2.1.1 of the technical report and model card. We recommend using Amazon Nova for applications requiring predictable, structured responses because its function calling methodology provides superior control and accuracy.
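When structured output matters, it helps to check the tool call before acting on it. The following minimal sketch pulls toolUse blocks out of a Converse-style response and checks the recipe fields that the get_recipe schema marks as required. The helper names are our own, and `mock_response` is a hand-written stand-in that follows the Converse response shape. It is not real model output.

```python
def extract_tool_calls(response):
    """Return every toolUse block found in a Converse-style response dict."""
    content = response.get("output", {}).get("message", {}).get("content", [])
    return [block["toolUse"] for block in content if "toolUse" in block]


def missing_recipe_fields(tool_input, required=("name", "ingredients", "instructions")):
    """List the required recipe fields that are absent from the tool input."""
    recipe = tool_input.get("recipe", {})
    return [field for field in required if field not in recipe]


# Hand-written mock that mimics the response structure; values are illustrative only.
mock_response = {
    "stopReason": "tool_use",
    "output": {"message": {"role": "assistant", "content": [
        {"toolUse": {
            "toolUseId": "example-id",
            "name": "get_recipe",
            "input": {"recipe": {
                "name": "Chocolate lava cake",
                "ingredients": [{"item": "dark chocolate", "amount": 100, "unit": "g"}],
                "instructions": ["Melt the chocolate.", "Bake until the edges set."],
            }},
        }},
    ]}},
}

if __name__ == "__main__":
    for call in extract_tool_calls(mock_response):
        print(call["name"], "missing fields:", missing_recipe_fields(call["input"]))
```

A check like this makes it straightforward to compare runs with and without topK=1 on your own prompts instead of relying on a single example.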

Conclusion

The evolution from OpenAI's models to Amazon Nova represents a significant shift in using AI. It shows a transition toward models that deliver similar or superior performance at a fraction of the cost, with expanded capabilities in multimodal processing and extended context handling.

Whether you're using the robust, enterprise-ready Amazon Nova Pro, the agile and economical Amazon Nova Lite, or the versatile Amazon Nova Micro, the benefits are clear:

  • Cost savings – With token costs up to four times lower, businesses can scale applications more economically
  • Enhanced response performance – Faster output speeds (up to about 195 tokens per second) make real-time applications more viable
  • Expanded capabilities – A larger context window and multimodal support unlock new applications, from detailed document analysis to integrated visual content

By evolving your prompt strategy (redefining use cases, exploiting the extended context, and iteratively refining instructions), you can smoothly migrate your existing workflows from OpenAI's GPT-4o and GPT-4o Mini models to the innovative world of Amazon Nova.


About the Authors

Claudio Mazzoni is a Sr Specialist Solutions Architect on the Amazon Bedrock GTM team. Claudio excels at guiding customers through their Gen AI journey. Outside of work, Claudio enjoys spending time with family, working in his garden, and cooking Uruguayan food.

Pat Reilly is a Sr. Specialist Solutions Architect on the Amazon Bedrock Go-to-Market team. Pat has spent the last 15 years in analytics and machine learning as a consultant. When he's not building on AWS, you can find him fumbling around with wood projects.


