In December we first introduced native image output in Gemini 2.0 Flash to trusted testers. Today, we’re making it available for developer experimentation across all regions currently supported by Google AI Studio. You can test this new capability using an experimental version of Gemini 2.0 Flash (gemini-2.0-flash-exp) in Google AI Studio and via the Gemini API.
Gemini 2.0 Flash combines multimodal input, enhanced reasoning, and natural language understanding to create images.
Here are some examples of where 2.0 Flash’s multimodal outputs shine:
1. Text and images together
Use Gemini 2.0 Flash to tell a story and it will illustrate it with pictures, keeping the characters and settings consistent throughout. Give it feedback and the model will retell the story or change the style of its drawings.
Story and illustration generation in Google AI Studio
2. Conversational image editing
Gemini 2.0 Flash helps you edit images through many turns of a natural language dialogue, which is great for iterating towards a perfect image or exploring different ideas together.
Multi-turn conversational image editing that maintains context throughout the conversation in Google AI Studio
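A multi-turn editing session can be sketched as a loop that sends one instruction per turn to a chat session that keeps the earlier turns as context. This is a minimal sketch, not a definitive implementation: `run_edit_session` is a hypothetical helper, and it assumes only that the chat object has a `send_message` method, as the session returned by `client.chats.create(...)` in the google-genai SDK does.

```python
def run_edit_session(chat, instructions):
    """Send each edit instruction as a new turn and collect the replies.

    Hypothetical helper. `chat` is assumed to be any object with a
    `send_message(text)` method that remembers earlier turns.
    """
    replies = []
    for instruction in instructions:
        # Each call builds on the context of the previous turns.
        replies.append(chat.send_message(instruction))
    return replies


# Assumed usage with the google-genai SDK (needs an API key, not run here):
#
# from google import genai
# from google.genai import types
#
# client = genai.Client(api_key="GEMINI_API_KEY")
# chat = client.chats.create(
#     model="gemini-2.0-flash-exp",
#     config=types.GenerateContentConfig(response_modalities=["Text", "Image"]),
# )
# replies = run_edit_session(chat, ["Draw a red door.", "Make the door blue."])
```

Keeping the loop separate from the client setup makes it easy to try the flow with a stand-in chat object before spending API calls.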
3. World understanding
Unlike many other image generation models, Gemini 2.0 Flash leverages world knowledge and enhanced reasoning to create the right image. This makes it perfect for creating detailed imagery that’s realistic, like illustrating a recipe. While it strives for accuracy, like all language models, its knowledge is broad and general, not absolute or complete.
Interleaved text and image output for a recipe in Google AI Studio
4. Text rendering
Most image generation models struggle to accurately render long sequences of text, often resulting in poorly formatted or illegible characters, or misspellings. Internal benchmarks show that 2.0 Flash has stronger text rendering than leading competitive models, which makes it great for creating advertisements, social posts, or even invitations.
Image outputs with long text rendering in Google AI Studio
Start making images with Gemini today
Get started with Gemini 2.0 Flash via the Gemini API. Read more about image generation in our docs.
from google import genai
from google.genai import types

# Replace the placeholder with your own Gemini API key.
client = genai.Client(api_key="GEMINI_API_KEY")

# Request both text and image output from the experimental model.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=(
        "Generate a story about a cute baby turtle in a 3d digital art style. "
        "For each scene, generate an image."
    ),
    config=types.GenerateContentConfig(
        response_modalities=["Text", "Image"]
    ),
)
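The request above returns text and images interleaved in a single response. One way to separate them is sketched below, under stated assumptions: each part exposes a `text` attribute and an `inline_data` attribute carrying raw `data` bytes and a `mime_type`, as the google-genai SDK's part objects do. The helper name `save_interleaved_parts`, the `scene_N` file naming, and the extension table are hypothetical choices for illustration.

```python
from pathlib import Path

# Hypothetical lookup table; unknown MIME types fall back to ".bin".
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_interleaved_parts(parts, out_dir):
    """Write image parts to out_dir and return (texts, image_files).

    Hypothetical helper: assumes each part has a `text` attribute and an
    `inline_data` attribute with `data` (bytes) and `mime_type` (str).
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    texts, image_files = [], []
    for part in parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.data:
            # Image part: number the files in the order the scenes arrive.
            ext = EXTENSIONS.get(inline.mime_type, ".bin")
            target = out_path / f"scene_{len(image_files) + 1}{ext}"
            target.write_bytes(inline.data)
            image_files.append(target)
        elif getattr(part, "text", None):
            # Text part: keep the story text in its original order.
            texts.append(part.text)
    return texts, image_files
```

With the response from the request above, a call might look like `save_interleaved_parts(response.candidates[0].content.parts, "turtle_story")`, assuming the first candidate holds the generated content.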
Whether you are building AI agents, developing apps with beautiful visuals like illustrated interactive stories, or brainstorming visual ideas in conversation, Gemini 2.0 Flash allows you to add text and image generation with just a single model. We’re eager to see what developers create with native image output, and your feedback will help us finalize a production-ready version soon.