

Google Gemini: Everything you need to know about the new generative AI platform





Google is trying to make waves with Gemini, its flagship suite of generative AI models, apps and services.

So what is Gemini? How can you use it? And how does it compare to the competition?

To make it easier to stay on top of the latest Gemini developments, we’ve put together this handy guide, which we’ll keep up to date as new Gemini models, features and news about Google’s plans for Gemini are released.

What is Gemini?

Gemini is Google’s long-promised, next-generation GenAI model family, developed by Google’s AI research labs DeepMind and Google Research. It comes in three flavors:

  • Gemini Ultra, the most performant Gemini model.
  • Gemini Pro, a “lite” Gemini model.
  • Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro.

All Gemini models were trained to be “natively multimodal” – in other words, able to work with and use more than just words. They were pretrained and fine-tuned on a variety of audio, images and videos, a large set of codebases and text in different languages.

This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything other than text (e.g. essays, email drafts), but that isn’t the case with Gemini models.

What is the difference between the Gemini apps and Gemini models?

Image credits: Google

Google, proving once again that it lacks a knack for branding, didn’t make it clear from the outset that the Gemini models are separate from the Gemini apps on the web and mobile (formerly Bard). The Gemini apps are simply an interface through which certain Gemini models can be accessed – think of them as a client for Google’s GenAI.

Incidentally, the Gemini apps and models are also completely independent of Imagen 2, Google’s text-to-image model available in some of the company’s development tools and environments.

What can Gemini do?

Because the Gemini models are multimodal, they can theoretically perform a range of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. Some of these capabilities have already reached the product stage (more on that later), and Google is promising them all – and more – sometime in the not-too-distant future.

Of course, it’s a bit difficult to take the company at its word.

Google seriously underdelivered with the original Bard launch. And more recently, a video purporting to show off Gemini’s abilities turned out to have been heavily edited and was more or less aspirational.

Still, assuming Google is being more or less truthful with its claims, here’s what the different tiers of Gemini will be able to do once they reach their full potential:

Gemini Ultra

Google says that Gemini Ultra – thanks to its multimodality – can be used to help with things like physics homework, solving problems on a worksheet step-by-step, and pointing out possible errors in already completed answers.

Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says – extracting information from those papers and “updating” a chart from one of them by generating the formulas needed to recreate the chart with more recent data.

Gemini Ultra technically supports image generation, as mentioned earlier. But that capability hasn’t yet made its way to the product version of the model – perhaps because the mechanism is more complex than the way apps like ChatGPT generate images. Instead of passing prompts to an image generator (like DALL-E 3, in the case of ChatGPT), Gemini outputs images “natively,” with no intermediate step.

Gemini Ultra is available as an API through Vertex AI, Google’s fully managed AI developer platform, and AI Studio, Google’s web-based tool for app and platform developers. It also powers the Gemini apps, but not for free. Accessing Gemini Ultra through what Google calls Gemini Advanced requires a subscription to the Google One AI Premium Plan, which costs $20 per month.

The AI Premium Plan also connects Gemini to your broader Google Workspace account – think emails in Gmail, documents in Docs, spreadsheets in Sheets and Google Meet recordings. This is useful if, for example, you want to summarize emails or have Gemini take notes during a video call.

Gemini Pro

Google says Gemini Pro is an improvement over LaMDA in its reasoning, planning and understanding capabilities.

An independent study by Carnegie Mellon and BerriAI researchers found that the first version of Gemini Pro was indeed better than OpenAI’s GPT-3.5 at handling longer and more complex reasoning chains. But the study also found that this version of Gemini Pro, like all large language models, particularly struggled with math problems involving several digits, and users have found examples of poor reasoning and obvious mistakes.

However, Google promised solutions – and the first one arrived in the form of Gemini 1.5 Pro.

Designed as a drop-in replacement, Gemini 1.5 Pro improves on its predecessor in a number of areas, perhaps most significantly in the amount of data it can process. Gemini 1.5 Pro can take in ~700,000 words, or ~30,000 lines of code – 35x as much as Gemini 1.0 Pro can handle. And because the model is multimodal, it is not limited to text. Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in different languages, albeit slowly (for example, searching for a scene in an hour-long video takes 30 seconds to a minute of processing).

Gemini 1.5 Pro went into public preview on Vertex AI in April.

An additional endpoint, Gemini Pro Vision, can process text and imagery – including photos and video – and output text, along the lines of OpenAI’s GPT-4 with Vision model.


Using Gemini Pro in Vertex AI. Image credits: Gemini

Within Vertex AI, developers can tailor Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also connect to external third-party APIs to perform certain actions.

In AI Studio, there are workflows for creating structured chat prompts with Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and they can adjust the model temperature to control the creative range of the output and provide examples to give tone and style guidance – as well as tune the safety settings.
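For readers unfamiliar with the temperature setting, the general idea can be sketched in a few lines of Python. This is the textbook temperature-scaled softmax, not Gemini’s actual implementation; the function name and the example scores are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
    for t in (0.2, 1.0, 2.0):
        print(t, [round(p, 3) for p in softmax_with_temperature(scores, t)])
```

A low temperature concentrates probability on the top-scoring option, which makes output more predictable. A high temperature spreads probability more evenly, which makes output more varied.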

Gemini Nano

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and is efficient enough to run directly on (some) phones rather than sending the job to a server somewhere. So far it supports a number of features on the Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.

The Recorder app, which lets users press a button to record and transcribe audio, includes a Gemini-powered summary of your recorded conversations, interviews, presentations, and other clips. Users get these summaries even when they have no signal or Wi-Fi connection. And with a nod to privacy, no data leaves their phone.

Gemini Nano is also included in Gboard, Google’s keyboard app. There, it powers a feature called Smart Reply, which helps suggest the next thing you want to say during a conversation in a messaging app. The feature will initially only work with WhatsApp, but will come to more apps over time, Google says.

And in the Google Messages app on supported devices, Nano enables Magic Compose, which can create messages in styles like “excited,” “formal,” and “lyrical.”

Is Gemini better than OpenAI’s GPT-4?

Google has several times touted Gemini’s benchmark superiority, claiming that Gemini Ultra outperforms current state-of-the-art results on “30 out of 32 commonly used academic benchmarks used in research and development of large language models.” The company says that Gemini 1.5 Pro, meanwhile, is more capable than Gemini Ultra at tasks like content summarization, brainstorming and writing in some scenarios; presumably this will change with the release of the next Ultra model.

But putting aside the question of whether benchmarks really indicate a better model, the scores Google cites appear to be only marginally better than those of OpenAI’s corresponding models. And – as previously mentioned – some first impressions weren’t great, with users and academics noting that the older version of Gemini Pro tends to get basic facts wrong, struggles with translations and gives poor coding suggestions.

How much does Gemini cost?

Gemini 1.5 Pro is free to use in the Gemini apps and, for the time being, in AI Studio and Vertex AI.

Once Gemini 1.5 Pro leaves preview in Vertex, input to the model will cost $0.0025 per character, while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (approximately 140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).

Let’s assume that a 500-word article contains 2,000 characters. Summarizing that article with Gemini 1.5 Pro would cost $5. Meanwhile, generating an article of similar length would cost $0.10.
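The arithmetic above can be reproduced with a short Python sketch. The rates are the per-character figures quoted in this article, hard-coded as assumptions rather than read from any Google service; the function and constant names are hypothetical and not part of any Google API.

```python
from decimal import Decimal

# Per-character rates as quoted in this article (assumptions, not fetched from any API).
INPUT_RATE = Decimal("0.0025")    # dollars per character sent to the model
OUTPUT_RATE = Decimal("0.00005")  # dollars per character generated by the model


def estimate_cost(input_chars: int = 0, output_chars: int = 0) -> Decimal:
    """Return the estimated cost in dollars for the given character counts."""
    return INPUT_RATE * input_chars + OUTPUT_RATE * output_chars


if __name__ == "__main__":
    # The article's example: a 500-word piece assumed to be 2,000 characters.
    print("Summarizing:", estimate_cost(input_chars=2000))
    print("Generating:", estimate_cost(output_chars=2000))
```

Decimal is used instead of floats so that the small per-character rates multiply out exactly, matching the $5 and $0.10 figures in the example.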

Ultra pricing has not yet been announced.

Where can you try Gemini?

Gemini Pro

The easiest place to experience Gemini Pro is in the Gemini apps. Pro and Ultra answer queries in a range of languages.

Gemini Pro and Ultra are also accessible in preview in Vertex AI via an API. The API is currently free to use “within certain limits” and supports certain regions, including Europe, as well as features such as chat functionality and filtering.

Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Using the service, developers can iterate on prompts and Gemini-based chatbots, then obtain API keys to use them in their apps – or export the code to a more fully featured IDE.

Code Assist (formerly Duet AI for developers), Google’s suite of AI-powered code completion and generation tools, uses Gemini models. Developers can perform ‘large-scale’ changes to codebases, for example updating dependencies between files and reviewing large chunks of code.

Google has added Gemini models to its development tools for Chrome and the Firebase mobile development platform, as well as to its database creation and management tools. And it has launched new security products backed by Gemini, such as Gemini in Threat Intelligence, a part of Google’s Mandiant cybersecurity platform that can analyze large swaths of potentially malicious code and let users search in natural language for ongoing threats or indicators of compromise.