Google’s Gemini is the world’s most capable multimodal AI yet | The Business Standard

Tech

Rifat Ahmed
23 December, 2023, 02:20 pm
Last modified: 23 December, 2023, 02:21 pm

Because it can understand nuanced information, Gemini can answer questions that machines previously could not solve without extra context or added metadata.

In almost every standardised benchmark, Gemini outperforms its contemporaries, including the widely praised GPT-4 from OpenAI. Photo: Collected

Earlier this year, Google AI's Brain division merged with DeepMind, a British-American artificial intelligence research lab that Google acquired in 2014. The first 'big' thing to come from this newly formed team, dubbed Google DeepMind, is the 'GPT-4 killer' Gemini.

Google's Gemini is a multimodal large language model (LLM) that is built on the PaLM 2 architecture, with improvements in efficiency, multimodal capabilities, and future-proofing for memory and planning.

In almost every standardised benchmark, Gemini outperforms its contemporaries, including the widely praised GPT-4 from OpenAI. But what surprised observers most at its 6 December announcement was that Gemini was the first AI model to outperform human experts on Massive Multitask Language Understanding (MMLU).

That means, in this standardised method of testing an AI model's capabilities, Gemini is better at understanding, answering, and solving problems than humans who are considered the definitive experts in their respective fields.

But the initial shock of the 'Gemini Era' came from its monumental multimodal capabilities.

With a dataset of 540 billion words and code, 14 million images, and access to Google Search, Gemini, unlike other AI language models, can understand video and audio on top of text, pictures, and code.

More importantly, its multimodality is built in from the ground up, which is meant to make reasoning across text, speech, and nonverbal input seamless. Instead of typing a complex question that is hard to put into words, a user could simply show, explain, and illustrate it in a video and ask for help, as in the demonstration video shown during the presentation.

From such an interaction, Gemini could work out what is happening in a video, what the person in it is saying, and even nonverbal cues such as hand gestures, and use them to understand the context.

Since it is trained to recognise, understand, and interpret text, video, and audio simultaneously, it can even identify motion and change its answer accordingly, making it particularly useful in applied mathematics, physics, engineering, statistics, and simulations.

Because it can understand nuanced information, it can answer questions that machines previously could not solve without extra context or added metadata.

The first version of Gemini can also understand and generate code in programming languages such as Python, C++, and Java, and Google presents it as one of the leading models for coding. Google used a specialised version of Gemini to build AlphaCode 2, a code generation system that, by Google's estimate, performs better than 85% of participants in competitive programming contests. It can also produce in seconds a large body of code that would take a human hours or even days to write.

To accommodate everyone and every environment, Google Gemini comes in three sizes.

Gemini Nano, the smallest of the three, is Google's most efficient model for on-device tasks.

Gemini Pro is the larger and more capable middle tier, suited to scaling across a wide range of tasks. A fine-tuned version of it has already been integrated into Bard for more advanced reasoning, understanding, and execution. Gemini Pro is also available to enterprise clients and developers via the Gemini API in Google Cloud Vertex AI or Google AI Studio.

Gemini Ultra is the largest and most capable of the three and can handle highly complex tasks that require advanced AI capabilities. Google plans to launch Bard Advanced, a new and richer AI experience, early next year with the capabilities of Gemini Ultra.

Gemini is also designed so that newer capabilities such as memory and planning can be integrated easily into the model's architecture. This future-proofing, together with Google's plan to make parts of Gemini open-source for more collaborative innovation, makes it clear that Google wants Gemini to be an integral part of its plans for decades to come.
