Google accelerates on AI: Gemini now interprets what it sees on screen and through the camera

The AI assistant enters the era of multimodality: real-time visual recognition enhances productivity and human-machine interaction

Google has taken a significant new step in the global AI race. With the latest release of Gemini, the AI assistant included in the Google One AI Premium service, the Mountain View company introduces a feature that radically transforms interaction with mobile devices: the ability to “see” what’s on the screen or in front of the camera, analyze it, and respond in real time.

From textual input to visual understanding

No longer limited to voice or text queries, Gemini now interacts directly with visual content. Users can point their camera at an object, take a picture of a document, or simply ask for clarification on something they’re reading or viewing on their smartphone. This is the tangible realization of “Project Astra,” announced by Google in 2024 — a multimodal AI platform capable of interpreting the world in a way increasingly similar to humans.

Towards a new standard of assisted productivity

In the business world, this evolution offers immediate advantages: visual support in reading reports, contextual assistance in managing charts, and interpretation of technical diagrams or marketing materials. Some industry forecasts suggest this kind of AI could improve comprehension and decision-making speed in corporate tasks by as much as 40%, with a significant boost in productivity.

The global race for the perfect assistant

The competition remains fierce. Amazon is preparing to launch Alexa Plus, and Apple is working on a major Siri overhaul, but Google stays one step ahead. The fact that even Samsung has chosen to integrate Gemini as the default assistant on its smartphones sends a clear message: leadership in the AI sector is no longer just about cloud infrastructure and models, it’s about user experience.

Multimodality: the new frontier

With Gemini, Google is betting on an AI increasingly capable of interpreting text, voice, images, and context. It’s an intelligent assistance model that goes beyond simply “answering questions,” evolving into a cognitive tool that can truly support people in their daily and professional lives — and perhaps even anticipate their needs.

A market with exponential growth

According to IDC estimates, the global market for AI assistants is projected to exceed $45 billion by 2027, with a compound annual growth rate (CAGR) of over 30%. With Gemini, Google aims to secure a strategic share of this ecosystem, integrating its AI into devices, work environments, and consumer services—from Android to Workspace.

Its real strength? A multimodal and contextual approach that enables more natural and productive interaction. A competitive edge that translates into real value for both businesses and users.

A challenge that goes beyond business

But the race for intelligent assistants is not just about market share. In a world where artificial intelligence is increasingly central to national strategies, the ability to govern and deploy advanced digital assistants has become a lever of global influence.

As the United States, China, and the European Union develop their respective AI Acts and digital sovereignty frameworks, technological dominance over intelligent interfaces — like Gemini — takes on a geopolitical dimension. The real question is no longer just who answers better, but who leads the conversation with the future.
