Google accelerates on AI: Gemini now interprets what it sees on screen and through the camera
The AI assistant enters the era of multimodality: real-time visual recognition enhances productivity and human-machine interaction
Google has taken a significant new step in the global AI race. With the latest release of Gemini, the AI assistant included in the Google One AI Premium service, the Mountain View company introduces a feature that radically transforms interaction with mobile devices: the ability to “see” what’s on the screen or in front of the camera, analyze it, and respond in real time.
From textual input to visual understanding
No longer limited to voice or text queries, Gemini now interacts directly with visual content. Users can point their camera at an object, take a picture of a document, or simply ask for clarification on something they’re reading or viewing on their smartphone. This is the tangible realization of “Project Astra,” announced by Google in 2024 — a multimodal AI platform capable of interpreting the world in a way increasingly similar to humans.
Towards a new standard of assisted productivity
In the business world, this evolution offers immediate advantages: visual support in reading reports, contextual assistance in managing charts, and interpretation of technical diagrams or marketing materials. Some industry forecasts suggest that this kind of AI could speed up comprehension and decision-making in corporate tasks by as much as 40%, with a significant boost in productivity.
The global race for the perfect assistant
The competition remains fierce. Amazon is preparing to launch Alexa Plus, and Apple is working on a major Siri overhaul, but Google remains one step ahead. The fact that even Samsung has chosen Gemini as the default assistant on its smartphones sends a clear message: leadership in AI is no longer just about cloud infrastructure and models; it is about user experience.
Multimodality: the new frontier
With Gemini, Google is betting on an AI increasingly capable of interpreting text, voice, images, and context. It’s an intelligent assistance model that goes beyond simply “answering questions,” evolving into a cognitive tool that can truly support people in their daily and professional lives — and perhaps even anticipate their needs.
A market with exponential growth
According to IDC estimates, the global market for AI assistants is projected to exceed $45 billion by 2027, with a compound annual growth rate (CAGR) of over 30%. With Gemini, Google aims to secure a strategic share of this ecosystem, integrating its AI into devices, work environments, and consumer services, from Android to Workspace.
Its real strength? A multimodal and contextual approach that enables more natural and productive interaction. A competitive edge that translates into real value for both businesses and users.
A challenge that goes beyond business
The race for intelligent assistants, however, is not just about market share. In a world where artificial intelligence is increasingly central to national strategies, the ability to govern and deploy advanced digital assistants has become a lever of global influence.
As the United States, China, and the European Union develop their respective AI Acts and digital sovereignty frameworks, technological dominance over intelligent interfaces — like Gemini — takes on a geopolitical dimension. The real question is no longer just who answers better, but who leads the conversation with the future.