Google recently unveiled Gemini, its latest suite of powerful AI models. However, the tech giant is facing criticism for allegedly misrepresenting the capabilities of Gemini in a demo video during the announcement.
The six-minute video showcased Gemini’s multimodal capabilities, combining spoken conversational prompts with image recognition. It portrayed Gemini as recognizing images quickly, responding within seconds, and tracking movements in real time, such as in a cup-and-ball game. However, a disclaimer in the video’s YouTube description revealed that “for the purposes of this demo, latency has been reduced, and Gemini outputs have been shortened for brevity.”
Parmy Olson, in a Bloomberg op-ed, raised concerns about Google’s admission that the video demo did not occur in real time or with spoken prompts. Instead, still image frames from the raw footage were used, and Gemini responded to written text prompts. Olson emphasized the discrepancy between what the video suggested and the actual process, arguing that it misleads viewers about the AI’s real-time capabilities.
Google has faced skepticism over demo videos before, such as the Duplex demo that showed an AI voice assistant making restaurant reservations. The absence of ambient noise and the unusually helpful employees on the calls led to doubts about the demo’s authenticity.
Oriol Vinyals, vice president of research and deep learning lead at Google DeepMind and co-lead for Gemini, responded to the recent criticism. Vinyals explained that all the user prompts and outputs in the video are real but were shortened for brevity. He stated that the video aimed to illustrate potential multimodal user experiences with Gemini and to inspire developers.
The debate underscores the challenges tech companies face in presenting AI capabilities transparently and authentically, especially in the face of increasing scrutiny and competition in the AI landscape. Critics argue that true inspiration comes from allowing developers and journalists to experience the product directly through public beta testing rather than carefully edited promotional videos.