The Telephone Game: An Ultimate Test for LLMs and Image Generators

4 min read · Oct 21, 2023


In the ever-evolving world of artificial intelligence, one of the most intriguing challenges is evaluation. AI systems are now often said to have moved past the Turing Test, which calls for more tailored tests. For example, it can be hard to verify that AI models such as Large Language Models (LLMs) and image generators communicate effectively and interpret information accurately. The Telephone Game, also known as Broken Telephone or Chinese Whispers, offers a unique and fascinating way to test the abilities of these AI systems. In this article, we explore how this classic childhood game can be used to evaluate LLMs and image generators in two distinct ways:

  1. Reverse-Engineered Image Interpretation: The first test involves generating an image from a textual prompt and checking whether a second AI can interpret that image in a way that stays faithful to the initial prompt.
  2. Passing the Prompt from One AI to Another: In the second test, we take the prompt generated from an image and pass it to another AI, which, in turn, interprets it.

To illustrate these tests, we’ll use DALL·E-3 as our image generator and Google’s BARD as our LLM image interpreter. BARD is currently a free alternative to ChatGPT Vision, which is available through the ChatGPT Plus subscription. Let’s dive into each test and its implications.

Test 1: Image Interpretation from a Prompt

Prompt for DALL·E-3: “Generate an infograph about a man’s life challenges in his 40s as a single.”

DALL·E-3 uses this prompt to generate an image that encapsulates the challenges faced by a man in his 40s who is single. This image could depict various scenes, scenarios, or symbols associated with such a life. Once the image is generated, the key question is: Can BARD accurately interpret this image and describe the challenges it represents?

Mindmap of a Man in his 40s

This test evaluates not only the ability of DALL·E-3 to generate images based on textual prompts but also BARD’s capability to comprehend and translate visual information into meaningful text.
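
One way to put a number on Test 1 is to compare the interpreter's description with the original prompt. The sketch below is a minimal illustration under stated assumptions, not the procedure used in this article. `generate_image` and `describe_image` are hypothetical callables standing in for whatever image generator and interpreter you have access to, and no real API is assumed. Word-overlap (Jaccard) similarity is one simple choice of fidelity score among many.

```python
import re
from typing import Any, Callable, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Word-overlap score in [0, 1]; 1.0 means both texts use the same words."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def round_trip_score(
    prompt: str,
    generate_image: Callable[[str], Any],  # hypothetical: text -> image
    describe_image: Callable[[Any], str],  # hypothetical: image -> text
) -> float:
    """Run prompt -> image -> description and score the description against the prompt."""
    image = generate_image(prompt)
    description = describe_image(image)
    return jaccard_similarity(prompt, description)


if __name__ == "__main__":
    # Placeholder stand-ins so the sketch runs offline: the "image" is just
    # a dict carrying the prompt, and the "interpreter" reads it back.
    fake_generator = lambda text: {"caption": text}
    fake_interpreter = lambda image: image["caption"]
    print(round_trip_score("a man in his 40s", fake_generator, fake_interpreter))
```

A word-overlap score penalizes paraphrases that keep the meaning but change the wording, so a semantic similarity measure may be a better fit in practice.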

Test 2: Passing the Prompt from One AI to Another

In this second test, we repeat the process in alternating directions. AI-1 generates an image, AI-2 describes it in text, AI-3 generates a new image from that description, and so on, until the last AI receives the message. The goal is for the final message to remain faithful to the initial prompt. I will let you play with this and report how far AIs can play the telephone game before the signal is completely lost in translation.
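
The chain in Test 2 can be sketched as a simple loop. This is a minimal sketch under stated assumptions: the entries of `generators` and `describers` are hypothetical callables (text to image, and image to text) that you would wire to the AI systems of your choice, and the function only records the text message at each hop so that drift can be inspected afterwards.

```python
from typing import Any, Callable, List, Sequence


def play_telephone(
    prompt: str,
    generators: Sequence[Callable[[str], Any]],  # hypothetical: text -> image
    describers: Sequence[Callable[[Any], str]],  # hypothetical: image -> text
    rounds: int,
) -> List[str]:
    """Alternate image generation and image description for `rounds` hops.

    Returns the transcript of text messages, starting with the initial
    prompt. Players are reused cyclically if there are fewer than `rounds`.
    """
    if not generators or not describers:
        raise ValueError("at least one generator and one describer are required")
    transcript = [prompt]
    for hop in range(rounds):
        generate = generators[hop % len(generators)]
        describe = describers[hop % len(describers)]
        image = generate(transcript[-1])     # text -> image
        transcript.append(describe(image))   # image -> text
    return transcript


if __name__ == "__main__":
    # Placeholder players so the sketch runs offline: the "image" is the
    # text itself, and the describer loses the last word at every hop.
    passthrough_generator = lambda text: text
    lossy_describer = lambda image: " ".join(image.split()[:-1])
    for message in play_telephone(
        "a single man in his 40s", [passthrough_generator], [lossy_describer], rounds=3
    ):
        print(message)
```

Comparing the first and last entries of the transcript, by eye or with a similarity score, shows how much of the original prompt survived the chain.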

The Significance of These Tests

The Telephone Game, with its two distinct tests, serves as a powerful evaluation tool for AI systems like DALL·E-3 and BARD, as well as their counterparts. Here’s why these tests are so valuable:

  1. Cross-Modal Understanding: AI models often specialize in either text or images. The Telephone Game tests the ability of these models to bridge the gap between these two modalities, demonstrating their capacity for cross-modal understanding.
  2. Bi-directional Capacity: If the AIs use the same models or are trained on the same data, we can test their bi-directional capabilities, that is, how well they translate back and forth between images and prompts. A multimodal system such as BARD should ideally succeed in both directions.
  3. Interpretability and Consistency: The tests assess the interpretability of AI-generated content. Can a generated image or text be consistently and accurately interpreted by other AI models, maintaining fidelity to the initial prompt or image?
  4. Human-like Communication: Successful performance in these tests brings us closer to the goal of human-like communication between AI systems. It’s a step toward achieving AI systems that can not only generate content but also interpret and discuss it.

Challenges and Future Directions

It’s important to acknowledge that the Telephone Game also poses its own set of challenges. Variability in interpretation, potential biases in AI-generated content, and the limitations of current AI models are some of the hurdles that need to be overcome.
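
Because generation and interpretation both vary from run to run, a single game says little on its own. A minimal sketch of one way to account for this is to repeat the game and summarize the scores. Here `run_trial` is a hypothetical zero-argument callable that plays one game and returns a fidelity score of your choosing. It is an assumption for illustration, not something defined in this article.

```python
import statistics
from typing import Callable, Dict


def summarize_trials(run_trial: Callable[[], float], n_trials: int) -> Dict[str, float]:
    """Run the same telephone game several times and summarize its scores."""
    if n_trials < 2:
        raise ValueError("need at least two trials to estimate variability")
    scores = [run_trial() for _ in range(n_trials)]
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }
```

A large spread between `min` and `max` suggests that the outcome depends more on sampling luck than on the prompt itself.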

As AI continues to advance, these tests provide a valuable benchmark to measure progress in AI language understanding and image generation. They also prompt us to think about the potential applications of AI in content creation, communication, and accessibility, where reliable cross-modal interpretation is paramount.

In conclusion, the Telephone Game, an age-old test of communication, has found a new and exciting role in the realm of AI. By subjecting LLMs and image generators to these tests, we are not only evaluating their current capabilities but also pushing the boundaries of what AI can achieve in terms of cross-modal understanding and communication. As technology continues to advance, the results of these tests could be a glimpse into the future of AI-driven content generation and interpretation.




Heptaglot Artist, Data Scientist, Filmmaker exploring Creative AI. Started the GAN AI Art Movement (2016). Former Postdoc @CNRS PhD @INFORMATICS. 3xTEDx Speaker