Ideally, this could translate to the finetuning and RAG capacities of an LLM; however, I am particularly interested in how well an LLM can generalize from learned data to something it has never seen before, without falling down the hallucination rabbit hole.
To give a better example, here is a test I used to challenge several multi-modal LLMs, especially vision-capable commercial models such as Gemini, Bing, ChatGPT, Llama 3, and Claude.
Creating the challenge was a neuroplastic exercise for me, since I had to handwrite in mirror image, which is quite difficult, especially on a first attempt. Leonardo da Vinci is famously known for writing in mirror script.
Had I used existing images that may already have been fed into these multi-modal LLMs during training, it would not have been a fair test of their ability to decode never-before-seen mirrored handwriting.
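For readers who want to score such a test programmatically, here is a minimal sketch. The function name and sample strings are my own illustration, not part of the experiment: reversing each line approximates the character order of mirror writing (it does not simulate mirrored glyph shapes), so a model's transcription of the mirrored image can be checked against the intended text.

```python
def mirror_text(text: str) -> str:
    """Reverse each line, approximating how mirror writing
    reorders characters (glyph shapes are not simulated)."""
    return "\n".join(line[::-1] for line in text.splitlines())

# Hypothetical ground truth for a handwritten mirror-text sample
intended = "the quick brown fox"

# This is roughly what the page presents to the model, character-order-wise
mirrored = mirror_text(intended)

# Hypothetical model transcription; a full success matches the intended text exactly
model_output = "the quick brown fox"
print(mirrored)
print(model_output == intended)
```

Applying the function twice returns the original text, which is a handy sanity check that the transformation is its own inverse.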
Below is my mirror handwriting that will put several multi-modal LLMs to the test. Are you excited?
I will show you screenshots of the outputs, divided into two sections: the hall of fame and the hall of shame.
HALL of FAME:
HALL of SHAME:
More on Gemini, Llama, Pixtral…
I am in awe, especially of GPT-4o and Claude 3.5 Sonnet, and that is to say the least!