yano2mch
Professional Geeky Perv
Decided to try the multimodal features of a model. These are models that are made with text generation and visual information also is trained, so it is aware of both.
So i gave it an image i think was from Pintrest, a funny cat Fluffy Von Whiskers. Then i asked the model what it saw.
Here's the reply i got.
I am seriously impressed. I was trying the huihui-gemma-4-31B model which is likely a distilling of knowledge, but the Gemma4-A3B model performs very quickly also having the multi-modal mode.
This may replace my OCR methods, and it could even fix typos and errors from scanned PDF image books too. VERY promising. Needs more testing.
So i gave it an image i think was from Pintrest, a funny cat Fluffy Von Whiskers. Then i asked the model what it saw.
Here's the reply i got.
The image is a humorous "regal" portrait of a fluffy, white cat posing as royalty. The cat has a stoic, slightly judgmental expression and is wearing a small, ornate gold crown perched between its ears. The background is a deep, moody blue, reminiscent of classical oil paintings of nobility.
As for the text, it is a mock-decree written in a formal, grandiose style. Here is the transcription:
"FLUFFY VON WHISKERS, FIRST OF HIS NAME, DESTROYER OF YARN AND ANNOYER OF DOG, RODENTSBANE, LORD OF ARMRESTS, STEWARD OF LIVINGROOM RUG AND WATCHER OF BIRDS, HEREBY DECREES THAT THE FOOD IN HIS ROYAL MAJESTY'S BOWL IS INSUFFICIENT FOR SUSTENANCE AND DEMANDS THAT MORE TRIBUTE IS TO BE REQUIRED HENCEFORTH IN PERPETUITY."
Inference: This is a classic "Internet Meme" format where a pet is elevated to a position of absolute power to highlight the same attitude many cats possess in real life. Essentially, the cat is filing a formal complaint about its dinner portions.
I am seriously impressed. I was trying the huihui-gemma-4-31B model which is likely a distilling of knowledge, but the Gemma4-A3B model performs very quickly also having the multi-modal mode.
This may replace my OCR methods, and it could even fix typos and errors from scanned PDF image books too. VERY promising. Needs more testing.

