
Using Vision-Language Models (VLMs) for wildlife

Discover wildlife imaging with Vision-Language Models like PaliGemma 2 on Hugging Face: install the CLI, run the Python example code, and generate descriptions of your own animal images.

Hugo Markoff

Raccoons playing with water?

Vision-Language Models (VLMs) are definitely something to keep an eye on! 👀

Yesterday, I came across Hugging Face announcing a new Google model, PaliGemma 2, on their platform, and I couldn’t help thinking, "I have to try this on some wildlife images!" And guess what? You can too!


All you need is a little knowledge of Python and how to use a terminal, and you’re ready to dive in. Here's how:

1️⃣ Create a Hugging Face account – you'll need this to accept the model's license and terms and conditions.

2️⃣ Install the CLI: Follow the guide here: https://lnkd.in/dd8eZJsD, and add your token.

3️⃣ Install the required pip packages.

4️⃣ Run the example code: Find it here: https://lnkd.in/dmJ4UzUY or explore other models here: https://lnkd.in/d55BtDAk. (A minimal sketch of the whole flow follows this list.)

5️⃣ Update the code to point to your image file and let the magic happen!
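
To make steps 3 to 5 concrete, here is a minimal sketch of the whole flow, based on the official Transformers example for PaliGemma 2. The checkpoint id google/paligemma2-3b-pt-224 and the file name my_wildlife_image.jpg are assumptions – swap in whichever model size and image you like.

```python
# Setup (run once in your terminal):
#   pip install -U transformers accelerate torch pillow
#   huggingface-cli login   # paste the access token from your Hugging Face account

import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

# Assumption: the 3B pretrained checkpoint at 224px resolution; other sizes
# and resolutions are available on the Hub.
model_id = "google/paligemma2-3b-pt-224"

model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = PaliGemmaProcessor.from_pretrained(model_id)

# Step 5: point this at your own image file.
image = Image.open("my_wildlife_image.jpg").convert("RGB")

# Leaving the prompt blank often gives the most interesting output (see the tip below).
prompt = ""
inputs = processor(text=prompt, images=image, return_tensors="pt").to(torch.bfloat16).to(model.device)
input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    # Strip the prompt tokens so only the newly generated description is printed.
    print(processor.decode(generation[0][input_len:], skip_special_tokens=True))
```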

Pro Tip: I found that leaving the text input blank provided more interesting results, but if you want to experiment with prompts, go for it and have fun! 🎉
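
If you do want to play with prompts, a quick way is to loop over a few variants and compare the outputs. This sketch reuses the model, processor, and image objects from the example above; "caption en" and "describe en" are task prefixes from the PaliGemma model card, but check the card for the full list.

```python
# Compare a blank prompt against a few PaliGemma task prefixes.
for prompt in ["", "caption en", "describe en", "answer en What animal is this?"]:
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(torch.bfloat16).to(model.device)
    input_len = inputs["input_ids"].shape[-1]
    with torch.inference_mode():
        out = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    print(f"{prompt!r} -> {processor.decode(out[0][input_len:], skip_special_tokens=True)}")
```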


PaliGemma 2 is just one of many VLMs; BLIP is another popular one, and DINOv2 is a widely used vision foundation model that often serves as a backbone for them. BioCLIP is a model trained specifically for animals, plants, and insects.
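
If you want to try BioCLIP on your images, it loads through the open_clip library rather than Transformers. A minimal zero-shot classification sketch, assuming the Hub id hf-hub:imageomics/bioclip; the label list and file name are placeholders to replace with your own.

```python
# pip install open_clip_torch pillow
import torch
import open_clip
from PIL import Image

# Load BioCLIP weights and matching preprocessing from the Hugging Face Hub.
model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:imageomics/bioclip")
tokenizer = open_clip.get_tokenizer("hf-hub:imageomics/bioclip")
model.eval()

# Placeholder image and candidate species; use your own.
image = preprocess(Image.open("my_wildlife_image.jpg")).unsqueeze(0)
labels = ["a photo of a raccoon", "a photo of a red fox", "a photo of a badger"]
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image and each label, softmaxed into probabilities.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```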


About Hugo Markoff

Co-founder of Animal Detect, the dog man.