
Using Vision-Language Models (VLMs) for wildlife

Discover wildlife imaging with Vision-Language Models like PaliGemma 2 on Hugging Face: install the CLI, run the Python example code, and generate descriptions of your own animal images.

Hugo Markoff

Raccoons playing with water?

Vision-Language Models (VLMs) are definitely something to keep an eye on! 👀

Yesterday, I came across Hugging Face announcing a new Google model, PaliGemma 2, on their platform, and I couldn’t help thinking, "I have to try this on some wildlife images!" And guess what? You can too!


All you need is a little knowledge of Python and how to use a terminal, and you’re ready to dive in. Here's how:

1️⃣ Create a Hugging Face account – you'll need this to accept the model's license and terms and conditions.

2️⃣ Install the CLI: Follow the guide here: https://lnkd.in/dd8eZJsD, and add your token.

3️⃣ Install the required pip packages.

4️⃣ Run the example code: Find it here: https://lnkd.in/dmJ4UzUY or explore other models here: https://lnkd.in/d55BtDAk. (A minimal sketch of the whole flow follows this list.)

5️⃣ Update the code to point to your image file and let the magic happen!
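
To make steps 3 to 5 concrete, here is a minimal sketch of the whole flow, based on the official Transformers example for PaliGemma 2. The checkpoint id google/paligemma2-3b-pt-224 and the file name my_wildlife_image.jpg are assumptions – swap in whichever model size and image you like.

```python
# Setup (run once in your terminal):
#   pip install -U transformers accelerate torch pillow
#   huggingface-cli login   # paste the access token from your Hugging Face account

import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

# Assumption: the 3B pretrained checkpoint at 224px resolution; other sizes
# and resolutions are available on the Hub.
model_id = "google/paligemma2-3b-pt-224"

model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = PaliGemmaProcessor.from_pretrained(model_id)

# Step 5: point this at your own image file.
image = Image.open("my_wildlife_image.jpg").convert("RGB")

# Leaving the prompt blank often gives the most interesting output (see the tip below).
prompt = ""
inputs = processor(text=prompt, images=image, return_tensors="pt").to(torch.bfloat16).to(model.device)
input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    # Strip the prompt tokens so only the newly generated description is printed.
    print(processor.decode(generation[0][input_len:], skip_special_tokens=True))
```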

Pro Tip: I found that leaving the text input blank provided more interesting results, but if you want to experiment with prompts, go for it and have fun! 🎉
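
If you do want to play with prompts, a quick way is to loop over a few variants and compare the outputs. This sketch reuses the model, processor, and image objects from the example above; "caption en" and "describe en" are task prefixes from the PaliGemma model card, but check the card for the full list.

```python
# Compare a blank prompt against a few PaliGemma task prefixes.
for prompt in ["", "caption en", "describe en", "answer en What animal is this?"]:
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(torch.bfloat16).to(model.device)
    input_len = inputs["input_ids"].shape[-1]
    with torch.inference_mode():
        out = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    print(f"{prompt!r} -> {processor.decode(out[0][input_len:], skip_special_tokens=True)}")
```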


PaliGemma 2 is just one of many VLMs; BLIP is another popular one, and DINOv2 is a widely used vision foundation model that often serves as a backbone for them. BioCLIP is a model trained specifically for animals, plants, and insects.
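
If you want to try BioCLIP on your images, it loads through the open_clip library rather than Transformers. A minimal zero-shot classification sketch, assuming the Hub id hf-hub:imageomics/bioclip; the label list and file name are placeholders to replace with your own.

```python
# pip install open_clip_torch pillow
import torch
import open_clip
from PIL import Image

# Load BioCLIP weights and matching preprocessing from the Hugging Face Hub.
model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:imageomics/bioclip")
tokenizer = open_clip.get_tokenizer("hf-hub:imageomics/bioclip")
model.eval()

# Placeholder image and candidate species; use your own.
image = preprocess(Image.open("my_wildlife_image.jpg")).unsqueeze(0)
labels = ["a photo of a raccoon", "a photo of a red fox", "a photo of a badger"]
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image and each label, softmaxed into probabilities.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```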


About Hugo Markoff

Co-founder of Animal Detect, the dog man.