Raccoons playing with water?
Vision-Language Models (VLMs) are definitely something to keep an eye on! 👀
Yesterday, I came across Hugging Face announcing a new model from Google called PaliGemma 2 on their platform, and I couldn’t resist thinking, "I have to try this on some wildlife images!" And guess what? You can too!
All you need is a little knowledge of Python and how to use a terminal, and you’re ready to dive in. Here's how:
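1️⃣ Create a Hugging Face account – you'll need this to accept the licenses and terms and conditions that some models require.
2️⃣ Install the Hugging Face CLI: Follow the guide here: https://lnkd.in/dd8eZJsD, then log in with your access token.
3️⃣ Install the pip packages the example code imports (typically transformers, torch, and pillow, but check the example's imports).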
4️⃣ Run the example code: Find it here: https://lnkd.in/dmJ4UzUY or explore other models here: https://lnkd.in/d55BtDAk.
5️⃣ Update the code to point to your image file and let the magic happen!
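If you'd rather see the steps above in one place, here is a minimal sketch of what such a script can look like. It is not the linked example itself: the checkpoint name, the `build_prompt` and `describe_image` helpers, and the package list in the comments are my assumptions for illustration, so adjust them to whatever the model card says.

```python
# Assumed setup (not taken from the linked example):
#   pip install transformers torch pillow
# You also need to be logged in with your Hugging Face token and to have
# accepted the model's license on its model page.

# Assumed checkpoint; any other PaliGemma 2 checkpoint id can be swapped in.
DEFAULT_MODEL_ID = "google/paligemma2-3b-ft-docci-448"


def build_prompt(text: str = "") -> str:
    """Prefix the text with the image placeholder token the processor expects.

    An empty text leaves the model free to describe the image on its own.
    """
    return "<image>" + text


def describe_image(image_path, text="", model_id=DEFAULT_MODEL_ID, max_new_tokens=100):
    """Load the model, run it on one local image, and return the generated text."""
    # Heavy imports live here so the helpers above work without them installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    # Use the GPU with bfloat16 when available, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.bfloat16 if device == "cuda" else torch.float32

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=dtype
    ).to(device).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(text), images=image, return_tensors="pt")
    inputs = inputs.to(dtype).to(device)

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens so only the newly generated text is decoded.
    prompt_len = inputs["input_ids"].shape[-1]
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

To try it, call something like `print(describe_image("raccoon.jpg"))`, where `raccoon.jpg` is a placeholder for your own image file. The first run downloads several gigabytes of weights, and running on CPU works but is slow.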
Pro Tip: I found that leaving the text prompt blank gave more interesting results, but if you want to experiment with prompts, go for it and have fun! 🎉
PaliGemma 2 is just one of many models worth trying. BLIP is another popular VLM, DINOv2 is a popular vision-only model, and BioCLIP is tailored specifically to animals, plants, and insects.