
Into the Wild World of Animal-AI Standards

SOTA CNN-based wildlife detection guide: datasets, benchmarks, COCO/Camtrap DP formats, licensing insights, and model training tips.

Hugo Markoff

Let’s jump into the wilderness of standards, formats, and benchmarks for animal detection together! Don’t be scared! I'll focus mainly on convolutional neural networks (CNNs) used for inference to find, classify, and segment animals, but I'll also mention other methods worth keeping an eye on.


📌 Understanding the State of the Art (SOTA)

Before diving in, let's look at what the current SOTA is doing; there's plenty here to inspire us.

My initial approach, coming from robotics, and what I've seen others use, typically involved single-shot image recognition models like YOLO to directly classify objects in whole images. But wildlife tech handles things differently because of the incredible variability in images: animals of different types, similar-looking species, day/night images, blurry visuals, partial visibility, movement, and more.

The proven approach:

  • Detect → Crop → Classify.

This method was supported by research published last year titled "To crop or not to crop: Comparing whole-image and cropped classification on a large dataset of camera trap images".

I strongly believe this strategy would also be beneficial for underwater imagery and insect data, especially given the vast variability in these datasets. For a quick primer, check out my article on "classical" animal classifiers here.
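
To make the pipeline concrete, here is a minimal sketch of the detect → crop → classify flow. `run_detector` and `run_classifier` are hypothetical stand-ins for real models (e.g. MegaDetector for detection and a species classifier of your choice); the box format and threshold are my assumptions, not any particular model's API.

```python
from PIL import Image

CONF_THRESHOLD = 0.2  # discard weak detections before cropping (assumed value)

def run_detector(image):
    """Hypothetical detector stand-in: returns boxes as fractions of image size."""
    return [{"bbox": (0.10, 0.20, 0.45, 0.60), "conf": 0.91}]

def run_classifier(crop):
    """Hypothetical species classifier run on a cropped detection."""
    return "red_fox"

def detect_crop_classify(path):
    image = Image.open(path).convert("RGB")
    w, h = image.size
    results = []
    for det in run_detector(image):
        if det["conf"] < CONF_THRESHOLD:
            continue
        x0, y0, x1, y1 = det["bbox"]
        # Convert fractional coordinates to pixels and crop out the animal.
        crop = image.crop((int(x0 * w), int(y0 * h), int(x1 * w), int(y1 * h)))
        results.append((run_classifier(crop), det["conf"]))
    return results
```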


📸 Datasets – Kaggle, LILA BC, and GitHub

Before jumping into models, let’s talk data!

If we openly share our collected data, we'll collectively build better, more effective models. While Kaggle has some engaging datasets, I strongly recommend LILA BC as the standard source for animal datasets: LILA BC datasets.

  • Millions of images from diverse projects.
  • Metadata in the COCO Camera Traps format for an easy starting point (see the sketch below).

Note: I haven't found insect datasets here yet, but they'd probably appreciate contributions! 😉

Several projects also share their data via GitHub repositories, but if I had to recommend a central place for standards, it would unquestionably be LILA.
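
To show why the COCO Camera Traps metadata makes for an easy starting point, here is a hedged sketch of reading such a JSON file and listing which species each image contains. The file name is a placeholder; the field names follow the format as used across LILA BC datasets.

```python
import json
from collections import defaultdict

# Placeholder path to a COCO Camera Traps metadata file from LILA BC.
with open("lila_dataset.json") as f:
    coco = json.load(f)

# Map category ids to species names, then group annotations per image.
id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
image_labels = defaultdict(set)
for ann in coco["annotations"]:
    image_labels[ann["image_id"]].add(id_to_name[ann["category_id"]])

for img in coco["images"][:5]:
    print(img["file_name"], sorted(image_labels.get(img["id"], [])))
```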


🛠️ Training Your Models? Here’s a Tip:

Before training from scratch:

  • Consider fine-tuning existing models with your data (read up on "fine-tuning" if the term is new to you; a minimal sketch follows).
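
As a rough illustration of the idea (not a full recipe), here is a minimal PyTorch/torchvision sketch: freeze a pretrained backbone and retrain only a new classification head on your own species list. The class count and learning rate are placeholder assumptions.

```python
import torch
import torchvision

# Load an ImageNet-pretrained backbone and freeze its weights.
model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
for p in model.parameters():
    p.requires_grad = False

# Replace the head with one sized for your species list (12 is a placeholder).
model.fc = torch.nn.Linear(model.fc.in_features, 12)

# Only the new head is trained; the backbone keeps its pretrained features.
optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-3)
```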

If you decide to build from scratch, pay close attention to benchmarks and avoid overfitting or underfitting. My suggestion for a balanced and robust dataset split:

  • Training (60%): Diverse data.
  • Validation (20%): 50% from your training distribution (but not the same images), 50% completely new scenarios (different regions, conditions, animals).
  • Test (20%): Primarily (50–100%) from completely new scenarios.

This ensures your model generalizes well to various conditions and environments.
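
Here is a minimal sketch of that split, assuming each image record carries a "scenario" tag (site, region, or condition) and that you have marked some scenarios as held-out. The proportions are approximate for small datasets.

```python
import random

def split_dataset(records, new_scenarios, seed=42):
    """records: dicts with a 'scenario' key; new_scenarios: tags held out
    as 'completely new' (different regions, conditions, animals)."""
    rng = random.Random(seed)
    familiar = [r for r in records if r["scenario"] not in new_scenarios]
    unseen = [r for r in records if r["scenario"] in new_scenarios]
    rng.shuffle(familiar)
    rng.shuffle(unseen)

    n = len(records)
    train = familiar[: int(0.6 * n)]            # Training (60%): diverse data
    leftover = familiar[int(0.6 * n):]

    half = int(0.1 * n)                         # Validation (20%): 50/50 mix
    val = leftover[:half] + unseen[:half]

    test = unseen[half : half + int(0.2 * n)]   # Test (20%): all new scenarios here
    return train, val, test
```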


🔖 Why Licenses Matter

Most wildlife detection projects use YOLO models, either for direct single-shot classification or for detection tasks. While GitHub repositories typically carry MIT licenses, many overlook that the underlying models might be licensed differently. Specifically, after YOLOv5, Ultralytics (the company behind recent YOLO releases) placed its YOLO models under the AGPL-3.0 license, affecting subsequent versions like YOLOv8. This restricts how others can reuse your code and models.

Alternatives:

  • There are MIT-licensed versions of YOLO models, such as this one used by PyTorch Wildlife, which offer similar performance with slight differences.

Licenses determine how freely others can use or modify your work:

  • MIT License: A permissive license ideal for broad use.
  • AGPL-3.0: Imposes stricter conditions, potentially limiting broader adoption.

There are many other code licenses I won't cover here; you can read about some common ones here.


🐾 Top Detection Models Worth Benchmarking

MegaDetector – A "binary" animal detection model trained on trail-camera images, with additional human and vehicle classes.

It was initially created by Dan Morris and is now maintained by the PyTorch Wildlife team (Microsoft). Trained on millions of global camera-trap images, it detects animals, humans, and vehicles with exceptional accuracy. Newer models from PyTorch Wildlife may detect slightly more animals, though I personally also see an increase in false positives. Limitations include difficulty with non-standard angles such as top-view images. Consider contributing data to the team so they can provide even better models (they are especially in need of European data right now), and consider fine-tuning for specific applications instead of reinventing the wheel.
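
If you run MegaDetector in batch mode, its output JSON is easy to post-process. A hedged sketch, assuming the documented batch-output format (normalized [x, y, width, height] boxes and a detection_categories map such as {"1": "animal", "2": "person", "3": "vehicle"}):

```python
import json

def animal_detections(results_path, conf=0.2):
    """Return {file: [bbox, ...]} for confident animal hits only."""
    with open(results_path) as f:
        results = json.load(f)
    # Look up which category id means "animal" instead of hard-coding "1".
    animal_id = {v: k for k, v in results["detection_categories"].items()}["animal"]
    hits = {}
    for entry in results["images"]:
        boxes = [d["bbox"] for d in entry.get("detections", [])
                 if d["category"] == animal_id and d["conf"] >= conf]
        if boxes:
            hits[entry["file"]] = boxes
    return hits
```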

MegaFishDetector – The counterpart of MegaDetector but for fish.

It performs binary fish detection on underwater images; since humans and vehicles are rarely present underwater, the model looks exclusively for fish. It is so far the best model I have tried for binary fish detection, but it is also very limited: AFAIK no one is continuously working on it, and it only looks for fish, not crabs, starfish, or other underwater animals. It may also struggle with fish that are not seen from the usual side-on underwater view.

Insect Detectors?

Well, I am quite sure that if I searched enough I would be able to find something. I noticed FlatBug, which claims a decent detection rate across a few different datasets with some variation. But given its relatively small dataset, I would assume it works to some extent, though not great on new scenarios/insects. I have some hope that https://insectai.eu/ may be working in the right direction, hopefully toward an open-source dataset + model?

Drone Detector Models – Lagging behind…

While there are some drone models I have tested, ironically it is often MegaDetector, made for camera traps, that does best on my own images where the animal(s) are clearly visible. And while models like HerdNet show impressive accuracy on their own benchmarks, even on animals far away, HerdNet is still limited to a few classes, is used more as a classifier than a detector, and I could not get any usable results on my own test data.

I have some information that PyTorch Wildlife is working on an overhead/drone image model, which AFAIK would work more as a true detector and maybe also handle classification. Let's wait and see.


What about Classification Models?

Wow… There are many, often scattered everywhere, with various licenses and models ranging from very regional to globally applicable. It's genuinely confusing and challenging to navigate, but before you start from scratch, it's for sure worth exploring what already exists. I can't cover all of them, but I'll highlight some:

State-of-the-art Global Classification Model: SpeciesNet by Google

The first of its kind, a globally applicable animal classification model for camera trap images. With intelligent country-code filters, it prevents unlikely classifications, like a kangaroo appearing in Denmark. With over 2,000 classes and versatile hierarchical classification, it’s a significant step forward. Though there are gaps with missing animals, I'm hopeful for continuous improvements and increased community and developer engagement to expand coverage.
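
To illustrate the country-code idea (this mirrors the concept, not SpeciesNet's actual API), here is a toy geofencing filter; the allowlist contents are made-up assumptions:

```python
# Toy allowlist: species plausibly occurring per ISO country code (assumed data).
ALLOWED = {"DK": {"red_fox", "roe_deer", "european_badger"}}

def geofence(predictions, country):
    """predictions: (species, score) pairs; drop species implausible for the country."""
    allowed = ALLOWED.get(country)
    if allowed is None:  # unknown country: pass everything through unchanged
        return predictions
    return [(s, p) for s, p in predictions if s in allowed]

# A kangaroo prediction in Denmark would be filtered out here.
print(geofence([("kangaroo", 0.7), ("red_fox", 0.6)], "DK"))
```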

European State-of-the-Art Classification Model: DeepFaune

"A collaborative project combining millions of images and artificial intelligence for automatic recognition of European fauna in camera-trap images."

An impressive initiative focused specifically on European fauna, but its license complicates direct integration with projects like our own Animal Detect, though collaboration could change that 😊 (contact me if you are interested). If your research focuses purely on Europe, DeepFaune is highly recommended.

Rest of the World?

Addax Data Science provides tools for testing various open-source models locally, sourced from multiple global projects. Worth exploring yourself!


Segmentation Models:

I haven't yet extensively explored segmentation models specifically for animals. However, I’ve seen several papers incorporating segmentation as part of their detection workflow. I’ve personally tested the Segment Anything Model (SAM), which performs impressively on animals and supports fine-tuning. YOLOv8-seg is another viable option.
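
For reference, here is a minimal SAM sketch using the segment-anything package, prompting with a detector box (e.g. from MegaDetector) to get an animal mask. The checkpoint and image paths are placeholders, and you need the ViT-B weights downloaded first.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Placeholder checkpoint path; download the official ViT-B weights first.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB array; OpenCV loads BGR, so convert.
image = cv2.cvtColor(cv2.imread("camtrap_frame.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Box prompt in XYXY pixels, e.g. taken from a detector's output.
box = np.array([120, 80, 520, 400])
masks, scores, _ = predictor.predict(box=box[None, :], multimask_output=False)
print(masks.shape, scores)  # (1, H, W) boolean mask and its quality score
```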


Where’s the real issue?

Frequently, I see papers claiming state-of-the-art performance but rarely benchmarking against existing models. Repeatedly collecting new data, training, validating, and testing on the same dataset won't propel us forward. We urgently need standardized benchmark datasets and transparent sharing of performance metrics. It’s frustrating to repeatedly see projects that don’t compare their deer-detection results against existing open-source models.

I invite major organizations, companies, or projects to help create standardized benchmark datasets. You don't need to share your valuable raw datasets, just standardized testing data, benchmark procedures, and clear documentation. Let’s push innovation forward collaboratively!


Formats: Which to Choose?

Now, formats are genuinely confusing. It turns out we're not just tech-focused: ecologists and biologists extensively use CSV and XLSX formats instead of the JSON or TXT I initially expected, coming from the robotics field. Excel remains popular! But then, what information do we standardize? Latin names, English names, local names? It remains largely subjective.

Tech-focused individuals have two primary formats to consider: COCO Camera Traps and Camtrap DP.

Personally, I prefer the flexibility of the COCO Camera Traps format, although it may be too loosely defined. Interestingly, Camtrap DP does address my previous complaint by clearly specifying taxonomy fields (details here). Perhaps I overlooked this, and I should consider integrating it into Animal Detect.
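
Since Camtrap DP is a Frictionless Data Package built around CSV tables, pulling taxonomy out of it is straightforward. A hedged sketch, assuming the standard layout with an observations.csv containing a scientificName column (the path is a placeholder):

```python
import csv
from collections import Counter

def species_counts(observations_csv):
    """Count observations per scientific name in a Camtrap DP observations table."""
    with open(observations_csv, newline="", encoding="utf-8") as f:
        return Counter(
            row["scientificName"]
            for row in csv.DictReader(f)
            if row.get("scientificName")
        )

print(species_counts("my-package/observations.csv").most_common(10))
```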

However, the broader issue is adoption: why is Camtrap DP not more widely used?


Re-identification of animals – can we set a standard or is it not possible?

The most fine-grained type of classification often mentioned in the animal space is re-identification. There are several different approaches, which I briefly cover here; MegaDescriptor highlights significant standardization challenges within the re-identification space. I would love to see whether we can follow the flow of ViT models and related approaches to find out how fine-grained we can make embeddings for a common species re-identification approach. We can already see a bit of movement in WildlifeDatasets, where people are willing to share their data and results. Kudos!
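
Most embedding-based re-identification boils down to nearest-neighbour search over per-individual embeddings. A minimal sketch, with vectors as stand-ins for what a model such as MegaDescriptor would produce:

```python
import numpy as np

def identify(query, gallery, top_k=3):
    """query: embedding of the unknown animal; gallery: {individual id: embedding}.
    Returns the top_k most similar known individuals by cosine similarity."""
    q = query / np.linalg.norm(query)
    scores = {ind: float(q @ (e / np.linalg.norm(e))) for ind, e in gallery.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy usage with random stand-in embeddings for three known individuals.
rng = np.random.default_rng(0)
gallery = {f"cow_{i}": rng.normal(size=256) for i in range(3)}
print(identify(rng.normal(size=256), gallery))
```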


Benchmarks!

Please help to make some; I don't want to hear about another project getting several million euros in funding to reinvent the wheel. Instead, let's make that wheel better! I have frankly heard about several projects working on re-identification of cows. I challenged one project, which had spent nearly 3 years developing their cow re-identification, to check out the different cow re-identification projects already existing on WildlifeDatasets.

Firstly, the person thought what they had made was completely novel, which points to a lack of research, and indeed, they did not achieve the performance of one of the existing models. For every classifier, detector, and re-identification model: start benchmarking against other existing models, while also providing a simpler way for people to benchmark against you!
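
A shared test set plus a small script is often all it takes to publish comparable numbers. A toy sketch computing precision/recall for predicted boxes against ground truth at a fixed IoU threshold (greedy matching, purely illustrative):

```python
def iou(a, b):
    """Intersection-over-union for boxes given as (x0, y0, x1, y1)."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def precision_recall(preds, truths, thresh=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes."""
    used, matched = set(), 0
    for p in preds:
        candidates = [(iou(p, t), i) for i, t in enumerate(truths) if i not in used]
        if candidates:
            best_iou, best_i = max(candidates)
            if best_iou >= thresh:
                matched += 1
                used.add(best_i)
    precision = matched / len(preds) if preds else 0.0
    recall = matched / len(truths) if truths else 0.0
    return precision, recall
```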


Summary:

Let's standardize approaches for working with animal, fish, and insect data. Instead of reinventing, let's optimize and innovate collaboratively. For open-source contributors: consider licenses carefully, share methods transparently, and centralize datasets. I have homework to do on the Camtrap DP format as a potentially overlooked standard.

Finally, for leading models, please provide clear benchmarking procedures and test datasets to foster genuine innovation!


About Hugo Markoff

Co-founder of Animal Detect, the dog man.