Can you improve your animal classifier with image bursts?
Last week I had the pleasure of meeting Sameeruddin, one of the authors of the paper “TemporalSwin-FPN Net: A Novel Pipeline for Metadata-Driven Sequence Classification in Cameratrap Imagery”, which proposes a pipeline to do exactly this!
I posted an article last week where I tested out animal classifiers and noticed that the accuracy on single images was not as good as I had hoped. Looking at videos, however, I could see that while some individual frames misidentified the animal, the majority of frames classified it correctly. There I suggested that something like majority voting across frames could improve the overall classification.
Sameer and his team had something similar, but quite different, in mind. Instead of videos, many camera traps can capture a series (burst) of images at a set interval, typically 3-5 images about one second apart. Their pipeline extracts the image metadata, groups the images into sequences, and runs TemporalSwin-FPN Net on each sequence to classify the animal with higher accuracy.
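To make the grouping step concrete, here is a minimal sketch of how images could be clustered into bursts from their capture timestamps. The filenames, timestamps, and the 2-second gap threshold are my own illustrative assumptions, not values from the paper; in practice the timestamps would come from each image's EXIF metadata (e.g. the "DateTimeOriginal" field).

```python
from datetime import datetime, timedelta

# Hypothetical capture timestamps; in a real pipeline these would be
# read from each image's EXIF metadata rather than hard-coded.
shots = [
    ("img_001.jpg", datetime(2024, 5, 1, 6, 0, 0)),
    ("img_002.jpg", datetime(2024, 5, 1, 6, 0, 1)),
    ("img_003.jpg", datetime(2024, 5, 1, 6, 0, 2)),
    ("img_004.jpg", datetime(2024, 5, 1, 9, 30, 0)),  # a later trigger
    ("img_005.jpg", datetime(2024, 5, 1, 9, 30, 1)),
]

def group_bursts(shots, max_gap=timedelta(seconds=2)):
    """Group images into bursts: consecutive shots within max_gap of each other."""
    shots = sorted(shots, key=lambda s: s[1])
    bursts, current = [], [shots[0]]
    for prev, cur in zip(shots, shots[1:]):
        if cur[1] - prev[1] <= max_gap:
            current.append(cur)
        else:
            bursts.append(current)
            current = [cur]
    bursts.append(current)
    return bursts

bursts = group_bursts(shots)
print([len(b) for b in bursts])  # → [3, 2]
```

Each resulting burst can then be passed to the sequence classifier as one unit instead of five independent images.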
Previously, each image in a series was classified individually, so one frame might be labelled as a wallaby and the next two as a kangaroo; with this approach, the whole sequence is correctly classified as a single kangaroo.
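The paper's TemporalSwin-FPN Net learns across the frames jointly, but the wallaby/kangaroo example above can already be resolved by the simple majority-vote baseline I suggested. A sketch, with made-up per-frame predictions:

```python
from collections import Counter

# Hypothetical per-frame predictions for one three-image burst.
frame_predictions = ["wallaby", "kangaroo", "kangaroo"]

def majority_vote(labels):
    """Return the most common per-frame label as the sequence-level label."""
    return Counter(labels).most_common(1)[0][0]

print(majority_vote(frame_predictions))  # → kangaroo
```

The trade-off is that a plain vote ignores per-frame confidence and temporal context, which is exactly what a learned sequence model can exploit.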
The image is taken from the paper, which Sameer has been so kind as to allow me to share.