DAL Sheep Classification 2025
2nd Place - Kaggle Competition
Public LB
Private LB
Model Ensemble
Challenge Overview
A Kaggle AI competition where the goal was to classify sheep breeds from images using computer vision. The dataset was relatively small, which made overfitting a real concern, and my early models kept getting stuck around 85% accuracy. The biggest part of the work was figuring out which model architectures, training strategies, and augmentations actually helped move past that plateau.
Goal
Classify sheep breeds from images
Evaluation
Accuracy on public and private leaderboard
Dataset
Small dataset requiring careful regularization
Key Challenge
Breaking through a stubborn ~85% accuracy plateau
Model Journey
Each step below shows what I changed and how it affected performance.
ConvNeXt-Large
Initial Attempt
Started with ConvNeXt-Large as the backbone. It got 83% on the public leaderboard, which felt disappointing at first. Funny thing, though: this initial model scored 94% on the private leaderboard, so I started off better than I thought :)
CoAtNet
Architecture Switch + Bug Fix
Switched to CoAtNet, which combines convolution and attention mechanisms. It initially scored the same 83%, but after finding and fixing training bugs, accuracy jumped to 92%.
+ CosineAnnealing
Scheduler Breakthrough
Added CosineAnnealingWarmRestarts as the learning rate scheduler. This was the key breakthrough: the warm restarts helped the model escape the local minima that were keeping it stuck.
+ Gradual Unfreezing
Fine-tuning Strategy
Implemented gradual unfreezing: epochs 0–10 trained only the classification head with the backbone frozen, then from epoch 11 onward everything was unfrozen. This careful approach yielded a +3% accuracy boost.
Full Ensemble
Final Ensemble
Combined four diverse models (CoAtNet-3, EVA02-Base, ViT-Base, ConvNeXt-Small) with weighted soft voting. Each architecture captures different features, and that diversity pushed the score to 97% on the public leaderboard and 98% on the private leaderboard.
Final Ensemble Architecture
CoAtNet-3
Primary model, best CV performance (coatnet_3_rw_224.sw_in12k)
Best individual cross-validation score. Combines convolution for local features with self-attention for global context.
EVA02-Base
Strong on texture features (eva02_base_patch14_224.mim_in22k)
Excellent at capturing fine-grained texture differences between breeds. Pre-trained on ImageNet-22k with masked image modeling.
ViT-Base
Attention-based, captures global context (vit_base_patch16_224.augreg_in21k)
Pure attention architecture provides a different inductive bias. Captures global spatial relationships that CNNs can miss.
ConvNeXt-Small
Lightweight, good regularization (convnext_small.fb_in22k)
Smaller model with good regularization properties. Adds diversity to the ensemble without overfitting to training patterns.
Weighted Soft Voting
Weighted average of softmax outputs
Final Prediction
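The voting step above can be sketched in a few lines of NumPy. The weights and probability values here are toy numbers for illustration; the write-up doesn't give the actual ensemble weights used in the competition.

```python
import numpy as np

def weighted_soft_vote(probs, weights):
    """Weighted average of per-model softmax outputs.

    probs:   list of (n_samples, n_classes) arrays, one per model.
    weights: one weight per model (illustrative values, not the real ones).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                  # normalise so the average stays a distribution
    stacked = np.stack(probs)        # (n_models, n_samples, n_classes)
    return np.tensordot(w, stacked, axes=1)

# Toy softmax outputs from the 4 models on 2 images, 3 classes.
p1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
p2 = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
p3 = np.array([[0.5, 0.4, 0.1], [0.3, 0.6, 0.1]])
p4 = np.array([[0.8, 0.1, 0.1], [0.1, 0.7, 0.2]])
avg = weighted_soft_vote([p1, p2, p3, p4], weights=[2, 1, 1, 1])
pred = avg.argmax(axis=1)            # final class per image
```

Averaging softmax outputs (rather than hard votes) lets a confident model outvote two lukewarm ones, which is usually what you want with a small, diverse ensemble.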
Training Strategy
Data Augmentation
MixUp (α=0.2) + CutMix (α=0.8), randomly applied with 50% probability
Rotation, Color Jitter, Random Erasing, TTA (8 samples)
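The MixUp/CutMix scheme can be sketched in NumPy as below. The batch shapes, the seeded RNG, and the exact apply-probability logic are illustrative; the real pipeline operated on image tensors inside the training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x, y, alpha=0.2):
    """MixUp: blend two images and their one-hot labels by lam ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]

def cutmix(x, y, alpha=0.8):
    """CutMix: paste a random box from a shuffled batch; label weight = kept area."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    h, w = x.shape[1:3]
    ch, cw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    top, left = rng.integers(0, h - ch + 1), rng.integers(0, w - cw + 1)
    x = x.copy()
    x[:, top:top + ch, left:left + cw] = x[perm][:, top:top + ch, left:left + cw]
    kept = 1 - ch * cw / (h * w)     # label weight after integer rounding of the box
    return x, kept * y + (1 - kept) * y[perm]

def augment(x, y):
    """Apply one of MixUp/CutMix with 50% probability, else leave the batch as-is."""
    if rng.random() < 0.5:
        return mixup(x, y) if rng.random() < 0.5 else cutmix(x, y)
    return x, y

# Toy batch: 4 images (8x8x3, channels-last) with one-hot labels over 3 classes.
x = rng.random((4, 8, 8, 3))
y = np.eye(3)[[0, 1, 2, 0]]
x_aug, y_aug = augment(x, y)         # label rows still sum to 1
```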
Gradual Unfreezing
Epochs 0–10: frozen backbone, train head only
Epochs 11+: unfreeze everything at once
+3% accuracy boost
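A minimal PyTorch sketch of the unfreezing schedule. The `SheepClassifier` class and its layer sizes are toy stand-ins for the real timm backbone and classification head; only the epoch logic mirrors the strategy above.

```python
import torch.nn as nn

class SheepClassifier(nn.Module):
    """Toy stand-in: a 'backbone' plus a linear 'head' (names are illustrative)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)   # stands in for CoAtNet / EVA02 / etc.
        self.head = nn.Linear(8, 4)       # e.g. 4 breed classes

def set_trainable(model, epoch, unfreeze_epoch=11):
    """Epochs 0-10: train only the head; epoch 11+: unfreeze everything."""
    train_backbone = epoch >= unfreeze_epoch
    for p in model.backbone.parameters():
        p.requires_grad = train_backbone
    for p in model.head.parameters():
        p.requires_grad = True
    return train_backbone

model = SheepClassifier()
set_trainable(model, epoch=0)    # backbone frozen, head trains
set_trainable(model, epoch=11)   # everything trains
```

Calling `set_trainable` at the start of each epoch keeps the schedule in one place; in practice you would also rebuild or filter the optimizer's parameter groups when the backbone unfreezes.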
CosineAnnealingWarmRestarts
Learning rate scheduler that periodically restarts LR to escape local minima. This was the key breakthrough that pushed accuracy from 83% to 92%.
Key breakthrough: 83% → 92%
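The restart behaviour can be reproduced from the standard CosineAnnealingWarmRestarts formula. The hyperparameters below (`base_lr`, `eta_min`, `t0`, `t_mult`) are illustrative defaults, not the values used in the competition.

```python
import math

def cosine_warm_restart_lr(epoch, base_lr=1e-3, eta_min=1e-6, t0=10, t_mult=2):
    """LR at a given epoch under cosine annealing with warm restarts.

    Each cycle decays from base_lr to eta_min along a half-cosine, then the LR
    snaps back to base_lr and the next cycle is t_mult times longer.
    """
    t_i, t_cur = t0, epoch
    while t_cur >= t_i:                  # find which restart cycle we are in
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i)) / 2

lrs = [cosine_warm_restart_lr(e) for e in range(30)]
# LR decays over epochs 0-9, then "restarts" back to base_lr at epoch 10.
```

That sudden jump back to a high LR is what kicks the model out of the basin it settled into, which matches the plateau-escape behaviour described above.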
Training Config
What Didn't Work
A big part of the competition was testing ideas quickly and dropping the ones that didn’t help.
Pseudo-labeling
Tried using model predictions on unlabeled data as pseudo-labels for additional training. It produced no measurable improvement, so I removed it.
Object detection first
Tried isolating the sheep with an object detection model before classification. The gains were too limited to justify the extra complexity and training time.