Brewing ideas, coding intelligence

🥈 2nd Place

DAL Sheep Classification 2025

2nd Place - Kaggle Competition

97%

Public LB

98%

Private LB

4

Model Ensemble

PyTorch · Computer Vision · Transfer Learning · Ensemble

Challenge Overview

A Kaggle AI competition where the goal was to classify sheep breeds from images using computer vision. The dataset was relatively small, which made overfitting a real concern, and my early models kept getting stuck around 85% accuracy. The biggest part of the work was figuring out which model architectures, training strategies, and augmentations actually helped move past that plateau.

Goal

Classify sheep breeds from images

Evaluation

Accuracy on public and private leaderboard

Dataset

Small dataset requiring careful regularization

Key Challenge

Breaking through a stubborn ~85% accuracy plateau

Model Journey

Each step below shows what I changed and how it affected performance.

83%

ConvNeXt-Large

Initial Attempt

Started with ConvNeXt-Large as the backbone and got 83% on the public leaderboard, which felt disappointing at first. Funnily enough, this initial model scored 94% on the private leaderboard, so I started off better than I thought :)

83% → 92%

CoAtNet

Architecture Switch + Bug Fix

Switched to CoAtNet, which combines convolution and attention mechanisms. It initially scored the same 83%, but after finding and fixing a few training bugs, accuracy jumped to 92%.

Escaped plateau

+ CosineAnnealing

Scheduler Breakthrough

Added CosineAnnealingWarmRestarts as the learning rate scheduler. This was the key breakthrough: the warm restarts helped the model escape local minima that were keeping it stuck.

92% → 95%

+ Gradual Unfreezing

Fine-tuning Strategy

Implemented gradual unfreezing: epochs 0–10 trained only the classification head with the backbone frozen, then epoch 11+ unfroze everything. This careful approach yielded a +3% accuracy boost.

97% / 98%

Full Ensemble

Final Ensemble

Combined 4 diverse models (CoAtNet-3, EVA02-Base, ViT-Base, ConvNeXt-Small) with weighted soft voting. Each model captures different features. The diversity across the models helped push the score to 97% on the public leaderboard and 98% on the private leaderboard.

Final Ensemble Architecture

35%

CoAtNet-3

coatnet_3_rw_224.sw_in12k

Best individual cross-validation score. Combines convolution for local features with self-attention for global context.

30%

EVA02-Base

eva02_base_patch14_224.mim_in22k

Excellent at capturing fine-grained texture differences between breeds. Pre-trained on ImageNet-22k with masked image modeling.

20%

ViT-Base

vit_base_patch16_224.augreg_in21k

Pure attention architecture provides a different inductive bias. Captures global spatial relationships that CNNs can miss.

15%

ConvNeXt-Small

convnext_small.fb_in22k

Smaller model with good regularization properties. Adds diversity to the ensemble without overfitting to training patterns.

Weighted Soft Voting

Weighted average of softmax outputs

Final Prediction
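The weighted soft-voting step above can be sketched in a few lines. This is a minimal illustration, not the competition code: the model names mirror the ensemble members, the weights are the ones listed above, and the per-model logits in the usage example are made-up numbers.

```python
import math

# Ensemble weights from the architecture above.
WEIGHTS = {
    "coatnet_3": 0.35,
    "eva02_base": 0.30,
    "vit_base": 0.20,
    "convnext_small": 0.15,
}

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_soft_vote(logits_per_model):
    """Weighted average of per-model softmax outputs -> predicted class index."""
    n_classes = len(next(iter(logits_per_model.values())))
    avg = [0.0] * n_classes
    for name, logits in logits_per_model.items():
        probs = softmax(logits)
        for c in range(n_classes):
            avg[c] += WEIGHTS[name] * probs[c]
    return max(range(n_classes), key=lambda c: avg[c])
```

Soft voting (averaging probabilities) rather than hard voting (majority over argmax) lets a confident model outweigh two unsure ones, which is where a diverse ensemble earns its keep.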

Training Strategy

Data Augmentation

MixUp (α=0.2) + CutMix (α=0.8), randomly applied with 50% probability

Rotation, Color Jitter, Random Erasing, TTA (8 samples)
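The MixUp/CutMix coin flip can be sketched roughly as below. This is an illustrative implementation, not the competition code; the α values and the 50% application probability come from the writeup, while the 50/50 split between MixUp and CutMix and the helper name are assumptions.

```python
import torch

def mixup_cutmix(images, labels, alpha_mix=0.2, alpha_cut=0.8, p=0.5):
    """With probability p, apply MixUp or CutMix (chosen 50/50) to a batch.

    Returns (images, labels_a, labels_b, lam); the training loss is then
    lam * CE(out, labels_a) + (1 - lam) * CE(out, labels_b).
    """
    if torch.rand(1).item() > p:
        return images, labels, labels, 1.0
    perm = torch.randperm(images.size(0))
    if torch.rand(1).item() < 0.5:
        # MixUp: blend whole images.
        lam = torch.distributions.Beta(alpha_mix, alpha_mix).sample().item()
        images = lam * images + (1 - lam) * images[perm]
    else:
        # CutMix: paste a rectangular patch from the permuted batch.
        lam = torch.distributions.Beta(alpha_cut, alpha_cut).sample().item()
        h, w = images.shape[-2:]
        rh, rw = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
        y = torch.randint(0, h - rh + 1, (1,)).item()
        x = torch.randint(0, w - rw + 1, (1,)).item()
        images[:, :, y:y + rh, x:x + rw] = images[perm, :, y:y + rh, x:x + rw]
        lam = 1 - rh * rw / (h * w)  # recompute from the actual patch area
    return images, labels, labels[perm], lam
```

Both augmentations fight overfitting on a small dataset by forcing the model to commit to soft, blended targets instead of memorizing individual images.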

Gradual Unfreezing

Epochs 0–10: frozen backbone, train head only

Epochs 11+: unfreeze everything at once

+3% accuracy boost
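The two-phase schedule above can be sketched as follows. This is a minimal illustration under the assumption that the classifier's parameters are named with a `head` prefix (as in timm models); the helper name is mine.

```python
import torch.nn as nn

def set_backbone_trainable(model: nn.Module, trainable: bool, head_name: str = "head"):
    """Freeze or unfreeze every parameter except the classification head."""
    for name, param in model.named_parameters():
        if not name.startswith(head_name):
            param.requires_grad = trainable

# In the training loop:
#   set_backbone_trainable(model, False)     # epochs 0-10: head only
#   if epoch == 11:
#       set_backbone_trainable(model, True)  # epoch 11+: unfreeze everything
```

Freezing the backbone first lets the randomly initialized head settle without large, noisy gradients wrecking the pretrained features; only then is the whole network fine-tuned.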

With vs Without Gradual Unfreezing

Figure: validation accuracy per epoch for a standard run (the entire network, input image → conv 1–3 → conv 4–6 → conv 7–9 → head, trained from epoch 0) versus the gradual run (head only at first, backbone unfrozen at the marked epoch).


CosineAnnealingWarmRestarts

Learning rate scheduler that periodically restarts LR to escape local minima. This was the key breakthrough that pushed accuracy from 83% to 92%.

Key breakthrough: 83% → 92%

Learning Rate Schedule
Figure: learning rate over epochs 0–50 with warm restarts (T₀=5, T_mult=2, η_max=1e-3; LR axis from 1e-6 to 1e-3).
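The schedule follows the standard warm-restart formula: the LR decays from η_max toward η_min along a cosine over each cycle, then snaps back to η_max, with cycle lengths growing by T_mult. A small self-contained sketch (η_min=1e-6 is my assumption, read off the bottom of the chart; in PyTorch the equivalent is `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=5, T_mult=2, eta_min=1e-6)`):

```python
import math

def cosine_warm_restarts(epoch, t0=5, t_mult=2, eta_max=1e-3, eta_min=1e-6):
    """LR at a given (possibly fractional) epoch under warm restarts."""
    t_i, t_cur = t0, epoch
    while t_cur >= t_i:   # walk past completed cycles
        t_cur -= t_i
        t_i *= t_mult     # each cycle is t_mult times longer
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))
```

With T₀=5 and T_mult=2, restarts land at epochs 5, 15, and 35; each jump back to η_max is what kicks the model out of the basin it had settled into.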

Training Config

Batch Size: 16
Epochs: 20
Base LR: 1e-5
Cross Validation: 5-Fold Stratified
Loss: 60% CE + 40% Focal
GPU: RTX 6000 Ada
Image Size: 224×224 (384×384 for ConvNeXt)
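The 60% CE + 40% focal blend from the config can be sketched as below. This is an illustration, not the competition code: the focal exponent γ=2.0 is my assumption (the writeup does not state it), and the focal term is derived from the per-sample cross-entropy.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, targets, ce_weight=0.6, focal_weight=0.4, gamma=2.0):
    """60/40 blend of cross-entropy and focal loss over a batch."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                  # probability of the true class
    focal = ((1 - pt) ** gamma) * ce     # down-weights easy examples
    return (ce_weight * ce + focal_weight * focal).mean()
```

The focal term keeps easy, already-correct examples from dominating the gradient, which helps when some breeds are much easier to recognize than others.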

What Didn't Work

A big part of the competition was testing ideas quickly and dropping the ones that didn’t help.

Pseudo-labeling

Tried using model predictions on unlabeled data as pseudo-labels for additional training. It produced no measurable improvement, so I dropped it.

Object detection first

Tried isolating the sheep with an object detection model before classification. The gains were too small to justify the extra complexity and training time.