complete

Gender Classification

ML pipeline for gender classification from facial images. Built to understand the full cycle: data prep, model selection, training, evaluation, and honest assessment of limitations.

Python scikit-learn OpenCV Jupyter github

The problem

An academic project turned into a real exercise in ML engineering: not just “train a model and report accuracy” but understanding why a model performs the way it does and what that means in practice.

Decisions

Classical CV features before deep learning: I started with HOG (Histogram of Oriented Gradients) features and an SVM rather than jumping straight to CNNs. This forces you to understand which features matter and gives a meaningful baseline to beat.

OpenCV for preprocessing: a face detection, alignment, and normalization pipeline runs before any model training. For noisy input, the quality of this step tends to matter more than the choice of model.

Explicit train/val/test split: a common mistake in academic projects is evaluating on the training set. Here the data gets a proper three-way split, stratified so that each subset keeps the same class proportions despite class imbalance.
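
One way to get a stratified three-way split with scikit-learn is to call `train_test_split` twice. The 70/15/15 ratio, the 30% minority class, and the toy labels below are illustrative assumptions, not the project's actual numbers.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced labels: 300 positives, 700 negatives (assumed for illustration).
y = np.array([1] * 300 + [0] * 700)
indices = np.arange(len(y))

n_test = int(round(0.15 * len(y)))  # 150 samples held out for testing
n_val = int(round(0.15 * len(y)))   # 150 samples held out for validation

# First split: carve off the test set, preserving class proportions.
rest_idx, test_idx = train_test_split(
    indices, test_size=n_test, stratify=y, random_state=0
)

# Second split: carve the validation set out of what remains.
train_idx, val_idx = train_test_split(
    rest_idx, test_size=n_val, stratify=y[rest_idx], random_state=0
)

for name, idx in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    print(f"{name}: n={len(idx)}, positive rate={y[idx].mean():.3f}")
```

Splitting indices rather than the arrays themselves makes it easy to check that the three subsets are disjoint and to reuse the same split for features and labels. The test indices should only be touched once, for the final evaluation.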

Honest assessment

Binary gender classification from facial images is a technically solvable problem but an ethically loaded one. The project was built to learn ML engineering fundamentals, not to deploy. The model achieves high accuracy on the test set but would fail on edge cases, cannot represent non-binary individuals at all, and would degrade on anything outside the training distribution.

The main takeaway: model accuracy is the easy part. Knowing when not to deploy is the harder skill.