Car Key-Points Detection with MobileNetV1

Predicts windshield and headlight key-points on car images using a fine-tuned MobileNetV1 backbone with custom regression heads, trained on a tiny dataset (90 images) heavily augmented to 9,000 samples.

Objective

Predict precise key-point coordinates for windshield corners and headlight centers across diverse car images. The model outputs continuous (x, y) coordinates rather than classification labels, framing the task as a multi-output regression problem.
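To make the regression framing concrete, the sketch below flattens a set of key-points into a single normalized target vector and scores a prediction with mean squared error. The function names, the six-point layout, and the 224x224 image size are illustrative assumptions, not details taken from the project.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def encode_keypoints(points: List[Point], width: int, height: int) -> List[float]:
    """Flatten pixel key-points into [x0, y0, x1, y1, ...] scaled to [0, 1]."""
    target: List[float] = []
    for x, y in points:
        target.extend([x / width, y / height])
    return target


def decode_keypoints(target: List[float], width: int, height: int) -> List[Point]:
    """Invert encode_keypoints: map normalized values back to pixel coordinates."""
    return [
        (target[i] * width, target[i + 1] * height)
        for i in range(0, len(target), 2)
    ]


def mse(pred: List[float], true: List[float]) -> float:
    """Mean squared error averaged over every output coordinate."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


# Hypothetical example: four windshield corners and two headlight centers
# on a 224x224 image (both the point count and the image size are assumed).
points = [(60.0, 80.0), (164.0, 80.0), (170.0, 120.0), (54.0, 120.0),
          (70.0, 160.0), (154.0, 160.0)]
target = encode_keypoints(points, 224, 224)
```

With this encoding the network's output layer would need one unit per coordinate (twelve in the assumed layout), and the same decode step turns predictions back into pixel positions for plotting.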

Approach

Backbone: MobileNetV1 with all layers trainable (no frozen feature extractor).
Head: Custom dense layers producing key-point coordinates.
Parameters: ~24M trainable.
Training data: 90 base images → augmented to 9,000 samples (rotations, translations, brightness changes, and horizontal flips, each with the corresponding key-point remapping).
Optimization: 20 epochs, MSE loss on normalized coordinates.
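The key-point remapping mentioned for the augmentations can be sketched as follows for horizontal flips and translations on normalized coordinates. The key-point names and the left/right pairing are hypothetical, since the exact label set is not listed here, and whether labels need swapping depends on the annotation convention.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]

# Hypothetical key-point names; the real label set is not given in this document.
FLIP_PAIRS = {
    "windshield_top_left": "windshield_top_right",
    "windshield_bottom_left": "windshield_bottom_right",
    "headlight_left": "headlight_right",
}
# Make the mapping symmetric so it works in both directions.
SWAP = {**FLIP_PAIRS, **{right: left for left, right in FLIP_PAIRS.items()}}


def hflip_keypoints(kps: Dict[str, Point]) -> Dict[str, Point]:
    """Mirror normalized key-points about the vertical axis and swap left/right labels.

    Assumes continuous coordinates in [0, 1], so x maps to 1 - x.
    With integer pixel indices the mapping would be (width - 1) - x instead.
    """
    return {SWAP.get(name, name): (1.0 - x, y) for name, (x, y) in kps.items()}


def translate_keypoints(kps: Dict[str, Point], dx: float, dy: float) -> Dict[str, Point]:
    """Shift normalized key-points by the same offset applied to the image."""
    return {name: (x + dx, y + dy) for name, (x, y) in kps.items()}


example = {"headlight_left": (0.25, 0.7), "headlight_right": (0.75, 0.72)}
flipped = hflip_keypoints(example)
```

Brightness changes leave the coordinates untouched, so only the geometric transforms need a counterpart like these. Translations can push points outside the [0, 1] range, which a real pipeline would have to clip or reject.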

Results

The model generalizes well across viewpoints despite the tiny base dataset, suggesting that aggressive augmentation with correctly remapped key-points can compensate for a small labeled set in key-point regression tasks.

Figure: car image with predicted key-points overlaid.

Takeaways

MobileNetV1 is a strong choice when inference latency matters and the task is geometrically simple.
Per-key-point augmentation correctness is the hard part: every transform must remap the ground-truth coordinates exactly, or the model trains on noisy labels.
Small-data regression likely benefits from depth-wise separable convolutions, which use far fewer parameters than standard convolutions and may therefore reduce overfitting.
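One inexpensive way to check remapping correctness is a round-trip test: apply a transform and its inverse to the key-points and confirm they return to where they started. The sketch below does this for rotation about the image center. The function names and the sign convention are assumptions, and the image must be rotated with the same center and direction for the labels to stay aligned.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_keypoints(points: List[Point], angle_deg: float,
                     width: int, height: int) -> List[Point]:
    """Rotate pixel key-points about the image center.

    Positive angles turn points counter-clockwise as seen on screen
    (the image y axis points down). Libraries differ on this sign
    convention, so it should be matched to the image rotation in use.
    """
    cx, cy = width / 2.0, height / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + cos_t * dx + sin_t * dy,
                        cy - sin_t * dx + cos_t * dy))
    return rotated


def max_roundtrip_error(points: List[Point], angle_deg: float,
                        width: int, height: int) -> float:
    """Largest distance between each point and its rotate-then-unrotate image."""
    forward = rotate_keypoints(points, angle_deg, width, height)
    back = rotate_keypoints(forward, -angle_deg, width, height)
    return max(math.hypot(bx - x, by - y)
               for (x, y), (bx, by) in zip(points, back))


# Hypothetical headlight centers on a 224x224 image.
headlights = [(70.0, 160.0), (154.0, 160.0)]
error = max_roundtrip_error(headlights, 15.0, 224, 224)
```

A round-trip check only catches inconsistencies within the key-point code itself. Overlaying the transformed points on the transformed image remains the most direct way to confirm the two agree.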