CAP
Advanced · 3 weeks

Capstone Project

End-to-end data science project on a self-chosen real-world dataset. Full CRISP-DM cycle, GitHub repository, written report, and live presentation. Counts for 30% of course grade.

Objectives

  • Select and justify a real-world dataset with genuine analytical interest
  • Implement a complete preprocessing and feature engineering pipeline
  • Compare and evaluate at least 3 ML models using cross-validation
  • Tune the best model with GridSearchCV and document performance gain
  • Generate SHAP values and interpret feature contributions
  • Write a 2,500-word professional report in 6-section CRISP-DM structure
  • Build a public GitHub repository with reproducible results
  • Deliver a 10-minute presentation to a non-technical panel
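
The modelling objectives above (compare at least three models with cross-validation, then tune the best one with GridSearchCV and document the gain) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a required template: the data is a synthetic stand-in from `make_classification`, and the three candidate models, the F1 metric, the parameter grids, and the helper name `build_pipeline` are all placeholders chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

RANDOM_STATE = 42  # fixed seed so results can be reproduced

# Assumption: synthetic data standing in for the dataset you choose.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=RANDOM_STATE
)

# The same folds are reused for every model so scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_STATE)

# Assumption: three illustrative candidates; any three suitable models work.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=RANDOM_STATE),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=RANDOM_STATE),
}

# Illustrative grids; each includes the model's default value as a baseline.
param_grids = {
    "logistic_regression": {"model__C": [0.1, 1.0, 10.0]},
    "random_forest": {"model__max_depth": [3, 6, None]},
    "gradient_boosting": {"model__learning_rate": [0.05, 0.1, 0.2]},
}


def build_pipeline(model):
    """Wrap preprocessing and model so scaling is fitted inside each fold."""
    return Pipeline([("scale", StandardScaler()), ("model", model)])


# Step 1: compare the candidates by mean cross-validated F1.
cv_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(build_pipeline(model), X, y, cv=cv, scoring="f1")
    cv_scores[name] = scores.mean()

best_name = max(cv_scores, key=cv_scores.get)

# Step 2: tune only the best candidate with GridSearchCV.
search = GridSearchCV(
    build_pipeline(candidates[best_name]),
    param_grids[best_name],
    cv=cv,
    scoring="f1",
)
search.fit(X, y)

# Step 3: record the performance gain to report alongside the chosen parameters.
gain = search.best_score_ - cv_scores[best_name]
print(f"best model: {best_name}")
print(f"baseline F1: {cv_scores[best_name]:.3f}, tuned F1: {search.best_score_:.3f}")
print(f"gain: {gain:.3f}, params: {search.best_params_}")
```

Because the tuned score is selected on the same folds used for comparison, it tends to be optimistic; a held-out test set gives a more honest final figure. The SHAP objective would then apply to `search.best_estimator_`, for example via the `shap` package.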

The capstone constitutes 30% of the overall course grade. It is assessed on four weighted criteria: analytical rigour (40%), code quality and reproducibility (30%), written report and communication (20%), and presentation delivery (10%).
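
To see how the criterion weights combine, the arithmetic can be sketched as below. The weights and the 30% share come from the paragraph above; the marks out of 100, the dictionary keys, and the function name `capstone_score` are hypothetical, chosen only for illustration.

```python
# Criterion weights as stated in the grading breakdown (they sum to 1.0).
WEIGHTS = {
    "analytical_rigour": 0.40,
    "code_quality_reproducibility": 0.30,
    "report_communication": 0.20,
    "presentation_delivery": 0.10,
}
CAPSTONE_SHARE = 0.30  # capstone share of the overall course grade


def capstone_score(marks):
    """Return the weighted capstone score for marks given per criterion.

    Assumption: each mark is on a 0-100 scale (the scale is not stated
    in the course description).
    """
    if set(marks) != set(WEIGHTS):
        raise ValueError("marks must cover exactly the four criteria")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical marks, for illustration only.
example_marks = {
    "analytical_rigour": 80,
    "code_quality_reproducibility": 70,
    "report_communication": 60,
    "presentation_delivery": 90,
}

score = capstone_score(example_marks)
contribution = CAPSTONE_SHARE * score
print(f"capstone score: {score:.1f} / 100")
print(f"contribution to course grade: {contribution:.1f} points")
```

With these made-up marks the capstone score comes to about 74 out of 100, contributing roughly 22.2 points to the overall course grade.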

Deliverables and Timeline

Project Proposal (Week 10, Day 3)

500 words: problem statement, dataset description, proposed methodology, success metrics.

Intermediate Code Review (Week 11, Day 2)

Working Jupyter notebooks covering EDA and at least one trained model.

Final Report and Code (Week 12, Day 2)

Complete GitHub repository, 2,500-word written report, and slide deck.

Final Presentation (Week 12, Day 3)

10-minute live presentation followed by 5 minutes of Q&A.

Suggested Datasets

  • Kaggle: Credit Card Fraud Detection (284,807 transactions)
  • Kaggle: Lagos State Property Prices (10,000+ listings)
  • UCI: Bank Marketing Dataset (45,211 records)
  • Kaggle: Nigerian Stock Exchange Historical Prices
  • WHO: Global Health Observatory (country-level indicators)
  • Your own professional dataset (strongly recommended)
