🧠 Machine Learning Project

Instagram Reels Virality Predictor

A classification model exploring whether Instagram Reels metadata (duration, hook strength, niche, music type) can predict viral performance. Built as an end-to-end ML learning project, from exploratory data analysis through feature engineering, dimensionality reduction, model training, and deployment.

Baseline Model (Logistic Regression) [BEST]
Testing Accuracy: 51.25%
Training Accuracy: 57.81%

✓ Retains full feature interpretability

PCA Approach [DROP]
Testing Accuracy: 45%
Training Accuracy: 47.5%

↓ 6.25 percentage-point drop in testing accuracy
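
The gap between the two models can be read in two ways, so here is a small sketch of the arithmetic using only the accuracy figures reported above. The variable names are illustrative and not taken from the project code.

```python
# Accuracy figures reported on this page, in percent.
baseline = {"train": 57.81, "test": 51.25}
pca = {"train": 47.5, "test": 45.0}

# Absolute gap between the two models on the test set, in percentage points.
test_drop_points = baseline["test"] - pca["test"]

# The same gap expressed relative to the baseline's testing accuracy.
test_drop_relative = 100 * test_drop_points / baseline["test"]

# Train/test gap of the baseline, a rough indicator of overfitting.
baseline_gap_points = baseline["train"] - baseline["test"]

print(f"Test drop: {test_drop_points:.2f} points ({test_drop_relative:.1f}% relative)")
print(f"Baseline train/test gap: {baseline_gap_points:.2f} points")
```

Read this way, the drop is 6.25 percentage points, which is roughly a 12% relative decline from the baseline's testing accuracy.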

What this means for marketing teams

The signal under the noise.

Niche and music type emerged as the strongest categorical predictors: content category and audio choice matter more than caption or hashtag strategies.

Video duration and hook strength showed moderate predictive power, so the first 3 seconds and optimal length are worth testing systematically.

The baseline model outperformed PCA: keeping full feature interpretability proved more valuable than dimensionality reduction for this dataset.

Key Features (pre-posting data)
Niche (Category): 90%
Music Type: 85%
Duration: 65%
Hook Strength: 55%

Based on logistic regression coefficients
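
The page does not say how the percentages were derived from the coefficients, so the following is only one common approach, sketched under assumptions: sum the absolute coefficients of every encoded column that belongs to the same original feature, then scale so the largest feature scores 100. The function name, column names, and coefficient values are all hypothetical.

```python
from collections import defaultdict

def feature_importance(coefficients, feature_of):
    """Sum |coefficient| per original feature and scale the largest to 100."""
    totals = defaultdict(float)
    for column, coef in coefficients.items():
        totals[feature_of[column]] += abs(coef)
    top = max(totals.values())
    return {feature: round(100 * total / top) for feature, total in totals.items()}

# Hypothetical coefficients for illustration only, NOT the project's fitted values.
coefficients = {
    "niche_fitness": 0.8,
    "niche_food": -0.4,
    "music_trending": 0.6,
    "duration_sec": 0.3,
    "hook_strength": -0.18,
}

# Maps each encoded column back to the original feature it came from.
feature_of = {
    "niche_fitness": "niche",
    "niche_food": "niche",
    "music_trending": "music_type",
    "duration_sec": "duration",
    "hook_strength": "hook_strength",
}

scores = feature_importance(coefficients, feature_of)
for feature, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{feature}: {score}%")
```

Two caveats apply to this kind of ranking: coefficients are only comparable when numeric features are on the same scale, and summing across one-hot columns tends to favor features with many categories.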

🗄️ 400 Instagram Reels
📊 4 Input Features

Project workflow
  • Exploratory Data Analysis
  • Feature Engineering
  • PCA Dimensionality Reduction
  • Logistic Regression Modeling
  • Model Evaluation & Comparison
  • Streamlit App Deployment
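
The modeling steps in this workflow can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the project's actual code: the column names, category values, 80/20 split, and `n_components=2` are assumptions, and the labels are random, so accuracy near chance is expected. The Streamlit deployment step is not covered.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the dataset: 400 rows, 4 input features.
# All column names and category values are hypothetical.
rng = np.random.default_rng(42)
n = 400
reels = pd.DataFrame({
    "duration_sec": rng.uniform(5, 90, n),
    "hook_strength": rng.uniform(0, 1, n),
    "niche": rng.choice(["fitness", "food", "travel", "tech"], n),
    "music_type": rng.choice(["trending", "original", "none"], n),
})
is_viral = rng.integers(0, 2, n)  # random labels, so expect near-chance accuracy

# Assumed 80/20 split, stratified on the label.
X_train, X_test, y_train, y_test = train_test_split(
    reels, is_viral, test_size=0.2, random_state=42, stratify=is_viral
)

def make_preprocessor():
    """Scale numeric columns and one-hot encode categorical ones."""
    return ColumnTransformer(
        [
            ("num", StandardScaler(), ["duration_sec", "hook_strength"]),
            ("cat", OneHotEncoder(handle_unknown="ignore"), ["niche", "music_type"]),
        ],
        sparse_threshold=0.0,  # force dense output so PCA can consume it
    )

pipelines = {
    "baseline": Pipeline([
        ("prep", make_preprocessor()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]),
    "pca": Pipeline([
        ("prep", make_preprocessor()),
        ("pca", PCA(n_components=2)),
        ("clf", LogisticRegression(max_iter=1000)),
    ]),
}

# Fit both variants and record training and testing accuracy for comparison.
results = {}
for name, pipe in pipelines.items():
    pipe.fit(X_train, y_train)
    results[name] = {
        "train_acc": pipe.score(X_train, y_train),
        "test_acc": pipe.score(X_test, y_test),
    }
    print(name, results[name])
```

Wrapping preprocessing and the classifier in one `Pipeline` keeps the scaler, encoder, and PCA fitted on the training split only, which avoids leaking test data into the comparison.
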
Tech stack
Python, Pandas, NumPy, Scikit-Learn, Logistic Regression, PCA, Streamlit

Key takeaway

This project demonstrates an end-to-end ML workflow, from data exploration through deployment. While testing accuracy is modest (51.25%), the interpretable baseline model reveals actionable insights: content niche and music type are stronger predictors of virality than video length or hook strength alone.

I bring this rigor to marketing.

Data-driven decisions, interpretable models, and honesty about what the numbers can and can't say.

Work with me →