Digital Marketing, Marketing Analytics & AI
A hands-on, 10-week applied course for non-programmers. Every concept starts with a marketing analogy, then becomes runnable Python code. No prior coding experience required — just business curiosity and a Google account.
- Weeks: 10 modules
- Tools: Colab · HF · Gradio · GitHub
- AI Co-Pilot: Student & Manager modes
- Final Project: Live Kaggle deployment
Why ML for Marketing?
Machine learning is not magic — it is a systematic way of finding patterns in data at a scale humans cannot match manually. As a marketer, you already possess the most valuable skill: you understand what question to ask. ML gives you the tools to answer it at scale.
The core loop: You have data about customers → ML finds the pattern → the pattern predicts future behaviour → you act on that prediction. Everything in this course is a variation on that loop.
We distinguish two types: Supervised (you provide the answer key — e.g., which customers churned) and Unsupervised (the algorithm discovers structure — e.g., natural customer segments). Marketing primarily uses supervised learning.
Key vocabulary: A feature is any known information about a customer (age, purchase history, email open rate). A label is what you want to predict (did they buy? will they churn?). Your model learns the relationship between features and labels from historical data.
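To make the vocabulary concrete, here is a minimal sketch of features vs. labels on a toy customer table (all numbers and column names are made up for illustration):

import pandas as pd
customers = pd.DataFrame({
    "age":             [34, 51, 27],        # feature
    "email_open_rate": [0.42, 0.10, 0.75],  # feature
    "purchases_90d":   [3, 0, 7],           # feature
    "churned":         [0, 1, 0],           # label — the thing we want to predict
})
X = customers.drop(columns="churned")  # features go in X
y = customers["churned"]               # label goes in y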
EXAMPLE 1.1 Setting Up Your Colab Environment
# ── Cell 1: Install & import everything for this course ──
!pip install scikit-learn pandas matplotlib seaborn --quiet
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
print("✅ Environment ready! Let's do some marketing ML.")
EXAMPLE 1.2 Loading a Real Marketing Dataset
# ── Cell 2: Load UCI Bank Marketing dataset ──
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/bank-additional.csv"
df = pd.read_csv(url, sep=";")
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
df.head()
EXAMPLE 1.3 Your First Exploratory Data Analysis
# ── Cell 3: Understand the target variable ──
print("=== Target Distribution ===")
print(df["y"].value_counts())
print(df["y"].value_counts(normalize=True).round(3))
fig, ax = plt.subplots(figsize=(8,4))
for outcome in ["no", "yes"]:
    ax.hist(df[df["y"]==outcome]["age"], bins=30, alpha=.6, label=f"Subscribed: {outcome}")
ax.set_xlabel("Age"); ax.legend(); plt.tight_layout(); plt.show()
Only 11.3% of contacts subscribed — a class imbalance. If we predicted "no" for everyone, we'd be right 88.7% of the time! This is why Accuracy is misleading for marketing ML. We fix this in Week 3.
Try it yourself
- How many unique job types are in the dataset? Use df["job"].value_counts()
- What is the average number of campaign calls for subscribers vs non-subscribers? Use df.groupby("y")["campaign"].mean()
- Install Hugging Face datasets with !pip install datasets, then from datasets import load_dataset — a sketch follows this list.
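One possible sketch for the last exercise — the "imdb" dataset name is just an illustration; any public dataset on the Hugging Face Hub loads the same way:

!pip install datasets --quiet
from datasets import load_dataset
ds = load_dataset("imdb", split="train")  # downloads once, then caches locally
print(ds)      # number of rows and column names
print(ds[0])   # first record as a plain Python dict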
ML does not replace marketing judgment — it amplifies it. Your job is to know what question to ask; Python's job is to find the pattern at scale.
sklearn Blocks & Your First Model
The scikit-learn library follows one consistent interface: fit → predict → score. Learn these three methods once and you can use any of the 50+ models in the library. The pattern is identical whether predicting house prices or customer churn.
The fundamental principle: We train on past data to predict future data we have never seen. This is why we hold out some data for testing — testing on training data just checks whether the model memorised history, not whether it can generalise.
EXAMPLE 2.1 Train/Test Split + Linear Regression
import pandas as pd; import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
np.random.seed(42); n = 500
df = pd.DataFrame({
    "ad_spend":     np.random.uniform(1000, 50000, n),
    "email_opens":  np.random.randint(50, 2000, n),
    "social_posts": np.random.randint(5, 100, n),
})
df["revenue"] = (3.5*df["ad_spend"] + 12*df["email_opens"]
                 + 300*df["social_posts"] + np.random.normal(0, 8000, n))
X = df[["ad_spend","email_opens","social_posts"]]; y = df["revenue"]
X_train,X_test,y_train,y_test = train_test_split(X, y, test_size=.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print(f"R² on test set: {model.score(X_test, y_test):.3f}")
print("Coefficients:", dict(zip(X.columns, model.coef_.round(2))))
EXAMPLE 2.2 Classification — Predicting Purchase Intent
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
# Numeric columns from the bank dataset loaded in Week 1
num_cols = ["age","campaign","pdays","previous"]
X2 = df[num_cols].fillna(0)
y2 = (df["y"] == "yes").astype(int)
X_tr,X_te,y_tr,y_te = train_test_split(X2, y2, test_size=.2, random_state=42)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
fig, ax = plt.subplots(figsize=(4,4))
ConfusionMatrixDisplay.from_estimator(clf, X_te, y_te, ax=ax)
plt.tight_layout(); plt.show()
Live Simulation — Train/Test Split Visualiser
Regression predicts a number (lifetime value, expected spend). Classification predicts a category (will buy / won't buy). The sklearn interface is identical — only the output and metric change.
Try it yourself
- Change test_size=0.3. Does R² improve or worsen? Why?
- Try LogisticRegression(class_weight="balanced"). How does the recall for class 1 change?
- Use the model to predict revenue for a new campaign: model.predict([[20000, 500, 30]])
sklearn's unified fit → predict → score interface means you can swap any model with two lines of code. Master the workflow once, then experiment freely.
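To see that two-line swap in action, here is a short sketch that reuses X_train, X_test, y_train, y_test from Example 2.1 and changes only the import and the model line (the choice of RandomForestRegressor is just one example):

from sklearn.ensemble import RandomForestRegressor                    # line 1: new import
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)  # line 2: new model
print(f"R² on test set: {model.score(X_test, y_test):.3f}")           # everything else unchanged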
Are We Actually Good?
A single train/test split is like running one A/B test in one city and calling it global truth. The result depends on which 20% you happened to hold out — different random seeds give meaningfully different scores. Cross-validation fixes this by rotating through every part of your data as the test set.
The K-Fold analogy: You want to test whether a new loyalty email works. Instead of testing only on your Dubai customers, you test on Dubai, Abu Dhabi, Sharjah, Al Ain, and RAK separately, then average. That is 5-Fold Cross-Validation. Each emirate is one "fold."
EXAMPLE 3.1 Why a Single Split Lies — K-Fold Solution
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=1)
# Same model, 20 different splits → how much do scores vary?
splits = [train_test_split(X, y, test_size=.2, random_state=s) for s in range(20)]
single_scores = [model.fit(Xtr, ytr).score(Xte, yte) for Xtr, Xte, ytr, yte in splits]
print(f"Single-split range: {min(single_scores):.3f} to {max(single_scores):.3f}")
# 5-Fold CV: stable mean ± honest uncertainty
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv = cross_val_score(model, X, y, cv=skf, scoring="accuracy")
print(f"5-Fold CV: {cv.mean():.3f} ± {cv.std():.3f}")
print(f"Per-fold: {cv.round(3)}")
EXAMPLE 3.2 Evaluation Metrics — The Full Picture
from sklearn.metrics import classification_report, roc_auc_score, RocCurveDisplay
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
import numpy as np
import matplotlib.pyplot as plt
X, y = make_classification(n_samples=5000, weights=[.89,.11], random_state=0)
X_tr,X_te,y_tr,y_te = train_test_split(X, y, test_size=.2, stratify=y, random_state=42)
naive = np.zeros_like(y_te)
print(f"Naive accuracy (predict all 'no'): {(naive==y_te).mean():.3f}")
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
print(f"AUC-ROC: {roc_auc_score(y_te, clf.predict_proba(X_te)[:,1]):.3f}")
fig, ax = plt.subplots(figsize=(5,5))
RocCurveDisplay.from_estimator(clf, X_te, y_te, ax=ax)
ax.plot([0,1],[0,1],"k--",label="Random"); ax.legend(); plt.show()
EXAMPLE 3.3 Stratified K-Fold Inside a Pipeline — The Correct Way
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_validate, StratifiedKFold
import pandas as pd
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = cross_validate(pipe, X, y, cv=skf,
                         scoring=["accuracy","f1","roc_auc"],
                         return_train_score=True)
summary = pd.DataFrame({
    m: {"mean": results[f"test_{m}"].mean(), "std": results[f"test_{m}"].std()}
    for m in ["accuracy","f1","roc_auc"]
}).T.round(3)
print(summary)
Always run cross-validation inside a Pipeline. If you scale all data first and then CV, test folds can "see" training statistics — that is data leakage. The Pipeline refits the scaler only on each training fold automatically.
Live Simulation — K-Fold Visualiser
Try it yourself
- Compare cross_val_score(..., scoring="f1") vs scoring="accuracy" on imbalanced data. Which tells the truer story?
- Try cv=10. Does the mean change much? Does the standard deviation go up or down?
- Look up TimeSeriesSplit in the sklearn docs. Why would you need it for weekly campaign data? A sketch follows this list.
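A short sketch of TimeSeriesSplit on synthetic weekly data (the 52-week array is an assumption for illustration, not the course dataset) — note how every fold trains only on weeks that come before the ones it tests, so you never train on the future to predict the past:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit
weeks = np.arange(52).reshape(-1, 1)  # 52 weeks of campaign history
for i, (tr, te) in enumerate(TimeSeriesSplit(n_splits=4).split(weeks)):
    print(f"Fold {i}: train weeks {tr.min()}-{tr.max()}, test weeks {te.min()}-{te.max()}")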
Always report CV mean ± std, not a single test score. For imbalanced marketing data, AUC-ROC and F1 tell the truth that Accuracy hides. Always keep preprocessing inside a Pipeline when cross-validating.
Finding the Best Settings
Every ML model ships with default settings that are "good enough" — but not optimal for your data. Hyperparameters are knobs you control before training (like max_depth or learning_rate). Tuning is systematically finding better ones.
Three strategies: Grid Search — try every combination (thorough but slow). Random Search — try random combinations (roughly 80% of the benefit at 20% of the cost). Bayesian optimisation with Optuna — use past results to guide the search intelligently.
EXAMPLE 4.1 Grid Search & Random Search
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
import time
X, y = make_classification(n_samples=2000, n_features=15, random_state=0)
X_tr,X_te,y_tr,y_te = train_test_split(X, y, test_size=.2, random_state=42)
param_grid = {"max_depth":[3,5,8,12], "n_estimators":[50,100,200], "min_samples_leaf":[1,5,10]}
t0 = time.time()
gs = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="roc_auc", n_jobs=-1)
gs.fit(X_tr, y_tr)
print(f"Grid Search ⏱ {time.time()-t0:.1f}s | Best AUC: {gs.best_score_:.4f}")
print("Best:", gs.best_params_)
t0 = time.time()
rs = RandomizedSearchCV(RandomForestClassifier(random_state=42), param_grid, n_iter=20, cv=5, scoring="roc_auc", random_state=42, n_jobs=-1)
rs.fit(X_tr, y_tr)
print(f"Random Search ⏱ {time.time()-t0:.1f}s | Best AUC: {rs.best_score_:.4f}")
EXAMPLE 4.2 Bayesian Tuning with Optuna
!pip install optuna --quiet
import optuna
from sklearn.model_selection import cross_val_score
optuna.logging.set_verbosity(optuna.logging.WARNING)
def objective(trial):
    m = RandomForestClassifier(
        n_estimators     = trial.suggest_int("n_estimators", 50, 300),
        max_depth        = trial.suggest_int("max_depth", 2, 15),
        min_samples_leaf = trial.suggest_int("min_samples_leaf", 1, 20),
        random_state=42)
    return cross_val_score(m, X_tr, y_tr, cv=3, scoring="roc_auc").mean()
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=40, show_progress_bar=True)
print(f"Optuna best AUC: {study.best_value:.4f}")
print("Best params:", study.best_params)
Live Simulation — Parameter Search Heatmap
Try it yourself
- Time both searches. How much faster is Random Search for similar AUC?
- Increase Optuna to n_trials=80. Does it keep improving or plateau?
- Add max_features=trial.suggest_float("max_features", 0.3, 1.0) to the Optuna objective — a sketch follows this list.
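One way the last exercise could look — the same objective from Example 4.2 with the extra knob added (the search ranges come from the exercise; everything else is unchanged):

def objective(trial):
    m = RandomForestClassifier(
        n_estimators     = trial.suggest_int("n_estimators", 50, 300),
        max_depth        = trial.suggest_int("max_depth", 2, 15),
        min_samples_leaf = trial.suggest_int("min_samples_leaf", 1, 20),
        max_features     = trial.suggest_float("max_features", 0.3, 1.0),  # new search dimension
        random_state=42)
    return cross_val_score(m, X_tr, y_tr, cv=3, scoring="roc_auc").mean()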
Random Search beats Grid Search in time-per-quality almost always. Use Optuna when each training run is expensive — it is the same logic as running only your highest-ROI campaigns once you have learned which levers matter.
Garbage In, Garbage Out
Feature engineering is where marketing domain knowledge pays off most. A Pipeline bundles data preparation with your model so that the same transformations applied during training are automatically applied to new data at prediction time.
RFM — the original marketing ML feature: Recency, Frequency, Monetary — three dimensions that have predicted customer value since the 1980s. This week you build them from scratch from raw transaction logs.
EXAMPLE 5.1 Data Leakage Without Pipeline vs. The Correct Way
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
import numpy as np
np.random.seed(42)
X = np.random.randn(1000, 20) # 20 pure noise features
y = np.random.randint(0, 2, 1000) # random labels → model should score ~50%
# ❌ WRONG: scale ALL data before splitting
X_sc = StandardScaler().fit_transform(X) # leaks train stats into test!
X_tr,X_te,y_tr,y_te = train_test_split(X_sc, y, test_size=.2)
bad = LogisticRegression().fit(X_tr, y_tr)
print(f"Leaky score: {bad.score(X_te, y_te):.3f}")
# ✅ RIGHT: scaler inside Pipeline — refitted per fold
pipe = Pipeline([("sc",StandardScaler()), ("clf",LogisticRegression())])
cv = cross_val_score(pipe, X, y, cv=5)
print(f"Pipeline CV: {cv.mean():.3f} ± {cv.std():.3f}") # ~0.50 — honest
EXAMPLE 5.2 ColumnTransformer — Mixed Data Types
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
df_m = pd.DataFrame({
    "age":     [25, 34, None, 45, 28],
    "spend":   [200, 450, 120, None, 310],
    "channel": ["email","social","email","direct","social"],
    "churned": [0, 0, 1, 0, 1],
})
num = Pipeline([("imp",SimpleImputer(strategy="median")),("sc",StandardScaler())])
cat = Pipeline([("imp",SimpleImputer(strategy="most_frequent")),("ohe",OneHotEncoder(handle_unknown="ignore"))])
prep = ColumnTransformer([("num",num,["age","spend"]),("cat",cat,["channel"])])
full = Pipeline([("prep",prep),("clf",RandomForestClassifier())])
print(full)
EXAMPLE 5.3 Building RFM Features from Transaction Logs
import pandas as pd; import numpy as np
np.random.seed(7); n_tx = 2000
tx = pd.DataFrame({
    "customer_id": np.random.randint(1, 401, n_tx),
    "date": pd.to_datetime("2025-01-01") + pd.to_timedelta(np.random.randint(0, 365, n_tx), unit="D"),
    "amount": np.random.exponential(80, n_tx).round(2),
})
snap = tx["date"].max() + pd.Timedelta(days=1)
rfm = tx.groupby("customer_id").agg(
    Recency   = ("date", lambda d: (snap - d.max()).days),
    Frequency = ("date", "count"),
    Monetary  = ("amount", "sum"),
).reset_index()
for c in ["Frequency", "Monetary"]:
    # rank first so qcut never fails on duplicate bin edges when counts are tied
    rfm[f"{c}_score"] = pd.qcut(rfm[c].rank(method="first"), q=5, labels=[1,2,3,4,5])
rfm["Recency_score"] = pd.qcut(rfm["Recency"].rank(method="first"), q=5, labels=[5,4,3,2,1])
rfm["RFM"] = rfm["Recency_score"].astype(int) + rfm["Frequency_score"].astype(int) + rfm["Monetary_score"].astype(int)
print(rfm.head())
Try it yourself
- Add avg_order_value = Monetary / Frequency as a new feature. Does it improve churn prediction?
- Segment by RFM score: Champions (≥13), At Risk (7–9), Lost (<7). How many customers fall in each? A sketch follows this list.
- Add tenure (days since first purchase) as a fourth feature. Does it predict churn?
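A sketch of the segmentation exercise using the thresholds listed above. The exercise names no tier for scores 10–12, so the "Promising" label below is an assumption — adjust to your own rules:

def segment(score):
    if score >= 13: return "Champions"
    if score >= 10: return "Promising"   # assumed label for the unnamed 10-12 band
    if score >= 7:  return "At Risk"
    return "Lost"
rfm["segment"] = rfm["RFM"].apply(segment)
print(rfm["segment"].value_counts())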
A Pipeline encoding RFM features, segment flags, and seasonality almost always outperforms a raw-feature model. Feature engineering is where your marketing expertise creates an unfair advantage over pure-data approaches.
From One Tree to a Forest
A single decision tree is interpretable but unstable. Ensemble methods combine many trees. Bagging (Random Forest) builds trees in parallel on random subsets. Boosting (XGBoost, LightGBM) builds trees sequentially, each correcting the previous one's mistakes.
The marketing analogy: A single sales forecast from one analyst can be badly wrong. Average forecasts from 500 independently-briefed analysts (Random Forest) and you get something far more reliable. Boosting is like running the analysis, then asking a second analyst to focus only on the cases the first got wrong.
EXAMPLE 6.1 Decision Tree — Interpretable but Fragile
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt
# Reuses X_tr, X_te, y_tr, y_te from Week 4's split
dt = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_tr, y_tr)
print(f"Train: {dt.score(X_tr,y_tr):.3f} Test: {dt.score(X_te,y_te):.3f}")
fig, ax = plt.subplots(figsize=(18,6))
plot_tree(dt, max_depth=3, filled=True, feature_names=[f"f{i}" for i in range(X_tr.shape[1])], ax=ax)
plt.tight_layout(); plt.show()
dt_overfit = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr) # no depth limit
print(f"Overfit — Train: {dt_overfit.score(X_tr,y_tr):.3f} Test: {dt_overfit.score(X_te,y_te):.3f}")
EXAMPLE 6.2 RF vs XGBoost vs LightGBM Head-to-Head
!pip install xgboost lightgbm --quiet
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score
import time
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "XGBoost":  XGBClassifier(n_estimators=100, random_state=42, eval_metric="auc", verbosity=0),
    "LightGBM": LGBMClassifier(n_estimators=100, random_state=42, verbose=-1),
}
for name, m in models.items():
    t = time.time()
    m.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:,1])
    print(f"{name:15s} AUC={auc:.4f} Time={time.time()-t:.2f}s")
Live Simulation — OOB Error vs. Number of Trees
Random Forest: great default, robust to hyperparameters. XGBoost: Kaggle workhorse, very accurate with tuning. LightGBM: fastest on large datasets. Start with LightGBM for datasets over 100k rows.
Try it yourself
- Print feature importances: pd.Series(models["Random Forest"].feature_importances_).sort_values(ascending=False).head(10)
- Try XGBClassifier(scale_pos_weight=8) for class imbalance. Does AUC improve?
- Build a simple stack: use RF and XGB predictions as features for a Logistic Regression meta-model — a sketch follows this list.
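One possible sketch of the stacking exercise, using sklearn's built-in StackingClassifier instead of hand-rolled out-of-fold predictions — a simplification of the exercise, not the only valid design. It reuses X_tr, X_te and the model classes imported in Example 6.2:

from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
stack = StackingClassifier(
    estimators=[("rf",  RandomForestClassifier(n_estimators=100, random_state=42)),
                ("xgb", XGBClassifier(n_estimators=100, random_state=42, verbosity=0))],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-model over base predictions
    cv=5,  # base models are scored out-of-fold before the meta-model sees them
)
stack.fit(X_tr, y_tr)
print(f"Stacked test accuracy: {stack.score(X_te, y_te):.3f}")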
LightGBM is your default starting point for tabular marketing data — fast, accurate, handles missing values natively. Reserve XGBoost for when you need maximum accuracy and have time to tune.
Let the Machine Tune Itself
AutoML automates model selection, feature engineering, and hyperparameter tuning. It does not replace you — it handles repetitive search work so you can focus on defining the right problem and interpreting results for business action.
The critical mindset: A model with 0.94 AUC on a poorly-framed problem will still fail in production. Your marketing domain expertise defines the question. AutoML searches the answer space.
EXAMPLE 7.1 FLAML — 3 Lines to a Trained Model
!pip install flaml --quiet
from flaml import AutoML
from sklearn.metrics import roc_auc_score
automl = AutoML()
automl.fit(X_tr, y_tr, task="classification", metric="roc_auc", time_budget=60)
print(f"Best model: {automl.best_estimator}")
print(f"Best config: {automl.best_config}")
print(f"Test AUC: {roc_auc_score(y_te, automl.predict_proba(X_te)[:,1]):.4f}")
FLAML works within Colab's free tier. Set time_budget=60 for quick experiments and time_budget=300 for production-quality results. No GPU needed for tabular marketing data.
EXAMPLE 7.2 AutoGluon — Model Leaderboard
# ⚠ AutoGluon is large (~1GB). Recommended: use Kaggle Notebooks (free, 30GB RAM)
!pip install autogluon.tabular --quiet
from autogluon.tabular import TabularPredictor
import pandas as pd; import numpy as np
train_df = pd.DataFrame(X_tr); train_df["target"] = np.asarray(y_tr)  # works for Series or ndarray
predictor = TabularPredictor(label="target", eval_metric="roc_auc")
predictor.fit(train_df, time_limit=120, presets="medium_quality")
test_df = pd.DataFrame(X_te)
lb = predictor.leaderboard(test_df.assign(target=y_te), silent=True)
print(lb[["model","score_test","fit_time"]].head(8))
Try it yourself
- Change FLAML's metric to "f1". Does it select a different best model?
- Compare FLAML's AUC vs. your best manually-tuned model from Week 4.
- Read automl.best_config. Can you see the hyperparameters AutoML discovered?
AutoML beats a default sklearn model nearly every time, and gets you 90% of an expert's result in 5% of the time. For marketing pilots and quick proof-of-concepts, that is the right trade-off.
From Notebook to Live App
A model that lives only in a Colab notebook has zero business value. Gradio turns a Python function into a web app in minutes. Hugging Face Spaces hosts it for free with a shareable link — no servers, no DevOps.
The restaurant analogy: Training a model = developing the recipe. Deployment = opening the restaurant. The best recipe in the world has no revenue until customers can actually order the dish.
EXAMPLE 8.1 Local Gradio Demo in 10 Lines
!pip install gradio --quiet
import gradio as gr; import joblib; import numpy as np
joblib.dump(automl, "churn_model.pkl")  # assumes a model trained on these four RFM-style features
model = joblib.load("churn_model.pkl")
def predict_churn(recency, frequency, monetary, tenure):
    prob = model.predict_proba(np.array([[recency, frequency, monetary, tenure]]))[0, 1]
    label = "🔴 High Risk" if prob > .5 else "🟢 Low Risk"
    return {label: float(prob), "Stay": 1 - float(prob)}
gr.Interface(
    fn=predict_churn,
    inputs=[gr.Slider(0, 365, label="Days Since Last Purchase"),
            gr.Slider(1, 50, label="Purchase Frequency"),
            gr.Slider(0, 5000, label="Total Spend (AED)"),
            gr.Slider(0, 1000, label="Account Age (days)")],
    outputs=gr.Label(label="Churn Probability"),
    title="🎯 Customer Churn Predictor",
).launch(share=True)
EXAMPLE 8.2 Full app.py for Hugging Face Spaces
# Upload this + model.pkl + requirements.txt to your HF Space
import gradio as gr
import joblib, pandas as pd, numpy as np
model = joblib.load("model.pkl")
FEATURES = ["recency","frequency","monetary","tenure_days"]
def predict(*args):
    df = pd.DataFrame([dict(zip(FEATURES, args))])
    prob = model.predict_proba(df)[0, 1]
    risk = "🔴 High" if prob > .6 else ("🟡 Medium" if prob > .3 else "🟢 Low")
    return f"{risk} churn risk — {prob:.1%}"
gr.Interface(fn=predict,
    inputs=[gr.Number(label="Recency (days)"), gr.Number(label="Frequency"),
            gr.Number(label="Monetary (AED)"), gr.Number(label="Tenure (days)")],
    outputs="text", title="HCT ML in Marketing — Churn Predictor"
).launch()
1. Create an account at huggingface.co
2. New Space → Gradio SDK
3. Upload app.py, model.pkl, requirements.txt
4. HF builds and hosts your app automatically — shareable URL: your-name-space-name.hf.space
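A minimal requirements.txt sketch for this Space. The package list mirrors what app.py imports; leaving versions unpinned is an assumption — pin them if the build breaks:

scikit-learn
pandas
numpy
joblib
# gradio itself is supplied by the Gradio SDK runtime, so listing it is optional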
Try it yourself
- Add a CSV upload with gr.File() for bulk predictions — a sketch follows this list.
- Push your files to a GitHub repo and enable GitHub sync in your HF Space's settings.
- Share your Space URL with a classmate. Can they get a prediction with zero coding?
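A hedged sketch of the CSV-upload exercise. It assumes the uploaded file contains exactly the FEATURES columns from app.py and does no validation; the getattr call covers both the older file-object and newer filepath-string behaviours of gr.File:

import gradio as gr
import pandas as pd
def predict_csv(file):
    path = getattr(file, "name", file)   # file-object or filepath string, depending on Gradio version
    df = pd.read_csv(path)
    df["churn_prob"] = model.predict_proba(df[FEATURES])[:, 1]
    return df
gr.Interface(fn=predict_csv, inputs=gr.File(label="Customer CSV"),
             outputs=gr.Dataframe(label="Scored customers")).launch()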
Handing a non-technical stakeholder a URL and saying "just upload your data here" converts ML from a data science project into a business tool — and that conversation is what justifies the investment.
Real Kaggle Dataset: End-to-End
This week you put everything together on a real-world marketing dataset. This is your course capstone: EDA → feature engineering → Pipeline → cross-validation → AutoML → deployed Gradio app.
Recommended datasets (all free on Kaggle): Telco Customer Churn · Bank Marketing Response · E-Commerce Shipping · Online Retail II. Pick the one closest to your intended industry.
EXAMPLE 9.1 Full EDA Template
import pandas as pd; import seaborn as sns; import matplotlib.pyplot as plt
df = pd.read_csv("your_dataset.csv")
print("Shape:", df.shape)
print((df.isnull().mean() * 100).round(1).sort_values(ascending=False).head(10))
print(df["target"].value_counts(normalize=True))
fig, ax = plt.subplots(figsize=(10,7))
sns.heatmap(df.select_dtypes("number").corr(), cmap="coolwarm", center=0, ax=ax, annot=True, fmt=".1f")
plt.tight_layout(); plt.show()
EXAMPLE 9.2 Pipeline + FLAML + SHAP Explainability
!pip install flaml shap --quiet
from flaml import AutoML
from sklearn.model_selection import StratifiedKFold, cross_val_score
import shap
automl = AutoML()
automl.fit(X_tr, y_tr, task="classification", metric="roc_auc", time_budget=120)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = cross_val_score(automl.model.estimator, X, y, cv=cv, scoring="roc_auc")
print(f"CV AUC: {cv_auc.mean():.4f} ± {cv_auc.std():.4f}")
explainer = shap.TreeExplainer(automl.model.estimator)  # assumes FLAML picked a tree-based model
shap_vals = explainer.shap_values(X_te)
shap.summary_plot(shap_vals, X_te, plot_type="bar")
Deliverables: ① EDA notebook (≥5 visualisations) · ② ML pipeline with cross-validated AUC · ③ SHAP feature importance plot · ④ Live Gradio app on HF Spaces · ⑤ GitHub repo with README explaining the business problem
ml-marketing-project/
├── data/ # README with Kaggle link only
├── notebooks/
│ ├── 01_eda.ipynb
│ └── 02_modelling.ipynb
├── app.py # Gradio app
├── model.pkl
├── requirements.txt
└── README.md # business problem + HF Spaces link
A complete project — EDA + model + CV evaluation + deployed app + GitHub repo — is the deliverable that goes in your portfolio. It shows you can work end-to-end, not just run individual cells.
Cheatsheet & Debugging Guide
sklearn Quick Reference
| Operation | Code | When to use |
|---|---|---|
| Split data | train_test_split(X,y,test_size=.2,stratify=y) | Always stratify for classification |
| Cross-validate | cross_val_score(pipe,X,y,cv=StratifiedKFold(5)) | For reliable metric estimation |
| Build pipeline | Pipeline([("sc",Scaler()),("clf",Model())]) | Any time you scale or encode |
| Handle missing | SimpleImputer(strategy="median") | Numeric columns with NaN |
| Encode categories | OneHotEncoder(handle_unknown="ignore") | Nominal categories (<20 values) |
| Grid search | GridSearchCV(model,params,cv=5,n_jobs=-1) | <200 total combinations |
| Random search | RandomizedSearchCV(...,n_iter=30) | Large parameter spaces |
| Save model | joblib.dump(model,"model.pkl") | Before deployment |
| Load model | model=joblib.load("model.pkl") | In app.py / at prediction time |
Common Errors & Fixes
| Error | Cause | Fix |
|---|---|---|
| ValueError: could not convert string | Categorical column not encoded | Add OneHotEncoder in a ColumnTransformer |
| KeyError: "column" | Typo or wrong dataset loaded | Check df.columns.tolist() |
| DataConversionWarning | Mixed dtypes in array | df[col] = df[col].astype(float) |
| MemoryError | Dataset too large for free Colab | df = df.sample(50000) or use Kaggle Notebooks |
| ModuleNotFoundError | Library not installed | Run !pip install library_name |
| Train AUC=1.0, Test AUC=0.6 | Overfitting | Reduce max_depth, add regularisation |
| AUC≈0.5 after AutoML | Data leakage or wrong label | Check if label-derived features are in X |
Free Tools & Resources
- Compute: Google Colab (colab.research.google.com) · Kaggle Notebooks (kaggle.com/code) — both free, no installation
- Models & Data: Hugging Face (huggingface.co) · Kaggle Datasets (kaggle.com/datasets)
- Deployment: Hugging Face Spaces (free Gradio SDK) · GitHub (free public repos)
- Free APIs: HF Inference API · Cohere Trial Key · Google AI Studio (aistudio.google.com)
- Docs: scikit-learn.org/stable · flaml.ai/docs · optuna.readthedocs.io · gradio.app/docs
After reading each week's material, open a blank Colab notebook and retype one example from memory — looking back only when stuck. 30 minutes of effortful recall beats 3 hours of passive reading.
You have now seen the complete modern ML stack — from raw data to a live deployed application. The tools are free. The knowledge is in your hands. The only remaining variable is practice.