{ "cells": [ { "cell_type": "markdown", "id": "b9181d5d", "metadata": { "papermill": { "duration": 0.022276, "end_time": "2022-08-28T13:50:03.958617", "exception": false, "start_time": "2022-08-28T13:50:03.936341", "status": "completed" }, "tags": [] }, "source": [ "# Sections of the Notebook" ] }, { "cell_type": "markdown", "id": "72dd993b", "metadata": { "papermill": { "duration": 0.021173, "end_time": "2022-08-28T13:50:04.002367", "exception": false, "start_time": "2022-08-28T13:50:03.981194", "status": "completed" }, "tags": [] }, "source": [ "1) **Selection of Metrics** \n", "2) **Importing Libraries and Dataset** \n", "3.1) **Feature generation** \n", "3.2) **Data cleaning** \n", "3.3) **Data scaling (Normalization | Standardization)** \n", "4) **Dimensionality reduction** \n", "5) **Clustering** \n", "6) **Business Recommendations** " ] }, { "cell_type": "markdown", "id": "25ec315a", "metadata": { "papermill": { "duration": 0.021734, "end_time": "2022-08-28T13:50:04.045639", "exception": false, "start_time": "2022-08-28T13:50:04.023905", "status": "completed" }, "tags": [] }, "source": [ "## Selection of Metrics" ] }, { "cell_type": "markdown", "id": "99948a3c", "metadata": { "papermill": { "duration": 0.021363, "end_time": "2022-08-28T13:50:04.088385", "exception": false, "start_time": "2022-08-28T13:50:04.067022", "status": "completed" }, "tags": [] }, "source": [ "Several internal metrics can be used to evaluate clustering quality without ground-truth labels, for example:\n", "* Silhouette coefficient \n", "* Calinski-Harabasz Index \n", "* Davies-Bouldin Index" ] }, { "cell_type": "markdown", "id": "900f1f40", "metadata": { "papermill": { "duration": 0.021263, "end_time": "2022-08-28T13:50:04.131480", "exception": false, "start_time": "2022-08-28T13:50:04.110217", "status": "completed" }, "tags": [] }, "source": [ "\n" ] }, { "cell_type": "markdown", "id": "2f870b78", "metadata": { "papermill": { "duration": 0.021887, "end_time": "2022-08-28T13:50:04.174852", "exception": false, "start_time": 
"2022-08-28T13:50:04.152965", "status": "completed" }, "tags": [] }, "source": [ "## Importing Libraries and Dataset" ] }, { "cell_type": "code", "execution_count": 1, "id": "1fbdee6d", "metadata": { "execution": { "iopub.execute_input": "2022-08-28T13:50:04.222093Z", "iopub.status.busy": "2022-08-28T13:50:04.220859Z", "iopub.status.idle": "2022-08-28T13:50:07.209795Z", "shell.execute_reply": "2022-08-28T13:50:07.208214Z" }, "papermill": { "duration": 3.016161, "end_time": "2022-08-28T13:50:07.212927", "exception": false, "start_time": "2022-08-28T13:50:04.196766", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd \n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import plotly.express as px\n", "import sklearn\n", "from sklearn.preprocessing import MinMaxScaler\n", "from sklearn.metrics import silhouette_score\n", "from sklearn.metrics import silhouette_samples\n", "from scipy.cluster.hierarchy import dendrogram\n", "from sklearn.decomposition import PCA\n", "from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering, DBSCAN\n", "from datetime import datetime\n", "import warnings\n", "import sys\n", "import os" ] }, { "cell_type": "code", "execution_count": 2, "id": "db39198d", "metadata": { "execution": { "iopub.execute_input": "2022-08-28T13:50:07.260113Z", "iopub.status.busy": "2022-08-28T13:50:07.259192Z", "iopub.status.idle": "2022-08-28T13:50:07.297684Z", "shell.execute_reply": "2022-08-28T13:50:07.296601Z" }, "papermill": { "duration": 0.065015, "end_time": "2022-08-28T13:50:07.300452", "exception": false, "start_time": "2022-08-28T13:50:07.235437", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "df = pd.read_csv('marketing_campaign.csv')" ] }, { "cell_type": "code", "execution_count": 3, "id": "d4a47db8", "metadata": { "execution": { "iopub.execute_input": "2022-08-28T13:50:07.347340Z", "iopub.status.busy": "2022-08-28T13:50:07.346437Z", 
"iopub.status.idle": "2022-08-28T13:50:07.387384Z", "shell.execute_reply": "2022-08-28T13:50:07.385894Z" }, "papermill": { "duration": 0.068972, "end_time": "2022-08-28T13:50:07.391082", "exception": false, "start_time": "2022-08-28T13:50:07.322110", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
|  | ID | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | Dt_Customer | Recency | MntWines | ... | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5524 | 1957 | Graduation | Single | 58138.0 | 0 | 0 | 4/9/2012 | 58 | 635 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 1 |
| 1 | 2174 | 1954 | Graduation | Single | 46344.0 | 1 | 1 | 8/3/2014 | 38 | 11 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2 | 4141 | 1965 | Graduation | Together | 71613.0 | 0 | 0 | 21-08-2013 | 26 | 426 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 3 | 6182 | 1984 | Graduation | Together | 26646.0 | 1 | 0 | 10/2/2014 | 26 | 11 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 4 | 5324 | 1981 | PhD | Married | 58293.0 | 1 | 0 | 19-01-2014 | 94 | 173 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 7516 | 1983 | Graduation | Married | 30096.0 | 1 | 0 | 22-05-2014 | 30 | 5 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 96 | 7247 | 1960 | Graduation | Widow | 47916.0 | 0 | 1 | 22-11-2012 | 72 | 505 | ... | 6 | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 97 | 11100 | 1972 | Graduation | Divorced | 51813.0 | 1 | 1 | 11/4/2013 | 37 | 51 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 98 | 4646 | 1951 | 2n Cycle | Married | 78497.0 | 0 | 0 | 1/12/2013 | 44 | 207 | ... | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 11 | 0 |
| 99 | 3037 | 1983 | PhD | Married | 50150.0 | 0 | 0 | 20-06-2013 | 32 | 135 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |

100 rows × 29 columns
\n", "