Credit Card Default Prediction Using Machine Learning
Tech Stack: Python, pandas, scikit-learn, Logistic Regression, XGBoost, Gradient Boosting, Random Forest
This project uses machine learning classification techniques to identify customers who are likely to default on credit card payments. It aims to help financial institutions manage credit risk more effectively.
Problem Statement
Credit card defaults can cause significant financial losses for banks and lenders. Accurately predicting default risk enables proactive management, reducing losses and improving customer engagement strategies.
Overview
- Goal: Predict credit card payment defaults to reduce financial risk
- Data: Customer credit history, demographic, and payment behavior features
- Models: Logistic Regression, Decision Trees, Random Forest, XGBoost, Gradient Boosting
- Best Performance: Random Forest, which achieved the highest ROC AUC score of the five models
Approach
Data Cleaning & Preprocessing: Handled missing values, encoded categorical variables, and scaled features.
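The preprocessing step could be sketched with a scikit-learn `ColumnTransformer`, as below. This is a minimal illustration rather than the project's actual code: the toy data, the column names (`credit_limit`, `age`, `education`), and the imputation strategies are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "credit_limit": [20000.0, 50000.0, np.nan, 120000.0],
    "age": [24, 37, 45, np.nan],
    "education": ["university", "graduate", np.nan, "university"],
})

numeric_cols = ["credit_limit", "age"]
categorical_cols = ["education"]

# Numeric features: fill gaps with the median, then standardize.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical features: fill gaps with the mode, then one-hot encode.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0.0 forces a dense array as output.
preprocess = ColumnTransformer(
    [
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ],
    sparse_threshold=0.0,
)

X = preprocess.fit_transform(df)
print(X.shape)
```

In a real workflow the transformer would typically be fitted on the training split only and then applied to the test split, so that imputation and scaling statistics do not leak from held-out data.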
Balancing Dataset: Applied SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance and improve model robustness.
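To show what SMOTE does, here is a simplified sketch of its core idea: each synthetic minority row is an interpolation between a real minority row and one of its nearest minority neighbours. The source does not say which implementation was used; the function `smote_sketch`, its parameters, and the synthetic data are hypothetical. The `SMOTE` class in the imbalanced-learn package is a maintained implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    sampled minority row and one of its k nearest minority neighbours.
    Hypothetical helper for illustration, not a library function."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours because each row is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Synthetic imbalanced training set: 90 non-defaults, 10 defaults.
rng = np.random.default_rng(42)
X_train = np.vstack([rng.normal(0, 1, (90, 3)), rng.normal(2, 1, (10, 3))])
y_train = np.array([0] * 90 + [1] * 10)

X_min = X_train[y_train == 1]
X_new = smote_sketch(X_min, n_new=80)

X_bal = np.vstack([X_train, X_new])
y_bal = np.concatenate([y_train, np.ones(len(X_new), dtype=int)])
print(np.bincount(y_bal))
```

Oversampling is normally applied to the training split only. Resampling before the train/test split would place synthetic copies of test-like rows in the training data and inflate the evaluation metrics.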
Modeling: Built and tuned five classification models, then evaluated each with ROC AUC, precision, recall, and F1-score.
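A model comparison along these lines might look like the following sketch, which ranks classifiers by cross-validated ROC AUC. The data is synthetic (`make_classification`), the hyperparameters are untuned defaults, and XGBoost is left out so that the example depends only on scikit-learn; none of this reproduces the project's actual setup or results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the credit card dataset.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=42,
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Stratified folds keep the default rate similar in every fold.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}

# Print the models from best to worst mean ROC AUC.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Hyperparameter tuning could be layered on top by wrapping each estimator in scikit-learn's `GridSearchCV` or `RandomizedSearchCV` with the same `scoring="roc_auc"` setting.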
Evaluation: Selected Random Forest based on ROC AUC; analyzed confusion matrices and feature importance.
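The evaluation step can be illustrated with a short sketch that computes ROC AUC, a confusion matrix, a precision/recall/F1 report, and feature importances for a Random Forest. The synthetic data, split size, and hyperparameters are assumptions made for illustration, so the printed numbers say nothing about the project's real results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the credit card dataset.
X, y = make_classification(
    n_samples=800, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# ROC AUC needs predicted probabilities for the positive (default) class.
proba = model.predict_proba(X_test)[:, 1]
pred = model.predict(X_test)

auc = roc_auc_score(y_test, proba)
cm = confusion_matrix(y_test, pred)  # rows: true class, columns: predicted class

print(f"ROC AUC: {auc:.3f}")
print(cm)
print(classification_report(y_test, pred, digits=3))

# Impurity-based importances; they sum to 1 across all features.
importances = model.feature_importances_
for idx in np.argsort(importances)[::-1][:5]:
    print(f"feature_{idx}: {importances[idx]:.3f}")
```

For default prediction, the bottom-left cell of the confusion matrix (defaulters predicted as non-defaulters) is usually the costliest error, which is why recall on the default class is worth reading alongside ROC AUC.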
Highlights
- Implemented SMOTE to overcome class imbalance in credit default data
- Compared multiple classification algorithms to find the best predictive performance
- Produced default-risk predictions intended to support credit risk management and customer collection strategies