compare two models

This commit is contained in:
2025-12-06 20:43:56 +01:00
parent fe6b805b31
commit 80ea363123
24 changed files with 1671 additions and 281 deletions


@@ -101,11 +101,67 @@ Current results with KMEANS_SMOTE:
| LGBM_KMEANS_SMOTE_knn10 | test | 0.9865689865689866 | 0.8543196878009516 | 0.8121616449258658 | 0.7895809912158687 | 0.9600745182511498 | 0.9931221342225928 | 0.7155172413793104 | 0.9964866786565728 | 0.6278366111951589 | 0.9987424020121568 | 0.5804195804195804 | 0.9875647668393782 | 0.9325842696629213 | 83 | 4765 | 6 | 60 |
## Tuning LightGBM and CatBoost
As written in the `tune` function in `models/catboost_model.py`, we used the following parameters for this model:
```
scaling_methods = [
"standard_scaling",
"robust_scaling",
"minmax_scaling",
"yeo_johnson",
]
sampling_methods = [
"KMeansSMOTE",
"class_weight",
]
learning_rate_list = [0.03, 0.05, 0.1]
depth_list = [6, 8]
l2_leaf_reg_list = [1, 3]
subsample_list = [0.8, 1.0]
k_neighbors_list = [10]
kmeans_estimator_list = [5]
```
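The repository's `tune` function itself is not shown in this diff; below is a minimal sketch of how such a grid could be swept, assuming a plain `CatBoostClassifier` and an `f1_macro` selection criterion. The data, the `iterations=300` budget, and the variable names are illustrative, and the scaling/sampling loops are omitted for brevity.
```
# Minimal grid-search sketch for CatBoost (illustrative, not the repo's tune()).
from itertools import product

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Placeholder imbalanced data; in the project this comes from the real dataset.
X, y = make_classification(n_samples=5000, weights=[0.97], random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)

best_score, best_params = -1.0, None
for lr, depth, l2, sub in product([0.03, 0.05, 0.1], [6, 8], [1, 3], [0.8, 1.0]):
    model = CatBoostClassifier(
        learning_rate=lr,
        depth=depth,
        l2_leaf_reg=l2,
        subsample=sub,
        bootstrap_type="Bernoulli",  # subsample needs a Bernoulli/Poisson/MVS bootstrap
        iterations=300,
        verbose=0,
        random_seed=42,
    )
    model.fit(X_train, y_train)
    score = f1_score(y_val, model.predict(X_val), average="macro")
    if score > best_score:
        best_score = score
        best_params = dict(learning_rate=lr, depth=depth, l2_leaf_reg=l2, subsample=sub)

print(best_params, best_score)
```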
Also, for the `tune` function in `models/lightgbm_model.py` we used the following parameters:
```
scaling_methods = [
"standard_scaling",
"robust_scaling",
"minmax_scaling",
"yeo_johnson",
]
sampling_methods = [
"KMeansSMOTE",
"class_weight",
]
boosting_type_list = ["gbdt", "dart"]
learning_rate_list = [0.03, 0.05, 0.1]
number_of_leaves_list = [100]
l2_regularization_lambda_list = [0.1, 0.5]
l1_regularization_alpha_list = [0.1, 0.5]
tree_subsample_tree_list = [0.8, 1.0]
subsample_list = [0.8, 1.0]
kmeans_smote_k_neighbors_list = [10]
kmeans_smote_n_clusters_list = [5]
```
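For reference, here is a minimal sketch of how the two `sampling_methods` could map onto code, assuming imbalanced-learn's `KMeansSMOTE` and LightGBM's built-in `class_weight` option. The parameter-name mapping (e.g. `tree_subsample_tree_list` to `colsample_bytree`) and the single fixed grid point shown are our assumptions, not taken from the repository.
```
# Sketch of the two resampling options for LightGBM (assumed mapping, not the repo's code).
from imblearn.over_sampling import KMeansSMOTE
from lightgbm import LGBMClassifier


def make_model(sampling_method, X_train, y_train):
    if sampling_method == "KMeansSMOTE":
        # Oversample the minority class before fitting; k_neighbors=10 and
        # 5 clusters mirror kmeans_smote_k_neighbors_list / kmeans_smote_n_clusters_list.
        sampler = KMeansSMOTE(k_neighbors=10, kmeans_estimator=5, random_state=42)
        X_train, y_train = sampler.fit_resample(X_train, y_train)
        model = LGBMClassifier(boosting_type="gbdt", learning_rate=0.05,
                               num_leaves=100, reg_lambda=0.1, reg_alpha=0.1,
                               colsample_bytree=0.8, subsample=0.8,
                               subsample_freq=1, random_state=42)
    else:  # "class_weight": reweight the loss instead of resampling
        model = LGBMClassifier(boosting_type="gbdt", learning_rate=0.05,
                               num_leaves=100, reg_lambda=0.1, reg_alpha=0.1,
                               colsample_bytree=0.8, subsample=0.8,
                               subsample_freq=1, class_weight="balanced",
                               random_state=42)
    model.fit(X_train, y_train)
    return model
```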
After tuning, we train both models with their best parameters and compare them on the imbalanced test data.
Here are the comparison results:
| model | accuracy | f1_macro | f2_macro | recall_macro | precision_macro | f1_class0 | f2_class0 | recall_class0 | precision_class0 | f1_class1 | f2_class1 | recall_class1 | precision_class1 | TP | TN | FP | FN |
|----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|----|------|----|----|
| catboost | 0.9814814814814815 | 0.8195693865042805 | 0.8013174756506312 | 0.7903526990720451 | 0.8559205703525894 | 0.9904901243599122 | 0.9921698350221925 | 0.9932928107315029 | 0.9877032096706961 | 0.6486486486486487 | 0.6104651162790697 | 0.5874125874125874 | 0.7241379310344828 | 84 | 4739 | 32 | 59 |
| lightgbm | 0.9849409849409849 | 0.8469442386692707 | 0.8185917013944679 | 0.8023094072140393 | 0.9084632979829487 | 0.9922755741127348 | 0.9946427824048885 | 0.9962272060364703 | 0.9883551673944687 | 0.7016129032258065 | 0.6425406203840472 | 0.6083916083916084 | 0.8285714285714286 | 87 | 4753 | 18 | 56 |
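As a sanity check, the per-class columns follow directly from the confusion counts. A small sketch reproducing the LightGBM row, with class 1 treated as the positive class:
```
# Recompute LightGBM's class-1 metrics from its confusion counts (TP=87, TN=4753, FP=18, FN=56).
TP, TN, FP, FN = 87, 4753, 18, 56

recall = TP / (TP + FN)                                 # 0.6083916083916084
precision = TP / (TP + FP)                              # 0.8285714285714286
f1 = 2 * precision * recall / (precision + recall)      # 0.7016129032258065
f2 = 5 * precision * recall / (4 * precision + recall)  # f-beta with beta=2: 0.6425406203840472
accuracy = (TP + TN) / (TP + TN + FP + FN)              # 0.9849409849409849

print(recall, precision, f1, f2, accuracy)
```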
## Next steps
```
✅ 1. Stratified K-fold applied only on the train set (see the sketch after this list).
✅ 2. train LGBM model using KMEANS_SMOTE with k_neighbors=10 (fine-tuning remains)
🗹 3. train CatBoost using KMEANS_SMOTE with k_neighbors=10 (fine-tuning remains)
🗹 4. implement the methods proposed in this article: https://1drv.ms/b/c/ab2a38fe5c318317/IQBEDsSFcYj6R6AMtOnh0X6DAZUlFqAYq19WT8nTeXomFwg
🗹 5. compare the proposed model with SMOTE vs. oversampling as the balancing method
```
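Regarding item 1, a minimal sketch of applying resampling only inside the training folds: imbalanced-learn's `Pipeline` runs the sampler during `fit()` on each fold's train split and skips it at predict time, so `KMeansSMOTE` never sees validation data. The model, data, and scorer here are illustrative.
```
# Resample only within each training fold via an imblearn Pipeline.
from imblearn.over_sampling import KMeansSMOTE
from imblearn.pipeline import Pipeline
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder imbalanced data; in the project this is the real train set.
X, y = make_classification(n_samples=3000, weights=[0.9], random_state=42)

pipeline = Pipeline([
    ("smote", KMeansSMOTE(k_neighbors=10, kmeans_estimator=5, random_state=42)),
    ("model", LGBMClassifier(random_state=42)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="f1_macro")
print(scores.mean())
```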