editing readme.md

2025-11-30 23:45:59 +01:00
parent 3ffa2524a3
commit 036f107a59


@@ -2,7 +2,7 @@
We are dealing with an extremely imbalanced dataset of electrocardiogram signals with two classes, labeled good (0) and bad (1).
## STEP 1: Fill missing values
All columns in our data contain missing values, ranging from 25 to 70 per column. We fill them by using `from sklearn.impute import KNNImputer`:
@@ -16,7 +16,7 @@ We are dealing with an extremely imbalanced dataset of electrocardiogram signals
    return data_frame_imputed
```
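The snippet above only shows the tail of the imputation helper. A minimal sketch of what the full function could look like, assuming the helper name `impute_missing` and a default `n_neighbors` (both illustrative, not taken from the repo):

```python
import pandas as pd
from sklearn.impute import KNNImputer

def impute_missing(data_frame: pd.DataFrame, n_neighbors: int = 5) -> pd.DataFrame:
    # Each missing cell is filled from the n_neighbors most similar rows,
    # measured over the non-missing features.
    imputer = KNNImputer(n_neighbors=n_neighbors)
    data_frame_imputed = pd.DataFrame(
        imputer.fit_transform(data_frame),
        columns=data_frame.columns,
        index=data_frame.index,
    )
    return data_frame_imputed
```

If the label column is part of the frame, it would normally be dropped before imputing so the target does not influence the filled values.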
## STEP 2: Scaling
We used `from sklearn.preprocessing import RobustScaler` to handle scaling.
@@ -28,7 +28,7 @@ We are dealing with an extremely imbalanced dataset of electrocardiogram signals
data_frame_scaled["label"] = labels.values
```
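A sketch of how this scaling step could be wired up so that only the feature columns pass through the scaler and the label is re-attached afterwards (the `scale_features` name and the `label` column name are assumptions):

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler

def scale_features(data_frame: pd.DataFrame, label_column: str = "label") -> pd.DataFrame:
    # Separate the label so only the signal features are scaled.
    labels = data_frame[label_column]
    features = data_frame.drop(columns=[label_column])

    # RobustScaler centers on the median and scales by the IQR, so extreme
    # ECG outliers influence the scaling less than with standard scaling.
    scaler = RobustScaler()
    data_frame_scaled = pd.DataFrame(
        scaler.fit_transform(features),
        columns=features.columns,
        index=features.index,
    )

    # Re-attach the untouched labels, mirroring the snippet above.
    data_frame_scaled[label_column] = labels.values
    return data_frame_scaled
```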
## STEP 3: k-fold cross validation + stratify classes + balancing training data
First of all, we split the dataset into two parts: train (85%) and test (15%). To make sure the majority and minority classes are distributed fairly across both splits, we passed `stratify=y` (a short sketch follows).
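Here is what that split could look like; the variable names and `random_state` are illustrative:

```python
from sklearn.model_selection import train_test_split

# Features and labels from the scaled frame; "label" matches the column used above.
X = data_frame_scaled.drop(columns=["label"])
y = data_frame_scaled["label"]

# 85% / 15% split; stratify=y keeps the good/bad class ratio identical in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)
```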
@@ -79,16 +79,18 @@ We are dealing with an extremely imbalanced dataset of electrocardiogram signals
model.fit(X_train, y_train)
```
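The `model.fit` line above is the tail of the cross-validation loop. A hedged sketch of how the stratified folds and the oversampling could fit together, using `StratifiedKFold` plus `KMeansSMOTE` from imbalanced-learn and an LGBM classifier (fold count, `random_state`, and model hyperparameters are assumptions; `KMeansSMOTE` can be sensitive to how the minority class clusters):

```python
from lightgbm import LGBMClassifier
from imblearn.over_sampling import KMeansSMOTE
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import recall_score, precision_score

# Stratified K-fold is applied to the training split only,
# so the held-out 15% test set never leaks into the resampling.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(skf.split(X_train, y_train)):
    X_tr, y_tr = X_train.iloc[train_idx], y_train.iloc[train_idx]
    X_val, y_val = X_train.iloc[val_idx], y_train.iloc[val_idx]

    # Oversample only the fold's training part; the validation part stays imbalanced.
    sampler = KMeansSMOTE(k_neighbors=10, random_state=42)
    X_bal, y_bal = sampler.fit_resample(X_tr, y_tr)

    model = LGBMClassifier(random_state=42)
    model.fit(X_bal, y_bal)

    preds = model.predict(X_val)
    print(f"fold {fold}: recall={recall_score(y_val, preds):.3f} "
          f"precision={precision_score(y_val, preds):.3f}")
```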
## STEP 4: Train different models to find the best possible approach
#### What we are looking for:
#### Dangerous: Sick → predicted healthy: high recall score (low FN)
#### Costly: Healthy → predicted sick: high precision score (low FP)
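A small sketch of how those two error costs map onto the metrics on the held-out test set (it assumes `model`, `X_test`, and `y_test` from the steps above, with 1 as the positive/bad class):

```python
from sklearn.metrics import confusion_matrix, recall_score, precision_score

y_pred = model.predict(X_test)

# For binary labels {0, 1}, ravel() returns tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

# Dangerous case (sick predicted healthy) = FN, so track recall = tp / (tp + fn).
print(f"FN={fn}  recall={recall_score(y_test, y_pred):.3f}")

# Costly case (healthy predicted sick) = FP, so track precision = tp / (tp + fp).
print(f"FP={fp}  precision={precision_score(y_test, y_pred):.3f}")
```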
## Next steps:
```
✅ 1. Stratified K-fold only applied on train.
🗹 2. Train LGBM model using KMEANS_SMOTE with k_neighbors=10.
🗹 3. Train Cat_boost using KMEANS_SMOTE with k_neighbors=10.