editing readme.md

This commit is contained in:
2025-11-30 23:45:59 +01:00
parent 3ffa2524a3
commit 036f107a59


@@ -2,7 +2,7 @@
We are dealing with an extremely imbalanced dataset related to electrocardiogram signals that contains binary classes, labeled as good (0) and bad (1) signals.
-### STEP 1: Fill missing values
+## STEP 1: Fill missing values
All columns in our data contain missing values, ranging from 25 to 70. By using `from sklearn.impute import KNNImputer`
@@ -16,7 +16,7 @@ We are dealing with an extremely imbalanced dataset related to electrocardiogram
return data_frame_imputed
```
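Based on the `return data_frame_imputed` line above, the imputation helper presumably looks something like this sketch (the `n_neighbors` value, helper name, and demo frame are assumptions, not the project's exact code):

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

def impute_missing(data_frame: pd.DataFrame) -> pd.DataFrame:
    """Fill NaNs in each column from the values of the nearest rows."""
    imputer = KNNImputer(n_neighbors=5)  # n_neighbors is an assumption
    data_frame_imputed = pd.DataFrame(
        imputer.fit_transform(data_frame),
        columns=data_frame.columns,
        index=data_frame.index,
    )
    return data_frame_imputed

# tiny demo frame with missing values in both columns
df = pd.DataFrame(
    {"a": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
     "b": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0]}
)
filled = impute_missing(df)  # same shape, no NaNs left
```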
-### STEP 2: Scaling
+## STEP 2: Scaling
We used `from sklearn.preprocessing import RobustScaler` to handle scaling.
@@ -28,7 +28,7 @@ We are dealing with an extremely imbalanced dataset related to electrocardiogram
data_frame_scaled["label"] = labels.values
```
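A sketch of the scaling step consistent with the `data_frame_scaled["label"] = labels.values` line above: separate the label, scale the features by median/IQR (robust to ECG outliers), then reattach the label (the helper name and demo frame are assumptions):

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler

def scale_features(data_frame: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    """Robust-scale the feature columns, leaving the label untouched."""
    labels = data_frame[label_col]
    features = data_frame.drop(columns=[label_col])
    scaler = RobustScaler()  # centers on the median, scales by IQR
    data_frame_scaled = pd.DataFrame(
        scaler.fit_transform(features),
        columns=features.columns,
        index=features.index,
    )
    data_frame_scaled[label_col] = labels.values
    return data_frame_scaled

# demo frame with one outlier-heavy feature
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 100.0], "label": [0, 1, 0, 1]})
scaled = scale_features(df)  # labels pass through unchanged
```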
-### STEP 3: k-fold cross validation + stratify classes + balancing training data
+## STEP 3: k-fold cross validation + stratify classes + balancing training data
First, we split the dataset into two parts: train (85%) and test (15%). To make sure the majority and minority classes are
distributed fairly, we passed `stratify=y`
@@ -79,16 +79,18 @@ We are dealing with an extremely imbalanced dataset related to electrocardiogram
model.fit(X_train, y_train)
```
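The split-then-fold idea can be sketched as follows (the fold count, `random_state`, and toy data are assumptions; per the next-steps list, any balancing/oversampling would be fitted inside each training fold only):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# toy imbalanced labels: 90 good (0), 10 bad (1)
X = np.arange(100, dtype=float).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

# 85/15 split; stratify=y keeps the 9:1 class ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)

# stratified folds are built only on the training split, never on test
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X_train, y_train):
    # each validation fold keeps roughly the same class ratio;
    # oversampling would be applied to X_train[train_idx] only
    fold_minority_ratio = y_train[val_idx].mean()
```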
-### STEP 4: Train different models to find the best possible approach
+## STEP 4: Train different models to find the best possible approach
-What we are looking for:
-Dangerous: Sick → predicted healthy : high recall score or low FN
-Costly: Healthy → predicted sick : high precision score or low FP
+#### What we are looking for:
+#### Dangerous: Sick → predicted healthy : high recall score or low FN
+#### Costly: Healthy → predicted sick : high precision score or low FP
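These two goals map directly onto recall and precision. A toy illustration with hypothetical labels and predictions (1 = bad/sick signal):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# hypothetical ground truth and predictions; 1 = bad (sick) signal
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# dangerous misses: sick -> predicted healthy are FN; recall = TP / (TP + FN)
recall = recall_score(y_true, y_pred)
# costly false alarms: healthy -> predicted sick are FP; precision = TP / (TP + FP)
precision = precision_score(y_true, y_pred)
# here TP=3, FN=1, FP=1, so recall = precision = 0.75
```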
## next steps:
```
next steps:
✅ 1. Stratified K-fold only applied on train.
🗹 2. Train LGBM model using KMEANS_SMOTE with k_neighbors=10
🗹 3. Train CatBoost using KMEANS_SMOTE with k_neighbors=10
```