diff --git a/README.md b/README.md
index 67be634..10ab82f 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 We are dealing with an extremely imbalanced electrocardiogram-signal dataset with two classes, labeled good (0) and bad (1).
 
-### STEP 1: Fill missing values
+## STEP 1: Fill missing values
 
 Every column in our data contains missing values, ranging from 25 to 70 per column.
 We fill them using `from sklearn.impute import KNNImputer`:
@@ -16,7 +16,7 @@ We are dealing with an extremely imbalanced electrocardiogram-signal dataset
     return data_frame_imputed
 ```
 
-### STEP 2: Scaling
+## STEP 2: Scaling
 
 We used `from sklearn.preprocessing import RobustScaler` to handle scaling.
@@ -28,7 +28,7 @@ We are dealing with an extremely imbalanced electrocardiogram-signal dataset
     data_frame_scaled["label"] = labels.values
 ```
 
-### STEP 3: k-fold cross validation + stratify classes + balancing training data
+## STEP 3: k-fold cross validation + stratify classes + balancing training data
 
 First of all, we split the dataset into two parts: train (85%) and test (15%).
 To make sure the majority and minority classes are distributed fairly across both parts, we passed `stratify=y`.
@@ -79,16 +79,18 @@ We are dealing with an extremely imbalanced electrocardiogram-signal dataset
     model.fit(X_train, y_train)
 ```
 
-### STEP 4: Train different models to find the best possible approach
+## STEP 4: Train different models to find the best possible approach
 
-What we are looking for:
-Dangerous: Sick → predicted healthy : high recall score or low FN
-Costly: Healthy → predicted sick : high precision score or low FP
+#### What we are looking for:
+
+#### Dangerous: sick → predicted healthy: high recall (low FN)
+
+#### Costly: healthy → predicted sick: high precision (low FP)
 
+## Next steps:
 ```
-next steps:
 ✅ 1. Apply stratified K-fold only on the train split.
 🗹 2. Train an LGBM model using KMEANS_SMOTE with k_neighbors=10.
 🗹 3. Train CatBoost using KMEANS_SMOTE with k_neighbors=10.
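
The hunks above show only fragments of the code they touch, so here is a minimal, self-contained sketch of the STEP 1 imputation, assuming the data sits in a numeric pandas DataFrame. The function name `fill_missing_values` and the `n_neighbors=5` default are illustrative; only `KNNImputer` and the returned `data_frame_imputed` appear in the README.

```
import pandas as pd
from sklearn.impute import KNNImputer

def fill_missing_values(data_frame: pd.DataFrame, n_neighbors: int = 5) -> pd.DataFrame:
    # Each missing cell is imputed from the mean of that feature over the
    # k nearest rows, measured on the features both rows have in common.
    imputer = KNNImputer(n_neighbors=n_neighbors)
    imputed = imputer.fit_transform(data_frame)
    data_frame_imputed = pd.DataFrame(imputed, columns=data_frame.columns)
    return data_frame_imputed
```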
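Likewise for STEP 2: the closing line `data_frame_scaled["label"] = labels.values` comes from the README, while the `scale_features` wrapper and the assumption that the labels live in a `label` column are ours.

```
import pandas as pd
from sklearn.preprocessing import RobustScaler

def scale_features(data_frame_imputed: pd.DataFrame) -> pd.DataFrame:
    # Set the label aside so only the signal features get scaled.
    labels = data_frame_imputed["label"]
    features = data_frame_imputed.drop(columns=["label"])

    # RobustScaler centers on the median and scales by the IQR,
    # so outliers do not dominate the scaling.
    scaler = RobustScaler()
    scaled = scaler.fit_transform(features)

    data_frame_scaled = pd.DataFrame(scaled, columns=features.columns)
    data_frame_scaled["label"] = labels.values
    return data_frame_scaled
```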
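STEP 3 and the next-steps checklist combine into the sketch below: an 85/15 stratified split, stratified K-fold on the train part only, and KMeansSMOTE oversampling applied inside each training fold so the validation fold stays untouched. The 85/15 split, `stratify=y`, `k_neighbors=10`, and the LGBM model come from the README; `n_splits=5`, the `random_state` values, and the default `LGBMClassifier` settings are our assumptions.

```
from imblearn.over_sampling import KMeansSMOTE
from lightgbm import LGBMClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split

# data_frame_scaled comes from the STEP 2 sketch above.
X = data_frame_scaled.drop(columns=["label"]).values
y = data_frame_scaled["label"].values

# 85% train / 15% test; stratify=y keeps the class ratio in both parts.
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)

# Stratified K-fold is applied to the training set only (next step 1).
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X_train_full, y_train_full):
    X_tr, y_tr = X_train_full[train_idx], y_train_full[train_idx]
    X_val, y_val = X_train_full[val_idx], y_train_full[val_idx]

    # Oversample the minority (bad) class inside the fold only, so no
    # synthetic samples leak into the validation split (next steps 2-3).
    sampler = KMeansSMOTE(k_neighbors=10, random_state=42)
    X_bal, y_bal = sampler.fit_resample(X_tr, y_tr)

    model = LGBMClassifier()
    model.fit(X_bal, y_bal)
```

Resampling inside the fold rather than before the split matters: oversampling first would place near-duplicates of minority samples on both sides of a fold boundary and inflate the validation scores.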
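Finally, the STEP 4 criteria map directly onto scikit-learn metrics. Continuing from the previous sketch, with the bad class (1) as the positive label:

```
from sklearn.metrics import classification_report, precision_score, recall_score

# model, X_test, y_test come from the STEP 3 sketch above.
y_pred = model.predict(X_test)

# Recall on class 1 tracks the dangerous error (sick -> healthy, i.e. FN);
# precision on class 1 tracks the costly error (healthy -> sick, i.e. FP).
print("recall   :", recall_score(y_test, y_pred, pos_label=1))
print("precision:", precision_score(y_test, y_pred, pos_label=1))
print(classification_report(y_test, y_pred, target_names=["good", "bad"]))
```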