first commit

2025-11-08 21:37:52 +01:00
commit 489da84586
1 changed files with 183 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,183 @@
 # Image and Video Understanding Project
 A project comparing several widely used deep learning models for object detection and instance segmentation on a waste/litter detection dataset.
 ## Overview
 This project evaluates and compares different deep learning architectures for instance segmentation on a custom waste detection dataset. Each model is trained and evaluated on the same dataset to enable fair comparison.
 ## Models
 ### 1. YOLO (YOLOv8l-seg)
 - **Model**: YOLOv8 Large Segmentation
 - **Framework**: Ultralytics
 - **Parameters**: 45.9M
 - **Training**: 200 epochs, batch size 16, image size 960x960
 - **Features**: Real-time inference, bounding box + mask prediction
 ### 2. Mask R-CNN
 - **Backbone**: ResNet-101 with FPN
 - **Framework**: Detectron2
 - **Training**: 1000-3000 iterations, batch size 8, image size 960x960
 - **Features**: Two-stage instance segmentation (region proposals followed by per-region box, class, and mask heads)
 ### 3. Mask2Former
 - **Architecture**: Transformer-based segmentation
 - **Framework**: Detectron2
 - **Features**: Unified framework for semantic, instance, and panoptic segmentation
 ### 4. DETR
 - **Status**: Dataset prepared (implementation in progress)
 ## Dataset
 Custom waste/litter detection dataset with **20 classes**:
 - Clear plastic bottle, Glass bottle, Plastic bottle cap, Metal bottle cap
 - Broken glass, Drink can, Other carton, Corrugated carton
 - Paper cup, Disposable plastic cup, Plastic lid, Other plastic
 - Normal paper, Plastic film, Other plastic wrapper, Pop tab
 - Plastic straw, Styrofoam piece, Unlabeled litter, Cigarette
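The YOLO notebook reads its class list from `data.yaml`. As a minimal sketch, that file's text can be generated from the 20 classes above using only the standard library. The class order (ids 0 to 19 in the order listed here), the `YOLO/dataset` root, and the `images/train|val|test` folders are assumptions for illustration; the ids must match the order used when the label files were written.

```python
# Class names in the order listed in this README. ASSUMPTION: this matches the
# annotation category order; verify against the COCO "categories" field.
CLASSES = [
    "Clear plastic bottle", "Glass bottle", "Plastic bottle cap", "Metal bottle cap",
    "Broken glass", "Drink can", "Other carton", "Corrugated carton",
    "Paper cup", "Disposable plastic cup", "Plastic lid", "Other plastic",
    "Normal paper", "Plastic film", "Other plastic wrapper", "Pop tab",
    "Plastic straw", "Styrofoam piece", "Unlabeled litter", "Cigarette",
]


def make_data_yaml(root: str = "YOLO/dataset") -> str:
    """Return the text of an Ultralytics-style data.yaml (hypothetical layout)."""
    lines = [
        f"path: {root}",          # dataset root; split paths below are relative to it
        "train: images/train",
        "val: images/val",
        "test: images/test",
        f"nc: {len(CLASSES)}",
        "names:",
    ]
    # `names` written as an id -> class-name mapping.
    lines += [f"  {i}: {name}" for i, name in enumerate(CLASSES)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_data_yaml())
```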
 **Dataset Structure**: Train/Val/Test splits in COCO format
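Because the annotations are in COCO format while Ultralytics expects one `.txt` label file per image, the YOLO side needs a conversion step. Below is a minimal sketch of the per-annotation conversion, assuming polygon (not RLE) segmentations and a hypothetical `cat_to_idx` mapping from COCO `category_id` to 0-based YOLO class ids. It illustrates the format and is not necessarily how the repository's notebook converts the data.

```python
from typing import Dict, List


def coco_polygon_to_yolo(
    ann: dict, img_w: int, img_h: int, cat_to_idx: Dict[int, int]
) -> List[str]:
    """Convert one COCO polygon annotation to YOLO segmentation label lines.

    RLE masks (a dict instead of a list of polygons) are skipped in this sketch.
    """
    seg = ann.get("segmentation")
    if not isinstance(seg, list):  # RLE / missing segmentation: not handled here
        return []
    cls = cat_to_idx[ann["category_id"]]  # hypothetical COCO id -> YOLO id map
    lines = []
    for poly in seg:
        if len(poly) < 6:  # a polygon needs at least 3 (x, y) points
            continue
        # Even positions are x (divide by width), odd positions are y (divide by height).
        coords = [v / img_w if i % 2 == 0 else v / img_h for i, v in enumerate(poly)]
        lines.append(f"{cls} " + " ".join(f"{c:.6f}" for c in coords))
    return lines


if __name__ == "__main__":
    ann = {"category_id": 1, "segmentation": [[10, 20, 110, 20, 110, 220]]}
    print(coco_polygon_to_yolo(ann, img_w=200, img_h=400, cat_to_idx={1: 0}))
```

Each output line has the form `<class> x1 y1 x2 y2 ...` with coordinates normalised by image width and height. Objects stored as several polygons become several label lines here, which is a simplification.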
 ## Project Structure
 ```
 ├── YOLO/
 │   ├── main.ipynb                    # Training and evaluation notebook
 │   ├── results/
 │   │   ├── train_200_960_16/        # Training outputs
 │   │   └── evaluation_200_960_16/   # Evaluation results
 │   └── dataset/                      # Dataset configuration
 ├── MRCNN/
 │   ├── main.ipynb                    # Training and evaluation notebook
 │   ├── results/
 │   │   ├── train_1000_iter/         # Training outputs
 │   │   └── eval/                     # Evaluation metrics
 │   └── requirements.txt
 ├── M2FORMER/
 │   ├── main.ipynb                    # Training and evaluation notebook
 │   ├── output/                       # Training outputs
 │   ├── Mask2Former/                  # Mask2Former repository
 │   └── requirements.txt
 └── DETR/
     └── dataset/                      # Image data
 ```
 ## Setup
 ### Prerequisites
 - Python 3.8+
 - PyTorch (with CUDA support recommended)
 - GPU recommended for training
 ### Installation
 #### YOLO
 ```bash
 pip install ultralytics
 ```
 #### Mask R-CNN
 ```bash
 pip install -r MRCNN/requirements.txt
 pip install 'git+https://github.com/facebookresearch/detectron2.git'
 ```
 #### Mask2Former
 ```bash
 pip install -r M2FORMER/requirements.txt
 pip install 'git+https://github.com/facebookresearch/detectron2.git'
 cd M2FORMER
 git clone https://github.com/facebookresearch/Mask2Former.git
 cd Mask2Former/mask2former/modeling/pixel_decoder/ops/
 sh make.sh  # Compile the multi-scale deformable attention CUDA ops
 ```
 ## Usage
 ### Training
 Each of the three implemented models has a Jupyter notebook (`main.ipynb`) containing its full training pipeline:
 1. **YOLO**: Open `YOLO/main.ipynb`
    - Configure the dataset path in `data.yaml`
    - Run the training cells
    - Checkpoints are saved every 10 epochs
 2. **Mask R-CNN**: Open `MRCNN/main.ipynb`
    - Configure dataset paths and parameters
    - Register the COCO-format datasets with Detectron2
    - Train and evaluate
 3. **Mask2Former**: Open `M2FORMER/main.ipynb`
    - Set up the Mask2Former repository
    - Configure training parameters
    - Train and evaluate
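For the Detectron2-based notebooks, registering the COCO-format datasets means adding each split to Detectron2's dataset catalog. Below is a minimal sketch using Detectron2's `register_coco_instances(name, metadata, json_file, image_root)`. The dataset names (`waste_train`, `waste_val`, `waste_test`) and the `annotations/`/`images/` layout are hypothetical placeholders, not the repository's actual paths.

```python
import os

# Hypothetical layout: one COCO JSON per split plus a matching image folder.
SPLITS = ("train", "val", "test")


def coco_split_paths(root: str):
    """Return (dataset_name, json_file, image_root) for each split (hypothetical names)."""
    return [
        (
            f"waste_{split}",
            os.path.join(root, "annotations", f"{split}.json"),
            os.path.join(root, "images", split),
        )
        for split in SPLITS
    ]


def register_splits(root: str) -> None:
    # Imported lazily so the path logic above works without Detectron2 installed.
    from detectron2.data.datasets import register_coco_instances

    for name, json_file, image_root in coco_split_paths(root):
        # Empty metadata dict: class names are read from the JSON "categories".
        register_coco_instances(name, {}, json_file, image_root)
```

After registration, the names can be referenced from the config, e.g. `cfg.DATASETS.TRAIN = ("waste_train",)`.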
 ### Evaluation
 All three notebooks include:
 - COCO-style evaluation metrics
 - Confusion matrix generation
 - Prediction visualization
 - Performance comparison tools
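The notebooks' own confusion-matrix code is not reproduced here. Below is a hedged sketch of one common way to build a detection confusion matrix. Predictions and ground-truth boxes are matched greedily by IoU, class-agnostically, with an assumed threshold of 0.5. Each matched pair counts in the [true class][predicted class] cell, and an extra background row and column collect false positives and missed objects. Box IoU keeps the sketch short; mask IoU would slot in the same way.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def update_confusion(
    matrix: List[List[int]],
    gts: Sequence[Tuple[int, Box]],
    preds: Sequence[Tuple[int, Box]],
    iou_thr: float = 0.5,
) -> None:
    """Accumulate one image into a (C+1)x(C+1) matrix indexed [gt_class][pred_class].

    The last row/column is "background". Matching ignores class labels, so
    misclassified objects land off the diagonal.
    """
    bg = len(matrix) - 1
    # All (IoU, gt index, pred index) pairs, highest IoU first.
    pairs = sorted(
        ((box_iou(g[1], p[1]), gi, pi) for gi, g in enumerate(gts) for pi, p in enumerate(preds)),
        reverse=True,
    )
    used_g, used_p = set(), set()
    for iou, gi, pi in pairs:
        if iou < iou_thr:
            break  # remaining pairs overlap too little
        if gi in used_g or pi in used_p:
            continue  # each object and prediction is matched at most once
        used_g.add(gi)
        used_p.add(pi)
        matrix[gts[gi][0]][preds[pi][0]] += 1
    for gi, (cls, _) in enumerate(gts):
        if gi not in used_g:
            matrix[cls][bg] += 1  # missed object
    for pi, (cls, _) in enumerate(preds):
        if pi not in used_p:
            matrix[bg][cls] += 1  # false positive
```

Confidence filtering of the predictions is assumed to happen before `update_confusion` is called.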
 ## Results
 ### YOLO Results
 - **Box mAP50**: 26.9%
 - **Box mAP50-95**: 20.7%
 - **Mask mAP50**: 26.7%
 - **Mask mAP50-95**: 19.5%
 - **Precision (Box)**: 28.8%
 - **Recall (Box)**: 29.5%
 ### Mask R-CNN Results
 - **Box AP (IoU 0.50:0.95)**: 15.8%
 - **Box AP50**: 23.9%
 - **Mask AP (IoU 0.50:0.95)**: 15.9%
 - **Mask AP50**: 23.7%
 - Highest per-class AP: Metal bottle cap (50.4%), Clear plastic bottle (42.6%), Drink can (40.1%)
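The YOLO figures use Ultralytics naming (mAP50, mAP50-95), while the Mask R-CNN figures use COCO naming (AP, AP50). COCO "AP" is the same kind of quantity as mAP50-95: the area under the precision-recall curve, averaged over IoU thresholds 0.50 to 0.95 in steps of 0.05 and over classes. The sketch below shows the per-class, per-threshold building block with COCO-style 101-point interpolation. It is a simplified illustration (no area ranges, detection limits, or crowd handling), not pycocotools' implementation, and Ultralytics' evaluator differs in details, so the two models' numbers are only approximately comparable.

```python
from typing import List

# COCO's ten IoU thresholds for "AP" (AP@[.50:.95]): 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def average_precision(scores: List[float], is_tp: List[bool], n_gt: int) -> float:
    """101-point interpolated AP for one class at one IoU threshold.

    `is_tp[i]` marks whether detection i was matched to a ground-truth object
    at this threshold; the IoU matching itself happens beforehand.
    """
    if n_gt == 0:
        return 0.0  # simplification: COCO excludes classes with no ground truth
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk detections from most to least confident
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    total = 0.0
    for j in range(101):
        r = j / 100
        # Precision at the first rank whose recall reaches r (0 if never reached).
        total += next((p for rec, p in zip(recalls, precisions) if rec >= r), 0.0)
    return total / 101
```

Averaging this value over `IOU_THRESHOLDS` and over classes gives AP@[.50:.95] (mAP50-95); evaluating only at the 0.50 threshold gives AP50.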
 Results are saved in each model's `results/` directory (`output/` for Mask2Former) and include:
 - Model weights (`.pth` or `.pt` files)
 - Evaluation metrics (JSON format)
 - Training logs and visualizations
 - Confusion matrices
 ## Training Parameters
 ### YOLO
 - Epochs: 200
 - Batch size: 16
 - Image size: 960x960
 - Learning rate: 0.01
 - Optimizer: AdamW
 - Data augmentation: Enabled
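These YOLO settings map onto keyword arguments of Ultralytics' `YOLO.train()`. The sketch below assumes training starts from the pretrained `yolov8l-seg.pt` weights. The `data`, `project`, and `name` values are hypothetical, chosen to match the folder layout in Project Structure. Augmentation is left at Ultralytics' defaults, which are enabled.

```python
def yolo_train_kwargs() -> dict:
    """Keyword arguments mirroring the YOLO training parameters listed above."""
    return {
        "data": "YOLO/dataset/data.yaml",  # hypothetical path to the dataset config
        "epochs": 200,
        "batch": 16,
        "imgsz": 960,
        "lr0": 0.01,                       # initial learning rate
        "optimizer": "AdamW",
        "save_period": 10,                 # save a checkpoint every 10 epochs
        "project": "YOLO/results",         # hypothetical output root
        "name": "train_200_960_16",        # hypothetical run name
    }


def train_yolo() -> None:
    # Lazy import: Ultralytics is only needed when actually training.
    from ultralytics import YOLO

    model = YOLO("yolov8l-seg.pt")  # assumed starting point: pretrained YOLOv8-L seg weights
    model.train(**yolo_train_kwargs())
```

Ultralytics chooses the optimizer and learning rate automatically only when `optimizer="auto"`; with an explicit `"AdamW"`, the `lr0` value is used as given.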
 ### Mask R-CNN
 - Iterations: 1000-3000
 - Batch size: 8
 - Image size: 960x960
 - Learning rate: 0.00025
 - Backbone: ResNet-101 FPN
 - ROI batch size: 16
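In Detectron2, these settings correspond to config keys. The sketch below assumes the model-zoo config `COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml` with its COCO-pretrained weights, plus hypothetical dataset names `waste_train`/`waste_val`. It also assumes that "960x960" is approximated by a shortest-edge resize to 960 with the longer side capped at 960, because Detectron2 resizes by shortest edge rather than to a fixed square.

```python
MODEL_ZOO_CONFIG = "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"


def mask_rcnn_overrides(max_iter: int = 1000) -> list:
    """Flat [KEY, VALUE, ...] list for CfgNode.merge_from_list."""
    return [
        "DATASETS.TRAIN", ("waste_train",),  # hypothetical registered dataset names
        "DATASETS.TEST", ("waste_val",),
        "SOLVER.IMS_PER_BATCH", 8,
        "SOLVER.BASE_LR", 0.00025,
        "SOLVER.MAX_ITER", max_iter,         # 1000-3000 in this project
        "MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE", 16,
        "MODEL.ROI_HEADS.NUM_CLASSES", 20,
        "INPUT.MIN_SIZE_TRAIN", (960,),      # assumed reading of "960x960"
        "INPUT.MAX_SIZE_TRAIN", 960,
        "INPUT.MIN_SIZE_TEST", 960,
        "INPUT.MAX_SIZE_TEST", 960,
    ]


def build_cfg(max_iter: int = 1000):
    # Lazy imports so the override list can be inspected without Detectron2.
    from detectron2 import model_zoo
    from detectron2.config import get_cfg

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(MODEL_ZOO_CONFIG))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(MODEL_ZOO_CONFIG)  # COCO-pretrained (assumed)
    cfg.merge_from_list(mask_rcnn_overrides(max_iter))
    return cfg
```

The resulting `cfg` can then be passed to `detectron2.engine.DefaultTrainer` for training.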
 ### Mask2Former
 - Configuration: COCO instance segmentation
 - Backbone: ResNet-101
 - Image size: Variable
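Mask2Former is also configured through Detectron2, once its own config keys have been registered. The sketch below makes three assumptions. First, the cloned repository is on `sys.path`, so `mask2former` can be imported. Second, the R101 COCO instance-segmentation config sits at the path shown; check `configs/coco/instance-segmentation/` in the clone. Third, the dataset names `waste_train`/`waste_val` are hypothetical. Mask2Former reads the class count from `MODEL.SEM_SEG_HEAD.NUM_CLASSES`, including for instance segmentation.

```python
# ASSUMED path to the R101 COCO instance-segmentation config in the clone.
M2F_CONFIG = (
    "M2FORMER/Mask2Former/configs/coco/instance-segmentation/"
    "maskformer2_R101_bs16_50ep.yaml"
)


def mask2former_overrides(num_classes: int = 20) -> list:
    """Flat [KEY, VALUE, ...] list for CfgNode.merge_from_list."""
    return [
        "DATASETS.TRAIN", ("waste_train",),  # hypothetical registered dataset names
        "DATASETS.TEST", ("waste_val",),
        "MODEL.SEM_SEG_HEAD.NUM_CLASSES", num_classes,
    ]


def build_m2f_cfg():
    # Lazy imports: need Detectron2 plus the cloned Mask2Former on sys.path.
    from detectron2.config import get_cfg
    from detectron2.projects.deeplab import add_deeplab_config
    from mask2former import add_maskformer2_config

    cfg = get_cfg()
    add_deeplab_config(cfg)      # registers keys the Mask2Former configs rely on
    add_maskformer2_config(cfg)  # registers MODEL.MASK_FORMER.* and related keys
    cfg.merge_from_file(M2F_CONFIG)
    cfg.merge_from_list(mask2former_overrides())
    return cfg
```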
 ## Requirements
 ### Common Dependencies
 - Python 3.8+
 - PyTorch
 - CUDA (for GPU training)
 - OpenCV
 - NumPy
 - Matplotlib
 ### Model-Specific
 See individual `requirements.txt` files in each model directory for complete dependency lists.