commit 489da845860290faea32909c20ceb01ca6c27294
Author: saeedkhosravi94 <saeedkhosravi72@gmail.com>
Date:   Sat Nov 8 21:37:52 2025 +0100

    first commit

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..9dbc0e4
--- /dev/null
+++ b/README.md
@@ -0,0 +1,183 @@
+# Image and Video Understanding Project
+
+A comparison of YOLOv8, Mask R-CNN, and Mask2Former for object detection and instance segmentation on a waste/litter detection dataset.
+
+## Overview
+
+This project evaluates and compares different deep learning architectures for instance segmentation on a custom waste detection dataset. Each model is trained and evaluated on the same dataset to enable fair comparison.
+
+## Models
+
+### 1. YOLO (YOLOv8l-seg)
+- **Model**: YOLOv8 Large Segmentation
+- **Framework**: Ultralytics
+- **Parameters**: 45.9M
+- **Training**: 200 epochs, batch size 16, image size 960x960
+- **Features**: Real-time inference, bounding box + mask prediction
+
+### 2. Mask R-CNN
+- **Backbone**: ResNet-101 with FPN
+- **Framework**: Detectron2
+- **Training**: 1000-3000 iterations, batch size 8, image size 960x960
+- **Features**: Two-stage detector with a per-instance mask head
+
+### 3. Mask2Former
+- **Architecture**: Transformer-based segmentation
+- **Framework**: Detectron2
+- **Features**: Unified framework for semantic, instance, and panoptic segmentation
+
+### 4. DETR
+- **Status**: Dataset prepared (implementation in progress)
+
+## Dataset
+
+Custom waste/litter detection dataset with **20 classes**:
+- Clear plastic bottle, Glass bottle, Plastic bottle cap, Metal bottle cap
+- Broken glass, Drink can, Other carton, Corrugated carton
+- Paper cup, Disposable plastic cup, Plastic lid, Other plastic
+- Normal paper, Plastic film, Other plastic wrapper, Pop tab
+- Plastic straw, Styrofoam piece, Unlabeled litter, Cigarette
+
+**Dataset Structure**: Train/Val/Test splits in COCO format
+
+## Project Structure
+
+```
+├── YOLO/
+│   ├── main.ipynb                    # Training and evaluation notebook
+│   ├── results/
+│   │   ├── train_200_960_16/        # Training outputs
+│   │   └── evaluation_200_960_16/   # Evaluation results
+│   └── dataset/                      # Dataset configuration
+├── MRCNN/
+│   ├── main.ipynb                    # Training and evaluation notebook
+│   ├── results/
+│   │   ├── train_1000_iter/         # Training outputs
+│   │   └── eval/                     # Evaluation metrics
+│   └── requirements.txt
+├── M2FORMER/
+│   ├── main.ipynb                    # Training and evaluation notebook
+│   ├── output/                       # Training outputs
+│   ├── Mask2Former/                  # Mask2Former repository
+│   └── requirements.txt
+└── DETR/
+    └── dataset/                      # Image data
+```
+
+## Setup
+
+### Prerequisites
+- Python 3.8+
+- PyTorch (with CUDA support recommended)
+- GPU recommended for training
+
+### Installation
+
+#### YOLO
+```bash
+pip install ultralytics
+```
+
+#### Mask R-CNN
+```bash
+pip install -r MRCNN/requirements.txt
+pip install 'git+https://github.com/facebookresearch/detectron2.git'
+```
+
+#### Mask2Former
+```bash
+pip install -r M2FORMER/requirements.txt
+pip install 'git+https://github.com/facebookresearch/detectron2.git'
+git clone https://github.com/facebookresearch/Mask2Former.git
+cd Mask2Former/mask2former/modeling/pixel_decoder/ops/
+sh make.sh  # Compile the deformable-attention CUDA ops (works even if make.sh is not executable)
+```
+
+## Usage
+
+### Training
+
+Each implemented model has a Jupyter notebook (`main.ipynb`) containing its complete training pipeline:
+
+1. **YOLO**: Open `YOLO/main.ipynb`
+   - Configure dataset path in `data.yaml`
+   - Run training cells
+   - Checkpoints are saved every 10 epochs
+
+2. **Mask R-CNN**: Open `MRCNN/main.ipynb`
+   - Configure dataset paths and parameters
+   - Register the COCO-format datasets
+   - Train and evaluate
+
+3. **Mask2Former**: Open `M2FORMER/main.ipynb`
+   - Set up the Mask2Former repository
+   - Configure training parameters
+   - Train and evaluate
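+
+The dataset step for YOLO can be sketched as a `data.yaml` like the one below. This is a minimal sketch under stated assumptions, not the file used in this project: the `path`, `train`, `val`, and `test` values are placeholders, and the class indices simply follow the order of the class list in the Dataset section, which may not match the category IDs in the actual annotations. Ultralytics expects YOLO-format label files, so COCO annotations would need to be converted first.
+
+```yaml
+# Hypothetical data.yaml for Ultralytics; every path below is a placeholder.
+path: YOLO/dataset        # dataset root (assumed location)
+train: images/train       # training images, relative to path
+val: images/val           # validation images, relative to path
+test: images/test         # test images, relative to path
+
+# Class order is assumed from the README list, not read from the annotations.
+names:
+  0: Clear plastic bottle
+  1: Glass bottle
+  2: Plastic bottle cap
+  3: Metal bottle cap
+  4: Broken glass
+  5: Drink can
+  6: Other carton
+  7: Corrugated carton
+  8: Paper cup
+  9: Disposable plastic cup
+  10: Plastic lid
+  11: Other plastic
+  12: Normal paper
+  13: Plastic film
+  14: Other plastic wrapper
+  15: Pop tab
+  16: Plastic straw
+  17: Styrofoam piece
+  18: Unlabeled litter
+  19: Cigarette
+```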
+
+### Evaluation
+
+All notebooks include:
+- COCO-style evaluation metrics
+- Confusion matrix generation
+- Prediction visualization
+- Performance comparison tools
+
+## Results
+
+### YOLO Results
+- **Box mAP50**: 26.9%
+- **Box mAP50-95**: 20.7%
+- **Mask mAP50**: 26.7%
+- **Mask mAP50-95**: 19.5%
+- **Precision (Box)**: 28.8%
+- **Recall (Box)**: 29.5%
+
+### Mask R-CNN Results
+- **Box AP**: 15.8%
+- **Box AP50**: 23.9%
+- **Mask AP**: 15.9%
+- **Mask AP50**: 23.7%
+- Best per-class AP: Metal bottle cap (50.4%), Clear plastic bottle (42.6%), Drink can (40.1%)
+
+Results for each model are saved in its `results/` directory (`output/` for Mask2Former) and include:
+- Model weights (`.pth` or `.pt` files)
+- Evaluation metrics (JSON format)
+- Training logs and visualizations
+- Confusion matrices
+
+## Training Parameters
+
+### YOLO
+- Epochs: 200
+- Batch size: 16
+- Image size: 960x960
+- Learning rate: 0.01
+- Optimizer: AdamW
+- Data augmentation: Enabled
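+
+Expressed as Ultralytics training settings, the values above might look like the following sketch. The keys are standard Ultralytics arguments, but the mapping is an assumption rather than the project's actual configuration: `save_period: 10` is inferred from the checkpoint note in the Usage section, `model` and `data` are placeholder file names, and augmentation is left at the library defaults.
+
+```yaml
+# Hypothetical Ultralytics settings mirroring the list above.
+task: segment
+mode: train
+model: yolov8l-seg.pt   # assumed starting weights
+data: data.yaml         # placeholder dataset config path
+epochs: 200
+batch: 16
+imgsz: 960
+lr0: 0.01               # initial learning rate
+optimizer: AdamW
+save_period: 10         # save a checkpoint every 10 epochs (inferred)
+```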
+
+### Mask R-CNN
+- Iterations: 1000-3000
+- Batch size: 8
+- Image size: 960x960
+- Learning rate: 0.00025
+- Backbone: ResNet-101 FPN
+- ROI batch size: 16
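+
+In Detectron2's YAML config format, the settings above could be written roughly as follows. This is a sketch, not the configuration used in the notebook: it assumes the model zoo config `COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml` as the starting point, reads "ROI batch size" as `MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE`, and approximates the 960x960 image size with the resize limits under `INPUT`.
+
+```yaml
+# Hypothetical Detectron2 overrides on top of a Mask R-CNN R-101 FPN base config.
+SOLVER:
+  IMS_PER_BATCH: 8             # batch size
+  BASE_LR: 0.00025             # learning rate
+  MAX_ITER: 1000               # the README reports runs of 1000-3000 iterations
+MODEL:
+  ROI_HEADS:
+    BATCH_SIZE_PER_IMAGE: 16   # assumed meaning of "ROI batch size"
+    NUM_CLASSES: 20            # number of classes in the dataset
+INPUT:
+  MIN_SIZE_TRAIN: (960,)       # shortest side after resizing (assumption)
+  MAX_SIZE_TRAIN: 960          # longest side after resizing (assumption)
+```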
+
+### Mask2Former
+- Configuration: COCO instance segmentation
+- Backbone: ResNet-101
+- Image size: Variable
+
+## Requirements
+
+### Common Dependencies
+- Python 3.8+
+- PyTorch
+- CUDA (for GPU training)
+- OpenCV
+- NumPy
+- Matplotlib
+
+### Model-Specific
+See the `requirements.txt` files in `MRCNN/` and `M2FORMER/` for complete dependency lists; YOLO only needs the `ultralytics` package.
+