commit 489da845860290faea32909c20ceb01ca6c27294
Author: saeedkhosravi94
Date:   Sat Nov 8 21:37:52 2025 +0100

    first commit

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..9dbc0e4
--- /dev/null
+++ b/README.md
@@ -0,0 +1,183 @@
+# Image and Video Understanding Project
+
+A comprehensive project comparing multiple state-of-the-art deep learning models for object detection and instance segmentation on a waste/litter detection dataset.
+
+## Overview
+
+This project evaluates and compares different deep learning architectures for instance segmentation on a custom waste detection dataset. Each model is trained and evaluated on the same dataset to enable fair comparison.
+
+## Models
+
+### 1. YOLO (YOLOv8l-seg)
+- **Model**: YOLOv8 Large Segmentation
+- **Framework**: Ultralytics
+- **Parameters**: 45.9M
+- **Training**: 200 epochs, batch size 16, image size 960x960
+- **Features**: Real-time inference, bounding box + mask prediction
+
+### 2. Mask R-CNN
+- **Backbone**: ResNet-101 with FPN
+- **Framework**: Detectron2
+- **Training**: 1000-3000 iterations, batch size 8, image size 960x960
+- **Features**: Instance segmentation with high accuracy
+
+### 3. Mask2Former
+- **Architecture**: Transformer-based segmentation
+- **Framework**: Detectron2
+- **Features**: Unified framework for semantic, instance, and panoptic segmentation
+
+### 4. DETR
+- **Status**: Dataset prepared (implementation in progress)
+
+## Dataset
+
+Custom waste/litter detection dataset with **20 classes**:
+- Clear plastic bottle, Glass bottle, Plastic bottle cap, Metal bottle cap
+- Broken glass, Drink can, Other carton, Corrugated carton
+- Paper cup, Disposable plastic cup, Plastic lid, Other plastic
+- Normal paper, Plastic film, Other plastic wrapper, Pop tab
+- Plastic straw, Styrofoam piece, Unlabeled litter, Cigarette
+
+**Dataset Structure**: Train/Val/Test splits in COCO format
+
+## Project Structure
+
+```
+├── YOLO/
+│   ├── main.ipynb                    # Training and evaluation notebook
+│   ├── results/
+│   │   ├── train_200_960_16/         # Training outputs
+│   │   └── evaluation_200_960_16/    # Evaluation results
+│   └── dataset/                      # Dataset configuration
+├── MRCNN/
+│   ├── main.ipynb                    # Training and evaluation notebook
+│   ├── results/
+│   │   ├── train_1000_iter/          # Training outputs
+│   │   └── eval/                     # Evaluation metrics
+│   └── requirements.txt
+├── M2FORMER/
+│   ├── main.ipynb                    # Training and evaluation notebook
+│   ├── output/                       # Training outputs
+│   ├── Mask2Former/                  # Mask2Former repository
+│   └── requirements.txt
+└── DETR/
+    └── dataset/                      # Image data
+```
+
+## Setup
+
+### Prerequisites
+- Python 3.8+
+- PyTorch (with CUDA support recommended)
+- GPU recommended for training
+
+### Installation
+
+#### YOLO
+```bash
+pip install ultralytics
+```
+
+#### Mask R-CNN
+```bash
+pip install -r MRCNN/requirements.txt
+pip install 'git+https://github.com/facebookresearch/detectron2.git'
+```
+
+#### Mask2Former
+```bash
+pip install -r M2FORMER/requirements.txt
+pip install 'git+https://github.com/facebookresearch/detectron2.git'
+git clone https://github.com/facebookresearch/Mask2Former.git
+cd Mask2Former/mask2former/modeling/pixel_decoder/ops/
+./make.sh  # Compile CUDA operations
+```
+
+## Usage
+
+### Training
+
+Each model has a Jupyter notebook (`main.ipynb`) with complete training pipelines:
+
+1. **YOLO**: Open `YOLO/main.ipynb`
+   - Configure dataset path in `data.yaml`
+   - Run training cells
+   - Model saves checkpoints every 10 epochs
+
+2. **Mask R-CNN**: Open `MRCNN/main.ipynb`
+   - Configure dataset paths and parameters
+   - Register COCO format datasets
+   - Train and evaluate
+
+3. **Mask2Former**: Open `M2FORMER/main.ipynb`
+   - Setup Mask2Former repository
+   - Configure training parameters
+   - Train and evaluate
+
+### Evaluation
+
+All notebooks include:
+- COCO-style evaluation metrics
+- Confusion matrix generation
+- Prediction visualization
+- Performance comparison tools
+
+## Results
+
+### YOLO Results
+- **Box mAP50**: 26.9%
+- **Box mAP50-95**: 20.7%
+- **Mask mAP50**: 26.7%
+- **Mask mAP50-95**: 19.5%
+- **Precision (Box)**: 28.8%
+- **Recall (Box)**: 29.5%
+
+### Mask R-CNN Results
+- **Box AP**: 15.8%
+- **Box AP50**: 23.9%
+- **Mask AP**: 15.9%
+- **Mask AP50**: 23.7%
+- Best performance on: Metal bottle cap (50.4% AP), Clear plastic bottle (42.6% AP), Drink can (40.1% AP)
+
+Results are saved in respective `results/` directories with:
+- Model weights (`.pth` or `.pt` files)
+- Evaluation metrics (JSON format)
+- Training logs and visualizations
+- Confusion matrices
+
+## Training Parameters
+
+### YOLO
+- Epochs: 200
+- Batch size: 16
+- Image size: 960x960
+- Learning rate: 0.01
+- Optimizer: AdamW
+- Data augmentation: Enabled
+
+### Mask R-CNN
+- Iterations: 1000-3000
+- Batch size: 8
+- Image size: 960x960
+- Learning rate: 0.00025
+- Backbone: ResNet-101 FPN
+- ROI batch size: 16
+
+### Mask2Former
+- Configuration: COCO instance segmentation
+- Backbone: ResNet-101
+- Image size: Variable
+
+## Requirements
+
+### Common Dependencies
+- Python 3.8+
+- PyTorch
+- CUDA (for GPU training)
+- OpenCV
+- NumPy
+- Matplotlib
+
+### Model-Specific
+See individual `requirements.txt` files in each model directory for complete dependency lists.
+
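The YOLO hyperparameters listed under "Training Parameters" in the README above could be passed to the Ultralytics Python API roughly as follows. This is a minimal sketch, not the contents of `YOLO/main.ipynb`, which is not part of this commit excerpt. The `data.yaml` location, the `yolov8l-seg.pt` starting weights, and the helper name `train_yolo` are assumptions made for illustration. Data augmentation is left at the Ultralytics defaults, which is also an assumption about what "Enabled" means in the README.

```python
# Values copied from the README's "Training Parameters > YOLO" list.
# save_period=10 mirrors "Model saves checkpoints every 10 epochs".
YOLO_TRAIN_ARGS = {
    "epochs": 200,
    "batch": 16,
    "imgsz": 960,
    "lr0": 0.01,
    "optimizer": "AdamW",
    "save_period": 10,
}


def train_yolo(data_yaml="data.yaml", weights="yolov8l-seg.pt"):
    """Train a YOLOv8 segmentation model with the README's settings.

    `data_yaml` and `weights` are hypothetical defaults; point them at the
    actual dataset config and checkpoint used in the project.
    """
    # Imported lazily so this module can be loaded without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, **YOLO_TRAIN_ARGS)
```

Calling `train_yolo("YOLO/dataset/data.yaml")` would start a run with these settings, assuming the dataset config lives at that path.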