init commit

2025-11-08 21:39:37 +01:00
parent 489da84586
commit 50b6df1c3f
5 changed files with 103 additions and 169 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,72 @@
+# Model files (large)
+*.pt
+*.pth
+*.ckpt
+*.h5
+*.pb
+*.onnx
+*.tflite
+
+# Results and output directories
+results/
+outputs/
+checkpoints/
+weights/
+*.pkl
+*.pickle
+
+# Data directories
+dataset/
+data/
+datasets/
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+env/
+venv/
+ENV/
+.venv
+*.egg-info/
+dist/
+build/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+*.ipynb_checkpoints/
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+
+# Logs
+*.log
+logs/
+tensorboard_logs/
+events.out.tfevents.*
+
+# Cache
+.cache/
+*.cache
+.pytest_cache/
+
+# Temporary files
+*.tmp
+*.temp
+*.bak
+
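In the `.gitignore` above, patterns with no slash except a trailing one (such as `data/` or `*.pt`) match at any depth in the tree, not only at the repository root. The rough Python sketch below predicts what such patterns exclude. It assumes only the simple pattern forms used here: no `!` negation, no anchored `a/b` patterns, and no `**`. The `is_ignored` helper and the pattern subset are illustrative, and `git check-ignore -v <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative subset of the patterns in the .gitignore above.
PATTERNS = [
    "*.pt", "*.pth", "*.ckpt", "*.onnx",
    "results/", "outputs/", "checkpoints/", "weights/",
    "dataset/", "data/", "datasets/",
    "__pycache__/", "*.py[cod]", ".ipynb_checkpoints",
    "*.log", "logs/", "events.out.tfevents.*",
]


def is_ignored(file_path, patterns=PATTERNS):
    """Approximate gitignore matching for a repository-relative *file* path.

    Handles only patterns with no slash other than a trailing one:
    - "name/" matches a directory component at any depth;
    - any other pattern matches any component (file or directory).
    Negation ("!"), anchored patterns ("/x", "a/b") and "**" are not handled.
    """
    parts = PurePosixPath(file_path).parts
    directories = parts[:-1]
    for pattern in patterns:
        if pattern.endswith("/"):
            if any(fnmatchcase(d, pattern[:-1]) for d in directories):
                return True
        elif any(fnmatchcase(p, pattern) for p in parts):
            return True
    return False


for path in ["MRCNN/results/model_final.pth", "YOLO/dataset/data.yaml",
             "MRCNN/main.ipynb", "README.md"]:
    print(path, is_ignored(path))
```

Under this rule, `dataset/` and `data/` match directories with those names anywhere in the tree. A file kept under `YOLO/dataset/` would therefore also be left out. Writing the pattern as `/dataset/` anchors it to the repository root instead.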
--- a/MRCNN/main.ipynb
+++ b/MRCNN/main.ipynb
--- a/MRCNN/requirements.txt
+++ b/MRCNN/requirements.txt
@@ -0,0 +1,12 @@
+torch
+numpy>=1.13
+pyyaml
+matplotlib
+opencv-python>=3.2
+setuptools
+Cython
+mock
+scipy
+six
+future
+protobuf
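`MRCNN/requirements.txt` sets lower bounds for only two packages (`numpy>=1.13` and `opencv-python>=3.2`). The small, stdlib-only sketch below checks an environment against such `>=` lines. It compares only the leading numeric part of each version, so it is a rough check rather than a full PEP 440 implementation (the `packaging` library provides that). The helper names are hypothetical. Lookups use distribution names as pip installs them (`opencv-python`), not import names (`cv2`).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Subset of MRCNN/requirements.txt; only the ">=" form used there is parsed.
REQUIREMENTS = ["torch", "numpy>=1.13", "opencv-python>=3.2", "scipy"]


def version_tuple(text):
    """Leading numeric release segment, e.g. '2.0.0rc1' -> (2, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    return tuple(int(x) for x in match.group(0).split(".")) if match else ()


def parse_requirement(line):
    """Split 'name>=X.Y' into (name, (X, Y)); unpinned names give (name, None)."""
    name, sep, minimum = line.partition(">=")
    return name.strip(), (version_tuple(minimum) if sep else None)


def check(line):
    """Return (name, status) for one requirement line."""
    name, minimum = parse_requirement(line)
    try:
        installed = version(name)  # distribution name, not import name
    except PackageNotFoundError:
        return name, "missing"
    if minimum and version_tuple(installed) < minimum:
        return name, f"too old ({installed} < {'.'.join(map(str, minimum))})"
    return name, f"ok ({installed})"


if __name__ == "__main__":
    for req in REQUIREMENTS:
        print(*check(req))
```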
--- a/README.md
+++ b/README.md
@@ -1,183 +1,31 @@
 # Image and Video Understanding Project

-A comprehensive project comparing multiple state-of-the-art deep learning models for object detection and instance segmentation on a waste/litter detection dataset.
-
-## Overview
-
-This project evaluates and compares different deep learning architectures for instance segmentation on a custom waste detection dataset. Each model is trained and evaluated on the same dataset to enable fair comparison.
-
-## Models
-
-### 1. YOLO (YOLOv8l-seg)
- **Model**: YOLOv8 Large Segmentation
- **Framework**: Ultralytics
- **Parameters**: 45.9M
- **Training**: 200 epochs, batch size 16, image size 960x960
- **Features**: Real-time inference, bounding box + mask prediction
-
-### 2. Mask R-CNN
- **Backbone**: ResNet-101 with FPN
- **Framework**: Detectron2
- **Training**: 1000-3000 iterations, batch size 8, image size 960x960
- **Features**: Instance segmentation with high accuracy
-
-### 3. Mask2Former
- **Architecture**: Transformer-based segmentation
- **Framework**: Detectron2
- **Features**: Unified framework for semantic, instance, and panoptic segmentation
-
-### 4. DETR
- **Status**: Dataset prepared (implementation in progress)
-
-## Dataset
-
-Custom waste/litter detection dataset with **20 classes**:
- Clear plastic bottle, Glass bottle, Plastic bottle cap, Metal bottle cap
- Broken glass, Drink can, Other carton, Corrugated carton
- Paper cup, Disposable plastic cup, Plastic lid, Other plastic
- Normal paper, Plastic film, Other plastic wrapper, Pop tab
- Plastic straw, Styrofoam piece, Unlabeled litter, Cigarette
-
-**Dataset Structure**: Train/Val/Test splits in COCO format
+This project compares two deep learning models for instance segmentation of waste objects: Mask R-CNN (built with Detectron2) and YOLOv8.

 ## Project Structure

-```
-├── YOLO/
-│   ├── main.ipynb                    # Training and evaluation notebook
-│   ├── results/
-│   │   ├── train_200_960_16/        # Training outputs
-│   │   └── evaluation_200_960_16/   # Evaluation results
-│   └── dataset/                      # Dataset configuration
-├── MRCNN/
-│   ├── main.ipynb                    # Training and evaluation notebook
-│   ├── results/
-│   │   ├── train_1000_iter/         # Training outputs
-│   │   └── eval/                     # Evaluation metrics
-│   └── requirements.txt
-├── M2FORMER/
-│   ├── main.ipynb                    # Training and evaluation notebook
-│   ├── output/                       # Training outputs
-│   ├── Mask2Former/                  # Mask2Former repository
-│   └── requirements.txt
-└── DETR/
-    └── dataset/                      # Image data
-```
+- **MRCNN/**: Mask R-CNN implementation using Detectron2
+  - Training and evaluation code in `main.ipynb`
+  - Trained models and results in `results/`

-## Setup
+- **YOLO/**: YOLOv8 segmentation implementation
+  - Training and evaluation code in `main.ipynb`
+  - Trained models and results in `results/`

-### Prerequisites
- Python 3.8+
- PyTorch (with CUDA support recommended)
- GPU recommended for training
+## Dataset

-### Installation
+Both models are trained on the TACO (Trash Annotations in Context) dataset with 20 classes of waste objects, including:
+- Plastic bottles, glass bottles, bottle caps
+- Drink cans, paper cups, cartons
+- Plastic film, wrappers, straws
+- Cigarettes and other litter items

-#### YOLO
-```bash
-pip install ultralytics
-```
+## Models

-#### Mask R-CNN
-```bash
-pip install -r MRCNN/requirements.txt
-pip install 'git+https://github.com/facebookresearch/detectron2.git'
-```
-
-#### Mask2Former
-```bash
-pip install -r M2FORMER/requirements.txt
-pip install 'git+https://github.com/facebookresearch/detectron2.git'
-git clone https://github.com/facebookresearch/Mask2Former.git
-cd Mask2Former/mask2former/modeling/pixel_decoder/ops/
-./make.sh  # Compile CUDA operations
-```
-
-## Usage
-
-### Training
-
-Each model has a Jupyter notebook (`main.ipynb`) with complete training pipelines:
-
-1. **YOLO**: Open `YOLO/main.ipynb`
-   - Configure dataset path in `data.yaml`
-   - Run training cells
-   - Model saves checkpoints every 10 epochs
-
-2. **Mask R-CNN**: Open `MRCNN/main.ipynb`
-   - Configure dataset paths and parameters
-   - Register COCO format datasets
-   - Train and evaluate
-
-3. **Mask2Former**: Open `M2FORMER/main.ipynb`
-   - Setup Mask2Former repository
-   - Configure training parameters
-   - Train and evaluate
-
-### Evaluation
-
-All notebooks include:
- COCO-style evaluation metrics
- Confusion matrix generation
- Prediction visualization
- Performance comparison tools
+- **Mask R-CNN**: ResNet-101 backbone with Feature Pyramid Network
+- **YOLOv8**: Large segmentation model (YOLOv8l-seg)

 ## Results

-### YOLO Results
- **Box mAP50**: 26.9%
- **Box mAP50-95**: 20.7%
- **Mask mAP50**: 26.7%
- **Mask mAP50-95**: 19.5%
- **Precision (Box)**: 28.8%
- **Recall (Box)**: 29.5%
-
-### Mask R-CNN Results
- **Box AP**: 15.8%
- **Box AP50**: 23.9%
- **Mask AP**: 15.9%
- **Mask AP50**: 23.7%
- Best performance on: Metal bottle cap (50.4% AP), Clear plastic bottle (42.6% AP), Drink can (40.1% AP)
-
-Results are saved in respective `results/` directories with:
- Model weights (`.pth` or `.pt` files)
- Evaluation metrics (JSON format)
- Training logs and visualizations
- Confusion matrices
-
-## Training Parameters
-
-### YOLO
- Epochs: 200
- Batch size: 16
- Image size: 960x960
- Learning rate: 0.01
- Optimizer: AdamW
- Data augmentation: Enabled
-
-### Mask R-CNN
- Iterations: 1000-3000
- Batch size: 8
- Image size: 960x960
- Learning rate: 0.00025
- Backbone: ResNet-101 FPN
- ROI batch size: 16
-
-### Mask2Former
- Configuration: COCO instance segmentation
- Backbone: ResNet-101
- Image size: Variable
-
-## Requirements
-
-### Common Dependencies
- Python 3.8+
- PyTorch
- CUDA (for GPU training)
- OpenCV
- NumPy
- Matplotlib
-
-### Model-Specific
-See individual `requirements.txt` files in each model directory for complete dependency lists.
+Training and evaluation results are stored in the respective `results/` directories for each model.

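The previous README described the data as train/val/test splits in COCO format, and TACO's own annotations use COCO JSON. If each split is such a file, counting instances per class is a quick way to see how the 20 classes are balanced before comparing per-class results. In the sketch below, `class_counts`, `load_counts`, and the sample data are hypothetical. It relies only on the standard COCO keys `categories`, `annotations`, and `category_id`.

```python
import json
from collections import Counter


def class_counts(coco):
    """Count annotated instances per category name in a COCO-style dict."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(names[a["category_id"]] for a in coco["annotations"])
    # Keep zero-count categories so classes absent from a split stay visible.
    return {name: counts.get(name, 0) for name in names.values()}


def load_counts(path):
    """Read one split's annotation JSON (path is up to you) and count classes."""
    with open(path, encoding="utf-8") as f:
        return class_counts(json.load(f))


# Tiny made-up example using two of the dataset's class names.
sample = {
    "categories": [{"id": 1, "name": "Drink can"}, {"id": 2, "name": "Cigarette"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 2},
        {"id": 2, "image_id": 1, "category_id": 2},
        {"id": 3, "image_id": 2, "category_id": 1},
    ],
}
print(class_counts(sample))  # {'Drink can': 1, 'Cigarette': 2}
```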
--- a/YOLO/main.ipynb
+++ b/YOLO/main.ipynb