init commit

2025-11-08 21:39:37 +01:00
parent 489da84586
commit 50b6df1c3f
5 changed files with 103 additions and 169 deletions

72
.gitignore vendored Normal file

@@ -0,0 +1,72 @@
# Model files (large)
*.pt
*.pth
*.ckpt
*.h5
*.pb
*.onnx
*.tflite
# Results and output directories
results/
outputs/
checkpoints/
weights/
*.pkl
*.pickle
# Data directories
dataset/
data/
datasets/
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/
.venv
*.egg-info/
dist/
build/
# Jupyter Notebook
.ipynb_checkpoints
*.ipynb_checkpoints/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Logs
*.log
logs/
tensorboard_logs/
events.out.tfevents.*
# Cache
.cache/
*.cache
.pytest_cache/
# Temporary files
*.tmp
*.temp
*.bak
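
To sanity-check a few of the file-name patterns above, here is a minimal sketch using Python's `fnmatch`. This only approximates Git's matching: it ignores directory patterns (trailing `/`), negation, and path anchoring. The helper name `is_ignored` is hypothetical. For authoritative answers inside a repository, `git check-ignore -v <path>` is the tool to use.

```python
from fnmatch import fnmatchcase

# File-name patterns copied from the .gitignore above.
# Directory patterns such as "results/" are left out on purpose,
# because fnmatch does not model Git's directory semantics.
PATTERNS = ["*.pt", "*.py[cod]", "*$py.class", "events.out.tfevents.*", "*~"]


def is_ignored(filename: str) -> bool:
    """Return True if the bare file name matches any pattern (approximation)."""
    return any(fnmatchcase(filename, pattern) for pattern in PATTERNS)


if __name__ == "__main__":
    for name in ["best.pt", "module.pyc", "module.py", "events.out.tfevents.1699"]:
        print(f"{name}: {'ignored' if is_ignored(name) else 'tracked'}")
```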

1
MRCNN/main.ipynb Normal file

File diff suppressed because one or more lines are too long

12
MRCNN/requirements.txt Normal file

@@ -0,0 +1,12 @@
torch
numpy>=1.13
pyyaml
matplotlib
opencv-python>=3.2
setuptools
Cython
mock
scipy
six
future
protobuf

186
README.md

@@ -1,183 +1,31 @@
# Image and Video Understanding Project
A comprehensive project comparing multiple state-of-the-art deep learning models for object detection and instance segmentation on a waste/litter detection dataset.
## Overview
This project evaluates and compares different deep learning architectures for instance segmentation on a custom waste detection dataset. Each model is trained and evaluated on the same dataset to enable fair comparison.
## Models
### 1. YOLO (YOLOv8l-seg)
- **Model**: YOLOv8 Large Segmentation
- **Framework**: Ultralytics
- **Parameters**: 45.9M
- **Training**: 200 epochs, batch size 16, image size 960x960
- **Features**: Real-time inference, bounding box + mask prediction
### 2. Mask R-CNN
- **Backbone**: ResNet-101 with FPN
- **Framework**: Detectron2
- **Training**: 1000-3000 iterations, batch size 8, image size 960x960
- **Features**: Instance segmentation with high accuracy
### 3. Mask2Former
- **Architecture**: Transformer-based segmentation
- **Framework**: Detectron2
- **Features**: Unified framework for semantic, instance, and panoptic segmentation
### 4. DETR
- **Status**: Dataset prepared (implementation in progress)
## Dataset
Custom waste/litter detection dataset with **20 classes**:
- Clear plastic bottle, Glass bottle, Plastic bottle cap, Metal bottle cap
- Broken glass, Drink can, Other carton, Corrugated carton
- Paper cup, Disposable plastic cup, Plastic lid, Other plastic
- Normal paper, Plastic film, Other plastic wrapper, Pop tab
- Plastic straw, Styrofoam piece, Unlabeled litter, Cigarette
**Dataset Structure**: Train/Val/Test splits in COCO format
This project compares two deep learning models for instance segmentation on waste detection: Mask R-CNN (using Detectron2) and YOLOv8.
## Project Structure
```
├── YOLO/
│ ├── main.ipynb # Training and evaluation notebook
│ ├── results/
│ │ ├── train_200_960_16/ # Training outputs
│ │ └── evaluation_200_960_16/ # Evaluation results
│ └── dataset/ # Dataset configuration
├── MRCNN/
│ ├── main.ipynb # Training and evaluation notebook
│ ├── results/
│ │ ├── train_1000_iter/ # Training outputs
│ │ └── eval/ # Evaluation metrics
│ └── requirements.txt
├── M2FORMER/
│ ├── main.ipynb # Training and evaluation notebook
│ ├── output/ # Training outputs
│ ├── Mask2Former/ # Mask2Former repository
│ └── requirements.txt
└── DETR/
└── dataset/ # Image data
```
- **MRCNN/**: Mask R-CNN implementation using Detectron2
- Training and evaluation code in `main.ipynb`
- Trained models and results in `results/`
## Setup
- **YOLO/**: YOLOv8 segmentation implementation
- Training and evaluation code in `main.ipynb`
- Trained models and results in `results/`
### Prerequisites
- Python 3.8+
- PyTorch (with CUDA support recommended)
- GPU recommended for training
## Dataset
### Installation
Both models are trained on the TACO (Trash Annotations in Context) dataset with 20 classes of waste objects including:
- Plastic bottles, glass bottles, bottle caps
- Drink cans, paper cups, cartons
- Plastic film, wrappers, straws
- Cigarettes, and other litter items
#### YOLO
```bash
pip install ultralytics
```
## Models
#### Mask R-CNN
```bash
pip install -r MRCNN/requirements.txt
pip install 'git+https://github.com/facebookresearch/detectron2.git'
```
#### Mask2Former
```bash
pip install -r M2FORMER/requirements.txt
pip install 'git+https://github.com/facebookresearch/detectron2.git'
git clone https://github.com/facebookresearch/Mask2Former.git
cd Mask2Former/mask2former/modeling/pixel_decoder/ops/
./make.sh # Compile CUDA operations
```
## Usage
### Training
Each model has a Jupyter notebook (`main.ipynb`) with complete training pipelines:
1. **YOLO**: Open `YOLO/main.ipynb`
- Configure dataset path in `data.yaml`
- Run training cells
- Model saves checkpoints every 10 epochs
2. **Mask R-CNN**: Open `MRCNN/main.ipynb`
- Configure dataset paths and parameters
- Register COCO format datasets
- Train and evaluate
3. **Mask2Former**: Open `M2FORMER/main.ipynb`
- Set up the Mask2Former repository
- Configure training parameters
- Train and evaluate
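
The "Register COCO format datasets" step for Mask R-CNN could look roughly like the sketch below. `register_coco_instances` is Detectron2's own helper. The dataset names (`waste_train`, ...), the `MRCNN/dataset` root, and the `annotations.json` / `images` layout are assumptions for illustration and are not taken from the notebook.

```python
from pathlib import Path

# Hypothetical dataset root and split names; adjust to the actual layout.
DATASET_ROOT = Path("MRCNN/dataset")
SPLITS = ("train", "val", "test")


def coco_split_paths(root: Path = DATASET_ROOT) -> dict:
    """Map each dataset name to its (annotation JSON, image directory) pair."""
    return {
        f"waste_{split}": (root / split / "annotations.json", root / split / "images")
        for split in SPLITS
    }


def register_splits(root: Path = DATASET_ROOT) -> None:
    """Register every split with Detectron2 (requires detectron2 to be installed)."""
    # Imported lazily so the path helper above works without detectron2.
    from detectron2.data.datasets import register_coco_instances

    for name, (json_file, image_dir) in coco_split_paths(root).items():
        # Empty metadata dict: Detectron2 fills in the class names from the
        # COCO JSON when the dataset is first loaded.
        register_coco_instances(name, {}, str(json_file), str(image_dir))
```

Calling `register_splits()` once before building the trainer makes the names available to `cfg.DATASETS.TRAIN` and `cfg.DATASETS.TEST`.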
### Evaluation
All notebooks include:
- COCO-style evaluation metrics
- Confusion matrix generation
- Prediction visualization
- Performance comparison tools
- **Mask R-CNN**: ResNet-101 backbone with Feature Pyramid Network
- **YOLOv8**: Large segmentation model (YOLOv8l-seg)
## Results
### YOLO Results
- **Box mAP50**: 26.9%
- **Box mAP50-95**: 20.7%
- **Mask mAP50**: 26.7%
- **Mask mAP50-95**: 19.5%
- **Precision (Box)**: 28.8%
- **Recall (Box)**: 29.5%
### Mask R-CNN Results
- **Box AP**: 15.8%
- **Box AP50**: 23.9%
- **Mask AP**: 15.9%
- **Mask AP50**: 23.7%
- Best performance on: Metal bottle cap (50.4% AP), Clear plastic bottle (42.6% AP), Drink can (40.1% AP)
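
As a quick way to read the two result lists side by side, the sketch below computes the gap between the reported numbers. It assumes that Ultralytics' mAP50-95 and COCO-style AP are comparable, since both average over IoU thresholds 0.50 to 0.95. The two toolkits use different evaluation code, so the gaps are indicative only. The dictionary layout and the `gap` helper are hypothetical.

```python
# Reported metrics (in percent) copied from the result lists above.
REPORTED = {
    "YOLOv8l-seg": {"box_ap": 20.7, "box_ap50": 26.9, "mask_ap": 19.5, "mask_ap50": 26.7},
    "Mask R-CNN": {"box_ap": 15.8, "box_ap50": 23.9, "mask_ap": 15.9, "mask_ap50": 23.7},
}


def gap(metric: str) -> float:
    """YOLO minus Mask R-CNN for one metric, in percentage points."""
    return round(REPORTED["YOLOv8l-seg"][metric] - REPORTED["Mask R-CNN"][metric], 1)


if __name__ == "__main__":
    for metric in ("box_ap", "box_ap50", "mask_ap", "mask_ap50"):
        print(f"{metric}: {gap(metric):+.1f} points")
```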
Results are saved in respective `results/` directories with:
- Model weights (`.pth` or `.pt` files)
- Evaluation metrics (JSON format)
- Training logs and visualizations
- Confusion matrices
## Training Parameters
### YOLO
- Epochs: 200
- Batch size: 16
- Image size: 960x960
- Learning rate: 0.01
- Optimizer: AdamW
- Data augmentation: Enabled
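
A minimal sketch of how these settings might be passed to Ultralytics. The keyword names (`epochs`, `batch`, `imgsz`, `lr0`, `optimizer`, `save_period`, `project`, `name`) are Ultralytics training arguments. The `data.yaml` location, the output directory, and the helper names are assumptions, and the notebook may configure training differently.

```python
def yolo_train_kwargs(data_yaml: str = "YOLO/dataset/data.yaml") -> dict:
    """Collect the training settings listed above as Ultralytics keyword arguments."""
    return {
        "data": data_yaml,           # hypothetical path to the dataset config
        "epochs": 200,
        "batch": 16,
        "imgsz": 960,
        "lr0": 0.01,                 # initial learning rate
        "optimizer": "AdamW",
        "save_period": 10,           # write a checkpoint every 10 epochs
        "project": "YOLO/results",   # assumed output root
        "name": "train_200_960_16",
    }


def train_yolo(weights: str = "yolov8l-seg.pt"):
    """Start training (requires the ultralytics package and a GPU for practical use)."""
    # Imported lazily so the settings helper works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**yolo_train_kwargs())
```

Data augmentation is left at the Ultralytics defaults here, which is one reading of "Enabled".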
### Mask R-CNN
- Iterations: 1000-3000
- Batch size: 8
- Image size: 960x960
- Learning rate: 0.00025
- Backbone: ResNet-101 FPN
- ROI batch size: 16
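
These parameters could map onto a Detectron2 config roughly as sketched below. The mapping is an assumption: "ROI batch size" is read as `MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE`, "image size" as the `INPUT.*_SIZE_TRAIN` keys, and the iteration count is fixed at 1000 (the lower end of the stated range). The dataset name `waste_train` is hypothetical.

```python
# Overrides derived from the parameter list above; the choice of config keys
# is an assumption and is not confirmed by the notebook.
OVERRIDES = {
    "SOLVER.MAX_ITER": 1000,
    "SOLVER.IMS_PER_BATCH": 8,
    "SOLVER.BASE_LR": 0.00025,
    "MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE": 16,
    "MODEL.ROI_HEADS.NUM_CLASSES": 20,
    "INPUT.MIN_SIZE_TRAIN": (960,),
    "INPUT.MAX_SIZE_TRAIN": 960,
}

# Model zoo config for Mask R-CNN with a ResNet-101 FPN backbone.
BASE_CONFIG = "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"


def as_opts(overrides: dict) -> list:
    """Flatten {key: value} into the [key, value, key, value, ...] list
    that CfgNode.merge_from_list expects."""
    opts = []
    for key, value in overrides.items():
        opts.extend([key, value])
    return opts


def build_cfg(train_name: str = "waste_train"):
    """Build the training config (requires detectron2 to be installed)."""
    # Imported lazily so the helpers above work without detectron2.
    from detectron2 import model_zoo
    from detectron2.config import get_cfg

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(BASE_CONFIG))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(BASE_CONFIG)  # COCO-pretrained weights
    cfg.DATASETS.TRAIN = (train_name,)
    cfg.merge_from_list(as_opts(OVERRIDES))
    return cfg
```

The resulting `cfg` can then be handed to `detectron2.engine.DefaultTrainer`, provided the dataset name has been registered beforehand.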
### Mask2Former
- Configuration: COCO instance segmentation
- Backbone: ResNet-101
- Image size: Variable
## Requirements
### Common Dependencies
- Python 3.8+
- PyTorch
- CUDA (for GPU training)
- OpenCV
- NumPy
- Matplotlib
### Model-Specific
See individual `requirements.txt` files in each model directory for complete dependency lists.
Training and evaluation results are stored in the respective `results/` directories for each model.

1
YOLO/main.ipynb Normal file

File diff suppressed because one or more lines are too long