init commit
72
.gitignore
vendored
Normal file
@@ -0,0 +1,72 @@
+# Model files (large)
+*.pt
+*.pth
+*.ckpt
+*.h5
+*.pb
+*.onnx
+*.tflite
+
+# Results and output directories
+results/
+outputs/
+checkpoints/
+weights/
+*.pkl
+*.pickle
+
+# Data directories
+dataset/
+data/
+datasets/
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+env/
+venv/
+ENV/
+.venv
+*.egg-info/
+dist/
+build/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+*.ipynb_checkpoints/
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+
+# Logs
+*.log
+logs/
+tensorboard_logs/
+events.out.tfevents.*
+
+# Cache
+.cache/
+*.cache
+.pytest_cache/
+
+# Temporary files
+*.tmp
+*.temp
+*.bak
+
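To get a rough feel for which files the basename globs in this `.gitignore` would exclude, the following is a minimal sketch using Python's `fnmatch`. It is an approximation, not Git's matcher: it covers only a hand-picked subset of the simple glob patterns, ignores the directory rules such as `results/`, and the helper name `is_ignored_basename` is hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the basename glob patterns from the .gitignore hunk above.
# Directory rules (results/, data/, ...) are deliberately left out because
# fnmatch does not model gitignore's directory semantics.
PATTERNS = ["*.pt", "*.pth", "*.ckpt", "*.onnx", "*.py[cod]", "*.log", "*~"]


def is_ignored_basename(path: str) -> bool:
    """Return True if the last path component matches any pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(name, pattern) for pattern in PATTERNS)


for candidate in ["YOLO/weights.pt", "MRCNN/main.ipynb", "train.log", "utils.pyc"]:
    print(candidate, "->", "ignored" if is_ignored_basename(candidate) else "kept")
```

For an authoritative answer on a real checkout, `git check-ignore -v <path>` reports the rule that matches.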
1
MRCNN/main.ipynb
Normal file
File diff suppressed because one or more lines are too long
12
MRCNN/requirements.txt
Normal file
@@ -0,0 +1,12 @@
+torch
+numpy>=1.13
+pyyaml
+matplotlib
+opencv-python>=3.2
+setuptools
+Cython
+mock
+scipy
+six
+future
+protobuf
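Only two of the twelve entries in `MRCNN/requirements.txt` carry a version constraint. A small sketch like the one below can make that visible. The regular expression is a simplification that handles only the `name` and `name>=version` forms present in this file, not the full requirement syntax, and `parse_requirement` is a hypothetical helper name.

```python
import re

# Contents of MRCNN/requirements.txt as added in this commit.
REQUIREMENTS = """\
torch
numpy>=1.13
pyyaml
matplotlib
opencv-python>=3.2
setuptools
Cython
mock
scipy
six
future
protobuf
"""


def parse_requirement(line: str):
    """Split a requirement line into (name, version_spec or None)."""
    match = re.match(r"([A-Za-z0-9_.-]+)\s*(.*)", line.strip())
    name, spec = match.group(1), match.group(2)
    return name, spec or None


parsed = dict(
    parse_requirement(line) for line in REQUIREMENTS.splitlines() if line.strip()
)
unconstrained = sorted(name for name, spec in parsed.items() if spec is None)

print("constrained:", {n: s for n, s in parsed.items() if s is not None})
print("unconstrained:", unconstrained)
```

Unconstrained entries resolve to whatever version is current at install time, so a later install may not reproduce the environment the notebook was run in.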
186
README.md
@@ -1,183 +1,31 @@
 # Image and Video Understanding Project
 
-A comprehensive project comparing multiple state-of-the-art deep learning models for object detection and instance segmentation on a waste/litter detection dataset.
+This project compares two deep learning models for instance segmentation on waste detection: Mask R-CNN (using Detectron2) and YOLOv8.
 
-## Overview
-
-This project evaluates and compares different deep learning architectures for instance segmentation on a custom waste detection dataset. Each model is trained and evaluated on the same dataset to enable fair comparison.
-
-## Models
-
-### 1. YOLO (YOLOv8l-seg)
-- **Model**: YOLOv8 Large Segmentation
-- **Framework**: Ultralytics
-- **Parameters**: 45.9M
-- **Training**: 200 epochs, batch size 16, image size 960x960
-- **Features**: Real-time inference, bounding box + mask prediction
-
-### 2. Mask R-CNN
-- **Backbone**: ResNet-101 with FPN
-- **Framework**: Detectron2
-- **Training**: 1000-3000 iterations, batch size 8, image size 960x960
-- **Features**: Instance segmentation with high accuracy
-
-### 3. Mask2Former
-- **Architecture**: Transformer-based segmentation
-- **Framework**: Detectron2
-- **Features**: Unified framework for semantic, instance, and panoptic segmentation
-
-### 4. DETR
-- **Status**: Dataset prepared (implementation in progress)
-
-## Dataset
-
-Custom waste/litter detection dataset with **20 classes**:
-- Clear plastic bottle, Glass bottle, Plastic bottle cap, Metal bottle cap
-- Broken glass, Drink can, Other carton, Corrugated carton
-- Paper cup, Disposable plastic cup, Plastic lid, Other plastic
-- Normal paper, Plastic film, Other plastic wrapper, Pop tab
-- Plastic straw, Styrofoam piece, Unlabeled litter, Cigarette
-
-**Dataset Structure**: Train/Val/Test splits in COCO format
-
 ## Project Structure
 
-```
-├── YOLO/
-│ ├── main.ipynb # Training and evaluation notebook
-│ ├── results/
-│ │ ├── train_200_960_16/ # Training outputs
-│ │ └── evaluation_200_960_16/ # Evaluation results
-│ └── dataset/ # Dataset configuration
-├── MRCNN/
-│ ├── main.ipynb # Training and evaluation notebook
-│ ├── results/
-│ │ ├── train_1000_iter/ # Training outputs
-│ │ └── eval/ # Evaluation metrics
-│ └── requirements.txt
-├── M2FORMER/
-│ ├── main.ipynb # Training and evaluation notebook
-│ ├── output/ # Training outputs
-│ ├── Mask2Former/ # Mask2Former repository
-│ └── requirements.txt
-└── DETR/
-└── dataset/ # Image data
-```
+- **MRCNN/**: Mask R-CNN implementation using Detectron2
+- Training and evaluation code in `main.ipynb`
+- Trained models and results in `results/`
 
-## Setup
+- **YOLO/**: YOLOv8 segmentation implementation
+- Training and evaluation code in `main.ipynb`
+- Trained models and results in `results/`
 
-### Prerequisites
-- Python 3.8+
-- PyTorch (with CUDA support recommended)
-- GPU recommended for training
+## Dataset
 
-### Installation
+Both models are trained on the TACO (Trash Annotations in Context) dataset with 20 classes of waste objects including:
+- Plastic bottles, glass bottles, bottle caps
+- Drink cans, paper cups, cartons
+- Plastic film, wrappers, straws
+- Cigarettes, and other litter items
 
-#### YOLO
-```bash
-pip install ultralytics
-```
+## Models
 
-#### Mask R-CNN
-```bash
-pip install -r MRCNN/requirements.txt
-pip install 'git+https://github.com/facebookresearch/detectron2.git'
-```
-
-#### Mask2Former
-```bash
-pip install -r M2FORMER/requirements.txt
-pip install 'git+https://github.com/facebookresearch/detectron2.git'
-git clone https://github.com/facebookresearch/Mask2Former.git
-cd Mask2Former/mask2former/modeling/pixel_decoder/ops/
-./make.sh # Compile CUDA operations
-```
-
-## Usage
-
-### Training
-
-Each model has a Jupyter notebook (`main.ipynb`) with complete training pipelines:
-
-1. **YOLO**: Open `YOLO/main.ipynb`
-- Configure dataset path in `data.yaml`
-- Run training cells
-- Model saves checkpoints every 10 epochs
-
-2. **Mask R-CNN**: Open `MRCNN/main.ipynb`
-- Configure dataset paths and parameters
-- Register COCO format datasets
-- Train and evaluate
-
-3. **Mask2Former**: Open `M2FORMER/main.ipynb`
-- Setup Mask2Former repository
-- Configure training parameters
-- Train and evaluate
-
-### Evaluation
-
-All notebooks include:
-- COCO-style evaluation metrics
-- Confusion matrix generation
-- Prediction visualization
-- Performance comparison tools
+- **Mask R-CNN**: ResNet-101 backbone with Feature Pyramid Network
+- **YOLOv8**: Large segmentation model (YOLOv8l-seg)
 
 ## Results
 
-### YOLO Results
-- **Box mAP50**: 26.9%
-- **Box mAP50-95**: 20.7%
-- **Mask mAP50**: 26.7%
-- **Mask mAP50-95**: 19.5%
-- **Precision (Box)**: 28.8%
-- **Recall (Box)**: 29.5%
-
-### Mask R-CNN Results
-- **Box AP**: 15.8%
-- **Box AP50**: 23.9%
-- **Mask AP**: 15.9%
-- **Mask AP50**: 23.7%
-- Best performance on: Metal bottle cap (50.4% AP), Clear plastic bottle (42.6% AP), Drink can (40.1% AP)
-
-Results are saved in respective `results/` directories with:
-- Model weights (`.pth` or `.pt` files)
-- Evaluation metrics (JSON format)
-- Training logs and visualizations
-- Confusion matrices
-
-## Training Parameters
-
-### YOLO
-- Epochs: 200
-- Batch size: 16
-- Image size: 960x960
-- Learning rate: 0.01
-- Optimizer: AdamW
-- Data augmentation: Enabled
-
-### Mask R-CNN
-- Iterations: 1000-3000
-- Batch size: 8
-- Image size: 960x960
-- Learning rate: 0.00025
-- Backbone: ResNet-101 FPN
-- ROI batch size: 16
-
-### Mask2Former
-- Configuration: COCO instance segmentation
-- Backbone: ResNet-101
-- Image size: Variable
-
-## Requirements
-
-### Common Dependencies
-- Python 3.8+
-- PyTorch
-- CUDA (for GPU training)
-- OpenCV
-- NumPy
-- Matplotlib
-
-### Model-Specific
-See individual `requirements.txt` files in each model directory for complete dependency lists.
+Training and evaluation results are stored in the respective `results/` directories for each model.
 
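This commit removes the per-model result figures from the README. As a rough sketch of how the two remaining models compared on those removed numbers, the snippet below computes the gap in percentage points. The dictionary keys are hypothetical names chosen for illustration, and pairing Ultralytics' mAP50-95 with Detectron2's COCO-style AP is an assumption: both average over IoU thresholds 0.50 to 0.95, but the evaluation pipelines differ, so the gaps are indicative only.

```python
# Metrics (in percent) copied from the README lines removed in this commit.
# Key names are illustrative; "ap" pairs YOLO's mAP50-95 with COCO-style AP.
REPORTED = {
    "YOLOv8l-seg": {"box_ap50": 26.9, "box_ap": 20.7, "mask_ap50": 26.7, "mask_ap": 19.5},
    "Mask R-CNN": {"box_ap50": 23.9, "box_ap": 15.8, "mask_ap50": 23.7, "mask_ap": 15.9},
}


def metric_gaps(model_a: str, model_b: str) -> dict:
    """Return model_a minus model_b for each metric, in percentage points."""
    return {
        metric: round(REPORTED[model_a][metric] - REPORTED[model_b][metric], 1)
        for metric in REPORTED[model_a]
    }


gaps = metric_gaps("YOLOv8l-seg", "Mask R-CNN")
for metric, gap in gaps.items():
    print(f"{metric}: {gap:+.1f} points")
```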
1
YOLO/main.ipynb
Normal file
File diff suppressed because one or more lines are too long