init commit

2025-11-08 21:39:37 +01:00
parent 489da84586
commit 50b6df1c3f
5 changed files with 103 additions and 169 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,72 @@
+# Model files (large)
+*.pt
+*.pth
+*.ckpt
+*.h5
+*.pb
+*.onnx
+*.tflite
+# Results and output directories
+results/
+outputs/
+checkpoints/
+weights/
+*.pkl
+*.pickle
+# Data directories
+dataset/
+data/
+datasets/
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+env/
+venv/
+ENV/
+.venv
+*.egg-info/
+dist/
+build/
+# Jupyter Notebook
+.ipynb_checkpoints
+*.ipynb_checkpoints/
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+# OS
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+# Logs
+*.log
+logs/
+tensorboard_logs/
+events.out.tfevents.*
+# Cache
+.cache/
+*.cache
+.pytest_cache/
+# Temporary files
+*.tmp
+*.temp
+*.bak
--- a/MRCNN/main.ipynb
+++ b/MRCNN/main.ipynb
--- a/MRCNN/requirements.txt
+++ b/MRCNN/requirements.txt
@@ -0,0 +1,12 @@
+torch
+numpy>=1.13
+pyyaml
+matplotlib
+opencv-python>=3.2
+setuptools
+Cython
+mock
+scipy
+six
+future
+protobuf
--- a/README.md
+++ b/README.md
@@ -1,183 +1,31 @@
 # Image and Video Understanding Project
-A comprehensive project comparing multiple state-of-the-art deep learning models for object detection and instance segmentation on a waste/litter detection dataset.
+This project compares two deep learning models for instance segmentation on waste detection: Mask R-CNN (using Detectron2) and YOLOv8.
-## Overview
-This project evaluates and compares different deep learning architectures for instance segmentation on a custom waste detection dataset. Each model is trained and evaluated on the same dataset to enable fair comparison.
-## Models
-### 1. YOLO (YOLOv8l-seg)
-- **Model**: YOLOv8 Large Segmentation
-- **Framework**: Ultralytics
-- **Parameters**: 45.9M
-- **Training**: 200 epochs, batch size 16, image size 960x960
-- **Features**: Real-time inference, bounding box + mask prediction
-### 2. Mask R-CNN
-- **Backbone**: ResNet-101 with FPN
-- **Framework**: Detectron2
-- **Training**: 1000-3000 iterations, batch size 8, image size 960x960
-- **Features**: Instance segmentation with high accuracy
-### 3. Mask2Former
-- **Architecture**: Transformer-based segmentation
-- **Framework**: Detectron2
-- **Features**: Unified framework for semantic, instance, and panoptic segmentation
-### 4. DETR
-- **Status**: Dataset prepared (implementation in progress)
-## Dataset
-Custom waste/litter detection dataset with **20 classes**:
-- Clear plastic bottle, Glass bottle, Plastic bottle cap, Metal bottle cap
-- Broken glass, Drink can, Other carton, Corrugated carton
-- Paper cup, Disposable plastic cup, Plastic lid, Other plastic
-- Normal paper, Plastic film, Other plastic wrapper, Pop tab
-- Plastic straw, Styrofoam piece, Unlabeled litter, Cigarette
-**Dataset Structure**: Train/Val/Test splits in COCO format
 ## Project Structure
-```
+- **MRCNN/**: Mask R-CNN implementation using Detectron2
-├── YOLO/
+  - Training and evaluation code in `main.ipynb`
-│   ├── main.ipynb                    # Training and evaluation notebook
+  - Trained models and results in `results/`
-│   ├── results/
-│   │   ├── train_200_960_16/        # Training outputs
-│   │   └── evaluation_200_960_16/   # Evaluation results
-│   └── dataset/                      # Dataset configuration
-├── MRCNN/
-│   ├── main.ipynb                    # Training and evaluation notebook
-│   ├── results/
-│   │   ├── train_1000_iter/         # Training outputs
-│   │   └── eval/                     # Evaluation metrics
-│   └── requirements.txt
-├── M2FORMER/
-│   ├── main.ipynb                    # Training and evaluation notebook
-│   ├── output/                       # Training outputs
-│   ├── Mask2Former/                  # Mask2Former repository
-│   └── requirements.txt
-└── DETR/
-    └── dataset/                      # Image data
-```
-## Setup
+- **YOLO/**: YOLOv8 segmentation implementation
+  - Training and evaluation code in `main.ipynb`
+  - Trained models and results in `results/`
-### Prerequisites
+## Dataset
-- Python 3.8+
-- PyTorch (with CUDA support recommended)
-- GPU recommended for training
-### Installation
+Both models are trained on the TACO (Trash Annotations in Context) dataset with 20 classes of waste objects including:
+- Plastic bottles, glass bottles, bottle caps
+- Drink cans, paper cups, cartons
+- Plastic film, wrappers, straws
+- Cigarettes, and other litter items
-#### YOLO
+## Models
-```bash
-pip install ultralytics
-```
-#### Mask R-CNN
+- **Mask R-CNN**: ResNet-101 backbone with Feature Pyramid Network
-```bash
+- **YOLOv8**: Large segmentation model (YOLOv8l-seg)
-pip install -r MRCNN/requirements.txt
-pip install 'git+https://github.com/facebookresearch/detectron2.git'
-```
-#### Mask2Former
-```bash
-pip install -r M2FORMER/requirements.txt
-pip install 'git+https://github.com/facebookresearch/detectron2.git'
-git clone https://github.com/facebookresearch/Mask2Former.git
-cd Mask2Former/mask2former/modeling/pixel_decoder/ops/
-./make.sh  # Compile CUDA operations
-```
-## Usage
-### Training
-Each model has a Jupyter notebook (`main.ipynb`) with complete training pipelines:
-1. **YOLO**: Open `YOLO/main.ipynb`
-   - Configure dataset path in `data.yaml`
-   - Run training cells
-   - Model saves checkpoints every 10 epochs
-2. **Mask R-CNN**: Open `MRCNN/main.ipynb`
-   - Configure dataset paths and parameters
-   - Register COCO format datasets
-   - Train and evaluate
-3. **Mask2Former**: Open `M2FORMER/main.ipynb`
-   - Setup Mask2Former repository
-   - Configure training parameters
-   - Train and evaluate
-### Evaluation
-All notebooks include:
-- COCO-style evaluation metrics
-- Confusion matrix generation
-- Prediction visualization
-- Performance comparison tools
 ## Results
-### YOLO Results
+Training and evaluation results are stored in the respective `results/` directories for each model.
-- **Box mAP50**: 26.9%
-- **Box mAP50-95**: 20.7%
-- **Mask mAP50**: 26.7%
-- **Mask mAP50-95**: 19.5%
-- **Precision (Box)**: 28.8%
-- **Recall (Box)**: 29.5%
-### Mask R-CNN Results
-- **Box AP**: 15.8%
-- **Box AP50**: 23.9%
-- **Mask AP**: 15.9%
-- **Mask AP50**: 23.7%
-- Best performance on: Metal bottle cap (50.4% AP), Clear plastic bottle (42.6% AP), Drink can (40.1% AP)
-Results are saved in respective `results/` directories with:
-- Model weights (`.pth` or `.pt` files)
-- Evaluation metrics (JSON format)
-- Training logs and visualizations
-- Confusion matrices
-## Training Parameters
-### YOLO
-- Epochs: 200
-- Batch size: 16
-- Image size: 960x960
-- Learning rate: 0.01
-- Optimizer: AdamW
-- Data augmentation: Enabled
-### Mask R-CNN
-- Iterations: 1000-3000
-- Batch size: 8
-- Image size: 960x960
-- Learning rate: 0.00025
-- Backbone: ResNet-101 FPN
-- ROI batch size: 16
-### Mask2Former
-- Configuration: COCO instance segmentation
-- Backbone: ResNet-101
-- Image size: Variable
-## Requirements
-### Common Dependencies
-- Python 3.8+
-- PyTorch
-- CUDA (for GPU training)
-- OpenCV
-- NumPy
-- Matplotlib
-### Model-Specific
-See individual `requirements.txt` files in each model directory for complete dependency lists.
--- a/YOLO/main.ipynb
+++ b/YOLO/main.ipynb