# Image and Video Understanding Project

This project compares two deep learning models for instance segmentation of waste objects: Mask R-CNN (implemented with Detectron2) and YOLOv8.

## Project Structure

- **MRCNN/**: Mask R-CNN implementation using Detectron2
  - Training and evaluation code in `main.ipynb`
  - Trained models and results in `results/`

- **YOLO/**: YOLOv8 segmentation implementation
  - Training and evaluation code in `main.ipynb`
  - Trained models and results in `results/`

## Dataset

Both models are trained on the TACO (Trash Annotations in Context) dataset with 20 classes of waste objects, including:
- Plastic bottles, glass bottles, bottle caps
- Drink cans, paper cups, cartons
- Plastic film, wrappers, straws
- Cigarettes, and other litter items
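
To check how balanced the classes are before training, the annotations can be summarized per category. The sketch below assumes the annotations are available as a COCO-style dictionary (the format TACO's official release uses); the function name and the tiny in-memory example are hypothetical and only illustrate the idea, they are not taken from the notebooks.

```python
from collections import Counter


def count_instances_per_category(coco: dict) -> dict:
    """Return {category name: number of annotated instances} for a COCO-style dict."""
    # Map numeric category ids to human-readable names.
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    # Keep categories with zero annotations so gaps in the data stay visible.
    return {name: counts.get(name, 0) for name in id_to_name.values()}


# Hypothetical miniature example; a real file would be read with json.load().
example = {
    "categories": [
        {"id": 0, "name": "Plastic bottle"},
        {"id": 1, "name": "Drink can"},
        {"id": 2, "name": "Straw"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 0},
        {"id": 11, "image_id": 1, "category_id": 1},
        {"id": 12, "image_id": 2, "category_id": 0},
    ],
}

print(count_instances_per_category(example))
```

Classes with very few instances are worth noting, since both models may score poorly on them regardless of architecture.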

## Models

- **Mask R-CNN**: ResNet-101 backbone with Feature Pyramid Network
- **YOLOv8**: Large segmentation model (YOLOv8l-seg)
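
As a rough illustration of how these two models are typically instantiated, the sketch below uses the public Detectron2 and Ultralytics APIs. It is not copied from the notebooks: the model zoo config name (`mask_rcnn_R_101_FPN_3x`), the choice of pretrained starting weights, and the helper names are assumptions. The library imports are placed inside the functions so the snippet can be read and imported without either package installed.

```python
NUM_CLASSES = 20  # number of waste classes stated in the Dataset section

# Assumption: the standard Detectron2 model zoo entry for a ResNet-101 + FPN Mask R-CNN.
MASK_RCNN_ZOO_CONFIG = "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"

# Pretrained checkpoint name for the large YOLOv8 segmentation model.
YOLO_WEIGHTS = "yolov8l-seg.pt"


def build_mask_rcnn_cfg(num_classes: int = NUM_CLASSES):
    """Build a Detectron2 config for Mask R-CNN (requires detectron2)."""
    from detectron2 import model_zoo
    from detectron2.config import get_cfg

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(MASK_RCNN_ZOO_CONFIG))
    # Start from the COCO-pretrained weights for the same architecture.
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(MASK_RCNN_ZOO_CONFIG)
    # Resize the box and mask heads to the number of dataset classes.
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = num_classes
    return cfg


def load_yolo_seg():
    """Load the pretrained YOLOv8l-seg model (requires ultralytics)."""
    from ultralytics import YOLO

    return YOLO(YOLO_WEIGHTS)
```

With Ultralytics, the class count is normally taken from the dataset YAML passed to `model.train(data=...)`, so only the Mask R-CNN config needs `NUM_CLASSES` set explicitly.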

## Results

Training and evaluation results are stored in the respective `results/` directories for each model.