CROMA Classification

Foundation Model for Remote Sensing Classification

This project performs land cover classification on the EuroSAT dataset using CROMA (Contrastive Radar-Optical Masked Autoencoders), a pretrained multi-modal Vision Transformer. The model classifies Sentinel-2 satellite imagery into 10 CORINE land cover categories, including vegetation, buildings, and water bodies.

The project explored different embedding extraction strategies from the pretrained CROMA-ViT backbone, including integration with HuggingFace SMARTIES transformers. Embeddings were generated with mean-std standardization for cross-sensor robustness and used for downstream classification tasks.
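As a rough illustration of the standardization and pooling steps described above, here is a minimal NumPy sketch. It assumes the mean-std standardization is applied per spectral band to the input imagery, and that one possible extraction strategy is mean-pooling the encoder's patch tokens; neither detail is confirmed by this README. The batch is synthetic, the `tokens` array is a hypothetical stand-in for the CROMA-ViT output (no real model is called), and the function names and the token and embedding sizes are illustrative.

```python
import numpy as np


def band_statistics(images):
    """Per-band mean and std over a batch shaped (N, C, H, W)."""
    mean = images.mean(axis=(0, 2, 3))
    std = images.std(axis=(0, 2, 3))
    return mean, std


def standardize(images, mean, std, eps=1e-6):
    """Subtract the per-band mean and divide by the per-band std."""
    return (images - mean[None, :, None, None]) / (std[None, :, None, None] + eps)


def mean_pool(tokens):
    """Collapse patch tokens (N, num_patches, dim) into one embedding per image (N, dim)."""
    return tokens.mean(axis=1)


rng = np.random.default_rng(0)

# Synthetic stand-in for a batch of 13-band Sentinel-2 patches (64x64 pixels).
batch = rng.uniform(0.0, 10000.0, size=(8, 13, 64, 64))

mean, std = band_statistics(batch)
normalized = standardize(batch, mean, std)

# Hypothetical encoder output: random values in place of real patch tokens.
tokens = rng.normal(size=(8, 64, 768))
embeddings = mean_pool(tokens)
```

In practice the band statistics would be computed once on the training split and reused at inference time, so that train and test embeddings share the same input scaling.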

Tech Stack

PyTorch, torchgeo, CROMA (ViT), HuggingFace Transformers, EuroSAT, Sentinel-2 (13 spectral bands)