BigEarthNet Distributed DL

Scalable Land Cover Classification with Distributed Deep Learning

This project trains a U-Net for semantic segmentation on the BigEarthNet v2 satellite imagery dataset using distributed TensorFlow training. The pipeline has two phases: remote data preprocessing with Petastorm, then distributed multi-GPU training with tf.distribute.MirroredStrategy.
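
As a rough illustration of the training phase, the sketch below builds a very small U-Net-style model inside a MirroredStrategy scope. It is not the project's actual model or configuration: the patch size, band count, class count, per-replica batch size, and the helper name build_tiny_unet are all assumptions chosen to keep the example short.

```python
import tensorflow as tf

# Hypothetical values for illustration only; the real pipeline's
# patch size, band count, and class count are not stated in this README.
PATCH_SIZE = 32
NUM_BANDS = 3
NUM_CLASSES = 4
PER_REPLICA_BATCH = 8


def build_tiny_unet(patch_size, num_bands, num_classes):
    """Build a one-level U-Net-style model (encoder, bottleneck, decoder)."""
    inputs = tf.keras.Input(shape=(patch_size, patch_size, num_bands))

    # Encoder: convolution followed by 2x downsampling.
    enc = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    down = tf.keras.layers.MaxPooling2D(2)(enc)

    # Bottleneck.
    mid = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(down)

    # Decoder: 2x upsampling, then a skip connection from the encoder.
    up = tf.keras.layers.UpSampling2D(2)(mid)
    merged = tf.keras.layers.Concatenate()([up, enc])
    dec = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(merged)

    # Per-pixel class probabilities.
    outputs = tf.keras.layers.Conv2D(num_classes, 1, activation="softmax")(dec)
    return tf.keras.Model(inputs, outputs)


# MirroredStrategy replicates the model on every visible GPU and
# falls back to a single replica when no GPU is available.
strategy = tf.distribute.MirroredStrategy()

# The dataset should be batched with the global batch size, which
# the strategy splits evenly across replicas.
global_batch = PER_REPLICA_BATCH * strategy.num_replicas_in_sync

# Variables must be created inside the scope so they are mirrored.
with strategy.scope():
    model = build_tiny_unet(PATCH_SIZE, NUM_BANDS, NUM_CLASSES)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
```

Scaling the global batch size with the replica count keeps the per-GPU workload constant as GPUs are added, which is one common way to set up a 1-to-4 GPU comparison.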

Benchmarks were run at 1%, 5%, and 10% dataset throughput levels. Training on 1 to 4 GPUs scaled near-linearly, balancing I/O and compute trade-offs for cloud-based geospatial workflows.
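
For readers who want to reproduce the scaling comparison, the sketch below shows one common definition of speedup and scaling efficiency. The function names are hypothetical and the timings in the usage lines are made-up placeholders, not measurements from this project.

```python
def speedup(time_one_gpu, time_n_gpus):
    """Return how many times faster the multi-GPU run is than the 1-GPU run."""
    if time_one_gpu <= 0 or time_n_gpus <= 0:
        raise ValueError("timings must be positive")
    return time_one_gpu / time_n_gpus


def scaling_efficiency(time_one_gpu, time_n_gpus, num_gpus):
    """Return speedup divided by GPU count; 1.0 means perfectly linear scaling."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return speedup(time_one_gpu, time_n_gpus) / num_gpus


# Placeholder epoch times in seconds (illustrative, not benchmark results).
example_one_gpu = 100.0
example_four_gpus = 30.0
example_efficiency = scaling_efficiency(example_one_gpu, example_four_gpus, 4)
print(f"efficiency on 4 GPUs: {example_efficiency:.2f}")
```

An efficiency close to 1.0 corresponds to near-linear scaling; values well below 1.0 usually point to an input pipeline or communication bottleneck.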

Tech Stack

TensorFlow, Petastorm, U-Net, AWS S3, MirroredStrategy (multi-GPU), BigEarthNet v2