Land use and land cover classification with multispectral Sentinel-2 satellite images and TensorFlow
In this post I will show how to obtain a ready-to-use training dataset of Sentinel-2 satellite images, train a CNN with TensorFlow and Keras, and create a land use and land cover map for new, unseen data.
TensorFlow Datasets is a collection of tf.data.Datasets that can be used without the hassle of manually downloading and preparing training data. EuroSAT is a dataset of 27,000 labeled and geo-referenced images in 10 classes with 13 spectral bands. It is available as a TensorFlow Datasets definition.
Loading the dataset
Here we use the all dataset configuration, which contains all 13 image bands and not just RGB. We use 80% of the dataset for training, retain 10% as a validation set and keep the remaining 10% as a test set for the final classifier assessment.
import tensorflow as tf
import tensorflow_datasets as tfds
(ds_train, ds_valid, ds_test), ds_info = tfds.load('eurosat/all',
split=['train[:80%]','train[80%:90%]','train[90%:]'],
as_supervised=True,
with_info=True)
NUM_CLASSES = ds_info.features['label'].num_classes
CLASS_NAMES = ds_info.features['label'].names
print('number of samples in train, valid, test:',
[tf.data.experimental.cardinality(ds).numpy() for ds in [ds_train,ds_valid,ds_test]])
print('class names:', CLASS_NAMES)
# > number of samples in train, valid, test: [21600, 2700, 2700]
# > class names: ['AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial', 'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake']
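To make sure the random split did not skew the class distribution, you can count the labels per class. This is an optional check that is not part of the original walkthrough; it simply does one pass over the raw, unbatched training split:
import collections
# Count how often each label occurs in the 21,600 training samples.
label_counts = collections.Counter(int(label) for _, label in ds_train)
for label_id, count in sorted(label_counts.items()):
    print(f'{CLASS_NAMES[label_id]:>20}: {count}')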
Inspecting the dataset
In the next step we want to visualize some samples as RGB images. The image bands come as UInt16 “Digital Numbers” (DN) that need to be converted to reflectances by dividing them by 10,000. Reflectance typically lies in the range [0.0, 0.4], but as the documentation states, higher values are expected in the infrared bands. To display the samples properly, we simply brighten the reflectance values by a factor of 2.5 and clip them to the [0.0, 1.0] range:
import matplotlib.pyplot as plt
import numpy as np
DN = 10_000
def normalize(tensor):
    # Convert DN to reflectance, brighten for display and clip to [0, 1].
    tensor /= DN
    tensor *= 2.5
    return tf.clip_by_value(tensor, 0., 1.)

def ms_to_rgb(tensor):
    # Band order in the dataset: index 3 = red (B04), 2 = green (B03), 1 = blue (B02).
    band_r = tensor[..., 3]
    band_g = tensor[..., 2]
    band_b = tensor[..., 1]
    return normalize(tf.stack([band_r, band_g, band_b], axis=-1))

def show_rgb_samples(ds):
    number_samples = tf.data.experimental.cardinality(ds).numpy()
    fig, axs = plt.subplots(1, number_samples, figsize=(number_samples * 5, 5), squeeze=False)
    for index, (tensor, label) in enumerate(ds):
        if tf.rank(label) == 0:  # label is an integer class id
            title = CLASS_NAMES[label.numpy()]
        else:  # label is a tensor with class probabilities
            title = f'{CLASS_NAMES[tf.math.argmax(label).numpy()]} ({tf.math.reduce_max(label).numpy():.3f})'
        axs[0, index].imshow(ms_to_rgb(tensor))
        axs[0, index].set_title(title)
        axs[0, index].axis('off')
show_rgb_samples(ds_train.take(5))
Training
For training we use a custom function to standardize each sample. We precomputed the mean and standard deviation for each band over the training set (a sketch of how to derive these statistics follows after the input pipeline below). Each sample in the train, validation and test datasets is standardized by subtracting the per-band mean and dividing by the per-band standard deviation. We also augment the training dataset by randomly flipping and rotating each sample.
import tensorflow_addons as tfa
MEANS = [1353.4839, 1117.0938, 1042.1034, 946.73224, 1199.5142, 2005.943, 2378.6394, 2305.6174, 732.0749, 12.07309,
1822.308, 1118.718, 2605.0498]
STDDEVS = [65.17162, 153.58511, 187.4783, 277.7896, 227.93288, 356.2915, 455.57788, 531.0652, 98.77772, 1.1824766,
378.07095, 302.88586, 502.6844]
@tf.function
def standardize(tensor, *args):
    band_tensors = []
    for c in range(tensor.shape[-1]):
        t = tf.cast(tensor[..., c], tf.float32)
        t -= MEANS[c]
        t /= STDDEVS[c]
        band_tensors.append(t)
    tensor = tf.stack(band_tensors, -1)
    # If a label was passed along (as in ds.map), one-hot encode it; otherwise return the tensor only.
    return (tensor, tf.one_hot(args[0], NUM_CLASSES)) if len(args) == 1 else tensor

@tf.function
def augment(tensor, label):
    # tfa.image.rotate expects the angle in radians, so sample from [-pi, pi).
    angle = tf.random.uniform(shape=[], minval=-np.pi, maxval=np.pi, dtype=tf.float32)
    tensor = tfa.image.rotate(tensor, angle)
    tensor = tf.image.random_flip_left_right(tensor)
    return tensor, label
BATCH_SIZE = 32
BUFFER_SIZE = tf.data.experimental.cardinality(ds_train).numpy()
ds_train = ds_train\
.map(standardize, num_parallel_calls=tf.data.experimental.AUTOTUNE)\
.cache()\
.map(augment, num_parallel_calls=tf.data.experimental.AUTOTUNE)\
.shuffle(BUFFER_SIZE)\
.batch(BATCH_SIZE)\
.prefetch(tf.data.experimental.AUTOTUNE)
ds_valid = ds_valid\
.map(standardize, num_parallel_calls=tf.data.experimental.AUTOTUNE)\
.cache()\
.batch(BATCH_SIZE)\
.prefetch(tf.data.experimental.AUTOTUNE)
ds_test = ds_test\
.map(standardize, num_parallel_calls=tf.data.experimental.AUTOTUNE)\
.batch(BATCH_SIZE)
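The MEANS and STDDEVS constants above were precomputed on the 80% training split. If you want to reproduce them, a minimal sketch along the following lines should work; it assumes a fresh, unprocessed copy of the training split and accumulates per-band sums in float64:
# Recompute per-band mean and standard deviation over the raw training split.
ds_stats = tfds.load('eurosat/all', split='train[:80%]', as_supervised=True)
band_sum = tf.zeros(13, tf.float64)
band_sum_sq = tf.zeros(13, tf.float64)
pixel_count = 0
for image, _ in ds_stats:
    image = tf.cast(image, tf.float64)
    band_sum += tf.reduce_sum(image, axis=[0, 1])
    band_sum_sq += tf.reduce_sum(tf.square(image), axis=[0, 1])
    pixel_count += image.shape[0] * image.shape[1]
means = band_sum / pixel_count
stddevs = tf.sqrt(band_sum_sq / pixel_count - tf.square(means))
print(means.numpy(), stddevs.numpy())
Note that this single eager pass over all 21,600 training images takes a while; the hard-coded values above save you that step.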
We train a DenseNet-121 model from scratch with an initial learning rate of 1e-4 and a batch size of 32 for up to 100 epochs. We also apply an early stopping callback that stops the training if the validation loss hasn’t improved for seven epochs, and a ReduceLROnPlateau callback that reduces the learning rate by a factor of 10 if the validation loss hasn’t improved for four epochs. Once training is finished we evaluate the model on the test dataset.
import os
out_dir='out'
METRICS = [tf.keras.metrics.CategoricalAccuracy(name='accuracy'),
tf.keras.metrics.Precision(),
tf.keras.metrics.Recall()]
model = tf.keras.applications.densenet.DenseNet121(include_top=True, weights=None, input_tensor=None, input_shape=(64,64,13), pooling=None, classes=NUM_CLASSES)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss='categorical_crossentropy', metrics=METRICS)
early_stopping_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=7, restore_best_weights=True)
reduce_lr_callback = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=4)
callbacks = [early_stopping_callback, reduce_lr_callback]
model.fit(ds_train,
epochs=100,
validation_data=ds_valid,
callbacks=callbacks)
model.save(os.path.join(out_dir, 'model'))
model.evaluate(ds_test)
# > [...]
# > Epoch 46/100
# > 675/675 [==============================] - 60s 64ms/step - loss: 0.0179 - accuracy: 0.9937 - precision: 0.9944 - recall: 0.9935 - val_loss: 0.0562 - val_accuracy: 0.9837 - val_precision: 0.9837 - val_recall: 0.9826
# > INFO:tensorflow:Assets written to: out/model/assets
# > 85/85 [==============================] - 4s 39ms/step - loss: 0.0567 - accuracy: 0.9841 - precision: 0.9855 - recall: 0.9830
In the above run, training was stopped early after 46 epochs and we achieved a top-1 accuracy of 0.9841 on the unseen test set, which is on par with the numbers reported in the EuroSAT paper.
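Beyond the aggregate accuracy, it can be instructive to see which classes get confused with each other. The following is a small optional sketch, not part of the original evaluation, that builds a confusion matrix over the test set:
# Collect ground-truth labels (one-hot) and model predictions over the whole test set.
y_true = np.concatenate([np.argmax(labels, axis=-1) for _, labels in ds_test])
y_pred = np.argmax(model.predict(ds_test), axis=-1)
confusion = tf.math.confusion_matrix(y_true, y_pred, num_classes=NUM_CLASSES)
print(confusion.numpy())
Since ds_test is neither shuffled nor repeated, the two passes over it yield the samples in the same order.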
Classification of new samples
In the next step we want to classify a completely new scene. You can download new images either from the Copernicus Open Access Hub or via the Sentinel Hub EO Browser; in both cases you need to register first. I recommend using the Level-1C (top-of-atmosphere) products, as the Level-2A (bottom-of-atmosphere) products are already preprocessed and do not contain all 13 bands the model expects.
It is important to note the band order in which the EuroSAT dataset was created: the last band is 8A, not 12:
Band index in dataset | Sentinel-2 band | Resolution (m) | Central wavelength (nm) |
---|---|---|---|
0 | Band 1 - Coastal aerosol | 60 | 443 |
1 | Band 2 - Blue | 10 | 490 |
2 | Band 3 - Green | 10 | 560 |
3 | Band 4 - Red | 10 | 665 |
4 | Band 5 - Vegetation red edge 1 | 20 | 705 |
5 | Band 6 - Vegetation red edge 2 | 20 | 740 |
6 | Band 7 - Vegetation red edge 3 | 20 | 783 |
7 | Band 8 - NIR | 10 | 842 |
8 | Band 9 - Water vapour | 60 | 945 |
9 | Band 10 - SWIR - Cirrus | 60 | 1375 |
10 | Band 11 - SWIR 1 | 20 | 1610 |
11 | Band 12 - SWIR 2 | 20 | 2190 |
12 | Band 8A - Narrow NIR | 20 | 865 |
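To make this mapping explicit in code, you can keep it as a simple list whose position corresponds to the band index in the dataset. This constant is just a convenience for this post, not something provided by EuroSAT or TFDS; the RGB indices 3, 2 and 1 used in ms_to_rgb above follow directly from it:
# Sentinel-2 band identifiers in EuroSAT band order (index 12 is B8A, not B12).
EUROSAT_BAND_ORDER = ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07',
                      'B08', 'B09', 'B10', 'B11', 'B12', 'B8A']
print(EUROSAT_BAND_ORDER.index('B04'),  # red   -> 3
      EUROSAT_BAND_ORDER.index('B03'),  # green -> 2
      EUROSAT_BAND_ORDER.index('B02'))  # blue  -> 1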
When you download the Level-1C data from the sources above, you get a ZIP archive that contains all image bands as separate jp2 files. The next step is to create a single image tensor of shape (img_height, img_width, 13) from the individual image files and upsample the bands with a resolution of 20 m or 60 m to 10 m:
import cv2
from tqdm import tqdm
def jp2_to_tensor(path_template):
    # Band file suffixes and their native resolutions, in EuroSAT band order (B8A last).
    band_resolution = [('B01', 60), ('B02', 10), ('B03', 10), ('B04', 10), ('B05', 20), ('B06', 20), ('B07', 20),
                       ('B08', 10), ('B09', 60), ('B10', 60), ('B11', 20), ('B12', 20), ('B8A', 20)]
    band_images = []
    for band, resolution in tqdm(band_resolution):
        path = path_template.format(band)
        band_image = cv2.imread(path, cv2.IMREAD_UNCHANGED)
        # Upsample the 20 m and 60 m bands to the 10 m grid so all bands share the same shape.
        factor = resolution / 10.
        band_image = cv2.resize(band_image, (0, 0), fx=factor, fy=factor, interpolation=cv2.INTER_NEAREST)
        band_images.append(tf.convert_to_tensor(band_image))
    return tf.stack(band_images, axis=-1)
sample = jp2_to_tensor('./S2A_MSIL1C_20211203T095401_N0301_R079_T33UXQ_20211203T105856.SAFE/GRANULE/L1C_T33UXQ_A033681_20211203T095357/IMG_DATA/T33UXQ_20211203T095401_{}.jp2')
print(sample.shape)
# > 100%|██████████| 13/13 [01:44<00:00, 8.08s/it]
# > (10980, 10980, 13)
As the model was trained on 64 x 64 px image patches, we predict on patches of the same dimensions cropped from that sample and output the probability of the top-1 class:
PATCH_HW = 64
sample_offsets=((6208, 1409), (9576, 7376), (4871, 1580), (4981, 10360), (4794, 4500), (7432, 7950))
sample_tensors = [sample[row:row + PATCH_HW, column:column + PATCH_HW, :] for row, column in sample_offsets]
sample_tensors = tf.stack(sample_tensors, axis=0)
raw_predictions = model.predict(standardize(sample_tensors))
ds_prediction = tf.data.Dataset.from_tensor_slices((sample_tensors, raw_predictions))
show_rgb_samples(ds_prediction)
In the next step we want to predict a larger scene from the original sample image and plot a classification mask on top of it.
h, w = 200, 400
offset_h, offset_w = 5278, 1514
scene_raw = sample[offset_h:offset_h+h, offset_w:offset_w+w, :]
scene_height, scene_width, scene_num_bands = scene_raw.shape
plt.figure(figsize=(10,5))
plt.imshow(ms_to_rgb(scene_raw))
plt.axis('off')
plt.show()
scene = standardize(scene_raw)
scene = tf.expand_dims(scene, 0)
From the scene above we create a dataset of 64x64 patches with the help of tf.image.extract_patches. This collects patches from the image in the same way a convolution kernel slides over it. The SAME padding flag zero-pads the parts of a patch that fall outside of the input. With that approach we obtain one patch, and therefore one prediction, per pixel of the scene.
patches = tf.image.extract_patches(scene, [1, PATCH_HW, PATCH_HW, 1], [1,1,1,1], [1,1,1,1], 'SAME')
# Flatten the (1, height, width, 64*64*13) output into one 64x64x13 patch per pixel.
patches = tf.reshape(patches, [patches.shape[-2] * patches.shape[-3], PATCH_HW, PATCH_HW, scene_num_bands])
patches_dataset = tf.data.Dataset.from_tensor_slices(patches)
preds = model.predict(patches_dataset.batch(32), verbose=1)
preds_top_1 = np.argmax(preds, axis=-1)
preds_top_1 = np.reshape(preds_top_1, (scene_height, scene_width))
# > 2500/2500 [==============================] - 738s 263ms/step
Finally, we want to display the classification mask on top of the scene image. This is possible with the help of some Matplotlib magic; you can see the result at the top of this post.
from matplotlib.colors import ListedColormap, BoundaryNorm
from matplotlib.ticker import FuncFormatter
color_dict = {
0: 'tab:olive', #AnnualCrop
1: 'tab:green', #Forest
2: 'tab:orange', #HerbaceousVegetation
3: 'tab:gray', #Highway
4: 'tab:purple', #Industrial
5: 'tab:brown', #Pasture
6: 'tab:pink', #PermanentCrop
7: 'tab:red', #Residential
8: 'tab:blue', #River
9: 'tab:cyan', #SeaLake
}
cm = ListedColormap([color_dict[x] for x in color_dict.keys()])
labels = np.array(CLASS_NAMES)
len_lab = len(labels)
norm_bins = np.sort([*color_dict.keys()]) + 0.5
norm_bins = np.insert(norm_bins, 0, np.min(norm_bins) - 1.0)
norm = BoundaryNorm(norm_bins, len_lab, clip=True)
fmt = FuncFormatter(lambda x, pos: labels[norm(x)])
fig,ax = plt.subplots(figsize=(30,10))
ax.axis('off')
ax.imshow(ms_to_rgb(scene_raw))
im = ax.imshow(preds_top_1, alpha=0.6, cmap=cm, norm=norm)
diff = norm_bins[1:] - norm_bins[:-1]
tickz = norm_bins[:-1] + diff / 2
cb = fig.colorbar(im, format=fmt, ticks=tickz)