This is Part Two of a report on a capstone project performed with Victor Gutierrez-Velez in the Remote Sensing and Sustainability Lab at Temple University. We will discuss our work expanding and customizing the model created by reachsumit in order to classify imagery of the Colombian wetlands obtained from Planet Labs.
Part One of this report provided an introductory tutorial on training and using a Deep Learning model to create a classified map using image segmentation.
Tools and Methods
This project takes advantage of a few layers of open-source tools:
Tensorflow is a Machine Learning platform which provides the tools to develop, train, and optimize Deep Learning models. Although other platforms exist, there is a very large community formed around Tensorflow, so it seems like an appropriate tool with which to get started. This layer interacts with the operating system most directly, and runs the actual model. Because training a deep learning model can be very resource intensive, software optimizations and drivers must be considered for Tensorflow.
Although Tensorflow is very powerful software, its interface is a little cumbersome. Thankfully, Keras provides a much easier API for defining, evaluating, and using models. Additionally, Keras also interfaces with two more Machine Learning platforms (CNTK and Theano), so building a workflow around Keras could help minimize the chances of being locked into one platform in the future, should the ecosystem shift.
Keras also provides a number of related utilities for building and evaluating a high-accuracy model. There are powerful preprocessing tools such as the ImageDataGenerator, which can generate batches of image data fed to the model with a number of augmentations applied. These augmentations promise to help increase a model’s accuracy by artificially increasing the amount of training data. Keras also provides the ability to track metrics, accuracy/loss, and visualize a number of important aspects related to the model.
Although Keras is compatible with R and Python, we decided to use Python, installed in an Anaconda virtual environment. We had initially investigated using R for compatibility with existing lab pipelines and knowledge bases, but quickly decided that beginning our research in R would be too difficult. Keras was initially created for use with Python, so there is a much larger body of documentation and example projects for Python than there is for R. However, the R interface to Keras seems well developed and very functional, so it will probably be reasonable to migrate to R once a base of in-house expertise in deep learning exists.
The operating system, QGIS, and a number of other software packages may all depend on a specific version or build of Python, and problems can quickly arise when software starts fighting over specific dependencies. Because of this, a 64-bit build of Python 3.7 and all related software was installed using the Miniconda Package Manager. Miniconda provides a comprehensive package manager, installing everything necessary for this analysis into an environment completely independent of other installed versions of Python. This allows us to upgrade and make changes to QGIS and the operating system without interfering with our deep learning pipeline. Another advantage of using Miniconda for our package management is that it is cross-platform, so users running Linux, MacOS, or Windows should all be able to install the same software.
QGIS and R were both used lightly to assist in preparing the images and masks used as training data for our model.
Code and Data
There is a strong community of Deep Learning researchers developing and sharing code through sites like Medium, GitHub, and Kaggle. After experimenting with a few different projects, we settled on using a Convolutional Neural Network structured as a deep unet, a network model suitable for image segmentation. This model had been created by GitHub user reachsumit in concert with a competition on Kaggle. Whereas many of the example scripts for Deep Learning only shared notable portions of the code, this was a fully functional system for creating, evaluating, and predicting from a model. It also included a set of 24 images with corresponding masks delineating land cover types, so it provided a strong foundation for building an understanding of a complex chain of events. Other potentially useful projects are listed in the Example Projects section of Appendix B.
After creating a model and classified map from the provided code and training data, we began adapting the provided code to train against high-resolution satellite images provided by a grant from planet.com. Our version of this codebase was stored and tracked on GitHub.
Training images for this project were downloaded with minimal user intervention from Planet Labs as part of a parallel capstone project by my associate.
Building A Pipeline
This section of our report focuses on our version of reachsumit’s code, which has been updated and customized to train on the higher-resolution images received from Planet Labs. This code can be downloaded from the GitHub repository for this project.
Creating a set of training data is the most important part of this process, and the most tedious. In the case of reachsumit’s unet framework, the training data consisted of 24 satellite images supplied as TIFF files, along with a corresponding set of mask files. Each mask file contained a set of 5 masks for its image, indicating the presence of buildings, roads, trees, crops, and water.
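To make the mask layout described above more concrete, here is a minimal sketch that turns a single-band raster of integer class labels into a stack of binary masks, one per class. The function name, the class order, and the use of 0 for unlabeled pixels are assumptions made for this illustration; they are not taken from reachsumit’s code.

```python
import numpy as np

# Hypothetical class order, used only for this illustration.
CLASS_NAMES = ["buildings", "roads", "trees", "crops", "water"]


def labels_to_masks(labels, n_classes=len(CLASS_NAMES)):
    """Convert an (H, W) array of labels 1..n_classes into (H, W, n_classes) binary masks.

    Label 0 is treated as "unlabeled" and produces zeros in every mask.
    """
    labels = np.asarray(labels)
    masks = np.zeros(labels.shape + (n_classes,), dtype=np.uint8)
    for class_index in range(n_classes):
        # Class numbers start at 1, mask channels start at 0.
        masks[..., class_index] = labels == class_index + 1
    return masks


labels = np.array([[1, 5, 0],
                   [3, 3, 2]])
masks = labels_to_masks(labels)
print(masks.shape)    # (2, 3, 5)
print(masks[..., 2])  # the "trees" mask for this toy raster
```

Keeping one channel per class means the number of channels in the mask has to match the number of classes the model is configured to predict.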
Preparing this data is also one of the more time-consuming steps of the process. Two sets of images must be transformed into the correct coordinate reference system, and precisely aligned to the pixel. There are also concerns with invalid data. Pixel values such as -1, NA, and NaN can all create errors while training a model, or could prevent obtaining accurate results. As discussed in Further Research, there may be opportunities to take advantage of pre-trained models or public training datasets, both of which could alleviate the need to create training data from scratch.
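One possible way to screen for the invalid values mentioned above is sketched below. It assumes each image is already loaded as a NumPy array shaped (height, width, bands) and that -1 is the only nodata marker; the function names and the choice to fill bad pixels with 0 are assumptions for this example, not part of the project code.

```python
import numpy as np


def valid_pixel_mask(image, nodata_values=(-1,)):
    """Return an (H, W) boolean array that is True where every band holds usable data."""
    image = np.asarray(image, dtype=np.float32)
    valid = np.isfinite(image)  # rejects NaN and +/- infinity
    for nodata in nodata_values:
        valid &= image != nodata
    return valid.all(axis=-1)


def fill_invalid(image, fill_value=0.0, nodata_values=(-1,)):
    """Replace every band of an invalid pixel with fill_value; also return the validity mask."""
    image = np.asarray(image, dtype=np.float32).copy()
    valid = valid_pixel_mask(image, nodata_values)
    image[~valid] = fill_value
    return image, valid


# A 2x2 tile with two bands: one pixel contains NaN, another contains -1.
tile = np.array([[[0.2, 0.4], [np.nan, 0.1]],
                 [[-1.0, 0.3], [0.5, 0.6]]])
cleaned, valid = fill_invalid(tile)
print(valid)
```

The returned validity mask can also be used to exclude the same pixels from the training masks, so the model is not trained against labels that sit on top of missing imagery.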
Image Augmentation is a powerful tool that can help to artificially increase the amount of training data fed into our model. By passing images through a series of random transformations such as rotating, flipping, cropping, and adding noise, a model trained on a relatively small number of images (under 50) can achieve accuracy above 95%.
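The key detail with augmentation for segmentation is that the image and its mask must receive exactly the same transformation, otherwise the labels no longer line up with the pixels. The following is a minimal sketch of that idea using NumPy only; the function name and the particular set of transformations are assumptions for illustration, and it assumes square tiles so that 90-degree rotations keep the tile shape.

```python
import numpy as np


def augment_pair(image, mask, rng):
    """Apply the same random flips and 90-degree rotation to an image tile and its mask."""
    if rng.random() < 0.5:  # mirror left-right
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:  # mirror top-bottom
        image, mask = image[::-1], mask[::-1]
    k = int(rng.integers(0, 4))  # number of 90-degree turns (0 to 3)
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    return image, mask


rng = np.random.default_rng(0)
image = np.arange(16).reshape(4, 4, 1)  # toy single-band tile
mask = image % 2                        # toy mask derived pixel-by-pixel from the tile
aug_image, aug_mask = augment_pair(image, mask, rng)

# The mask still matches the image after augmentation.
print(np.array_equal(aug_mask, aug_image % 2))  # True
```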
Training the Model
Once a quality set of training data has been obtained, we can begin training the model. This is a long resource-intensive process, although it doesn’t require much interaction once started.
train_unet.py is the script that compiles and trains our model, and has a few parameters to help configure the inputs and outputs. For the training data, it needs to know the number of bands in each image and filesystem paths where the images and masks are located. The model also needs to be configured with the number of classes requested in the output map, and a set of class weights. The number of classes configured here must correspond to the number of classes in the mask files in the training dataset.
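To illustrate how a set of class weights can influence training, the sketch below computes a class-weighted binary cross-entropy in plain NumPy. This is one common way to apply class weights and is offered as an assumption about the general technique, not as a copy of the loss used in train_unet.py; the weight values are invented for the example.

```python
import numpy as np


def weighted_binary_crossentropy(y_true, y_pred, class_weights, eps=1e-7):
    """Class-weighted binary cross-entropy for masks shaped (H, W, n_classes)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1 - eps)
    per_pixel = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    per_class = per_pixel.mean(axis=(0, 1))  # one loss value per class
    return float(np.sum(per_class * np.asarray(class_weights)))


# Hypothetical weights for 5 classes; a larger weight makes errors on that class cost more.
class_weights = [0.2, 0.3, 0.1, 0.1, 0.3]

rng = np.random.default_rng(1)
y_true = (rng.random((8, 8, 5)) > 0.5).astype(float)

uninformed = np.full_like(y_true, 0.5)  # a model that always answers "50%"
print(weighted_binary_crossentropy(y_true, uninformed, class_weights))
print(weighted_binary_crossentropy(y_true, y_true, class_weights))  # near zero
```

Note that the list of weights needs one entry per class, which is another reason the configured number of classes has to match the mask files.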
This script also has a number of parameters relating to how training data is fed to the model. Because Deep Learning relies heavily on parallel processing, each image is broken into a number of patches, the size of which in pixels is configured by patch_size. The batch_size parameter controls how many images are loaded into the model at once. It should be tuned to help ensure optimal RAM use.
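The relationship between patch_size and batch_size can be sketched as follows. This is a simplified stand-in for the patch generation step, not the project's gen_patches.py: the function names are hypothetical, and it assumes images and masks are NumPy arrays shaped (height, width, channels) that are already aligned pixel for pixel.

```python
import numpy as np


def random_patch(image, mask, patch_size, rng):
    """Cut the same randomly placed square window out of an image and its mask."""
    height, width = image.shape[:2]
    if height < patch_size or width < patch_size:
        raise ValueError("patch_size is larger than the image")
    row = int(rng.integers(0, height - patch_size + 1))
    col = int(rng.integers(0, width - patch_size + 1))
    window = (slice(row, row + patch_size), slice(col, col + patch_size))
    return image[window], mask[window]


def make_batch(image, mask, patch_size, batch_size, rng):
    """Stack batch_size random patches into arrays ready to feed to a model."""
    pairs = [random_patch(image, mask, patch_size, rng) for _ in range(batch_size)]
    x = np.stack([pair[0] for pair in pairs])
    y = np.stack([pair[1] for pair in pairs])
    return x, y


rng = np.random.default_rng(42)
image = np.zeros((300, 500, 4), dtype=np.float32)  # 4-band rectangular image
mask = np.zeros((300, 500, 5), dtype=np.uint8)     # 5 class masks
x, y = make_batch(image, mask, patch_size=160, batch_size=8, rng=rng)
print(x.shape, y.shape)  # (8, 160, 160, 4) (8, 160, 160, 5)
```

Memory use per batch grows with batch_size times the square of patch_size, which is why these two values are usually tuned together.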
As mentioned previously, Image Augmentation is used to increase the amount of training data fed to the model.
gen_patches.py performs a series of rotations, mirror flips, and transpositions to help artificially increase the amount of training data available to the model. The TRAIN_SZ and VAL_SIZE parameters in train_unet.py control this behavior. This analysis is configured to create 2000 tiles of training data and 1000 tiles for the validation set.
In the case of our customized code, we had to remove some augmentations that relied on training images being square. The original code randomly rotated image tiles in predict.py, and this method was not compatible with training images whose length and width differ. The rotations were simply removed, as other methods of augmentation that are compatible with rectangular images could be added later.
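The shape problem with rotations is easy to reproduce. The sketch below shows that a 90-degree rotation swaps the height and width of a rectangular tile, while mirror flips and a 180-degree rotation leave the shape unchanged. The helper name is hypothetical, and this list of shape-preserving transformations is a suggestion rather than what the customized code currently does.

```python
import numpy as np


def shape_preserving_variants(tile):
    """Return the flips and rotation that keep a (H, W, bands) tile's shape."""
    return [
        tile,              # unchanged
        tile[:, ::-1],     # mirror left-right
        tile[::-1],        # mirror top-bottom
        tile[::-1, ::-1],  # 180-degree rotation (both flips combined)
    ]


tile = np.zeros((120, 200, 4))  # rectangular tile: 120 rows, 200 columns, 4 bands

# A quarter turn swaps rows and columns, so the result no longer fits the model input.
print(np.rot90(tile).shape)  # (200, 120, 4)

# The variants below all keep the original (120, 200, 4) shape.
print([variant.shape for variant in shape_preserving_variants(tile)])
```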
In training the model, Tensorflow loops through the training set a specified number of times, or epochs. This value is configured with the N_EPOCHS parameter. For testing purposes, it has been set to 10 in this analysis. However, when training a model that is intended for making predictions, 150 seems to be a reasonable starting point. After training the model, this script outputs a line graph showing how the loss and accuracy metrics changed with each epoch. The graph can be used to tune N_EPOCHS to the number of epochs with the lowest loss. Running more epochs than this is undesirable, as it can lead to overfitting.
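Reading the best epoch off the graph can also be done programmatically. The sketch below assumes the per-epoch losses are available as a dictionary of lists, which is how Keras exposes them through the history attribute of the object returned by fit(). The loss values here are invented, and using the validation loss rather than the training loss is an assumption of this example.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch number with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


# Invented numbers: training loss keeps falling, validation loss turns upward after epoch 4.
history = {
    "loss": [0.90, 0.61, 0.48, 0.41, 0.37, 0.35],
    "val_loss": [0.95, 0.70, 0.58, 0.55, 0.57, 0.62],
}

print(best_epoch(history["val_loss"]))  # 4
```

A validation loss that rises while the training loss keeps falling, as in the made-up numbers above, is the usual sign that further epochs are only fitting the training set more closely.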
Training a model in Tensorflow on a recent Nvidia card is highly recommended, as it can train complex models much more quickly than even the most modern CPU. In a data center, a Tesla card is required, but outside of that environment, a much more affordable GeForce card should be similarly capable.
Cloud Services also provide environments well designed for training and using deep learning models with platforms such as Tensorflow. Amazon Web Services has a range of GPU-accelerated nodes available, which can provide easy access to preinstalled Linux environments capable of being scaled up or down based on the current workload. This environment also provides easy access to relevant public datasets including Landsat, Sentinel, MODIS, and SpaceNet, a collection of commercial satellite imagery with some labeled training data. More information on this environment can be found in Public Datasets and Other Resources in Appendix B.
Without access to an Nvidia GPU, Tensorflow will still train small to medium sized models in an acceptable amount of time on a modern CPU. For this research, we were using a 6-core Intel i7, and the longest training run with 150 epochs took about 48 hours to complete. Intel provides an optimized build of Tensorflow which, according to anecdotal observations, looped through the model approximately 35% faster than the standard build. This release was no more cumbersome to install than the main build, so it is highly recommended if an Nvidia card is not available.
Using the Model to Predict Land Cover
Once an acceptable model has been created and saved, predictions can be run against it using predict.py. When given a satellite image not part of the training set, this script will create a classified map based on the classes defined in the training set.
This script only has a few configurable parameters. Image directory and image_id are both necessary for the script to find the image used to generate predictions. Aside from this, there is also a debug flag named x0_x1_debug, which enables additional output helpful in troubleshooting inside the predict() function.
Similar to training a network, predict.py will generate tiles corresponding to the value of patch_size defined in train_unet.py. The prediction process also uses augmentation to help increase the accuracy of results. In this case, it runs a prediction on the input image after performing a random augmentation, as described previously. This is done six times, and the results of these predictions are averaged, which should help minimize the effects of random chance. At the end of the run, this script will create result.tif, which is the output raster containing predictions, and map.tif, a colorful map created by reclassifying result.tif based on classes and colors defined in the picture_from_mask() function.
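The averaging step can be sketched as follows. This is a simplified illustration rather than the logic in predict.py: it uses four fixed, shape-preserving transformations instead of six random ones, the function names are hypothetical, and a trivial stand-in function takes the place of the trained model. Each prediction is transformed back to the original orientation before averaging, so that every pixel is averaged with itself.

```python
import numpy as np


def predict_with_augmentation(image, predict_fn):
    """Average predictions over several flipped copies of an image."""
    # Each flip is its own inverse, so applying it again restores the orientation.
    flips = [
        lambda a: a,               # unchanged
        lambda a: a[:, ::-1],      # mirror left-right
        lambda a: a[::-1],         # mirror top-bottom
        lambda a: a[::-1, ::-1],   # 180-degree rotation
    ]
    predictions = [flip(predict_fn(flip(image))) for flip in flips]
    return np.mean(predictions, axis=0)


def stand_in_model(image):
    """Hypothetical two-class 'model': class scores derived from mean brightness."""
    brightness = image.mean(axis=-1)
    return np.stack([brightness, 1.0 - brightness], axis=-1)


rng = np.random.default_rng(7)
image = rng.random((4, 6, 3))  # small rectangular 3-band image

probabilities = predict_with_augmentation(image, stand_in_model)
class_map = np.argmax(probabilities, axis=-1)  # one class index per pixel
print(probabilities.shape, class_map.shape)  # (4, 6, 2) (4, 6)
```

Because the stand-in model treats every pixel independently, the averaged result here equals a single direct prediction; with a real convolutional model the individual predictions differ slightly, and averaging smooths those differences out.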
Due to time constraints, the deep learning model customized for Planet Labs images was not accurate enough to generate usable predictions by the conclusion of this project. However, the lengthy process completes without error, so it should serve as a strong foundation for future research.
Further Research
This research has provided an invaluable opportunity to explore the possibilities of Deep Learning for creating classified maps based on remote sensing imagery. As this is a new and rapidly developing field, there are a few key areas for further research that seem valuable to us.
Creating training data is one of the more cumbersome parts of this process, which could present a problem, depending on the level of automation necessary for a particular project. There are a number of publicly available training datasets which, depending on the ecosystem in question, may serve to create an acceptably accurate model. The public datasets that came up in our research tended to focus on urban areas and agriculture, so they did not seem to overlap much with our current study area in the Colombian wetlands. These are outlined in the Public Datasets and Other Resources section of Appendix B.
Another possibility that could help alleviate the need to create training data is the availability of pre-trained models such as TernausNet. This network has been trained on the ImageNet training dataset, which consists of 1.5 million images with training data. Because of the size of this dataset, training a model on it would be prohibitively resource intensive for most environments, so taking advantage of someone else’s training environment is generally necessary. This dataset may not contain enough imagery of wetlands and rainforests like those found in Colombia to have a direct benefit for this project, but it could prove very useful for other ecosystems. The u-net architecture most effective for creating classified maps is relatively new, so there doesn’t seem to be a large variety of pre-trained networks available yet for this architecture, but it is an area to watch.
Because our model is not yet accurate enough to create predictions, increasing the accuracy is an important area for further research. Even once a model is creating predictions, improving the accuracy is an ongoing project, so there will always be further opportunities to balance accuracy and training speed. Increasing augmentation, tuning the input parameters, and altering the design of the unet model itself should all have effects on the accuracy and speed of predictions.
Academic Papers providing a solid foundation in the underlying concepts are provided in Appendix B. “U-net: Convolutional networks for biomedical image segmentation” is the original paper to describe the unet architecture in 2015. Since then, a large amount of the research on using deep learning for image segmentation has focused on detecting objects on roadways for use in autonomous vehicles. Because of this, there seems to be plenty of room to advance the state of the art for deep learning algorithms as applied to Remote Sensing.
Lastly, there is a strong community of sharing data and code around deep learning research, and publicly available training data with masks defining classes are generally very well received. Because of the steep learning curve associated with deep learning, pre-made training data is especially important for people new to the technology. When this project is in a more finished state, a public release of training data with masks could be a good way to give back to the community and generate positive publicity for the lab.
Appendix A: References
- Miniconda Package Manager https://docs.conda.io/en/latest/miniconda.html
- reachsumit’s Deep UNet https://github.com/reachsumit/deep-unet-for-satellite-image-segmentation
- Claude Schrader’s github repository for this project https://github.com/nanotubing/keras_playground
- Planet Labs https://www.planet.com
- Image Augmentation https://machinelearningmastery.com/image-augmentation-deep-learning-keras/
- Tensorflow with support for NVIDIA GPU https://www.tensorflow.org/install/gpu
- Intel Optimized build of Tensorflow: https://software.intel.com/en-us/articles/tensorflow-optimizations-on-modern-intel-architecture
- TernausNet Code: https://github.com/ternaus/TernausNet
- ImageNet: http://image-net.org
Appendix B: Other Resources
- Buscombe, Daniel, and Andrew Ritchie. “Landscape classification with deep neural networks.” Geosciences 8, no. 7 (2018): 244.
- Guidici, Daniel, and Matthew Clark. “One-Dimensional convolutional neural network land-cover classification of multi-seasonal hyperspectral imagery in the San Francisco Bay Area, California.” Remote Sensing 9, no. 6 (2017): 629.
- Iglovikov, Vladimir, and Alexey Shvets. “Ternausnet: U-net with vgg11 encoder pre-trained on imagenet for image segmentation.” arXiv preprint arXiv:1801.05746 (2018).
- Kussul, Nataliia, Mykola Lavreniuk, Sergii Skakun, and Andrii Shelestov. “Deep learning classification of land cover and crop types using remote sensing data.” IEEE Geoscience and Remote Sensing Letters 14, no. 5 (2017): 778-782.
- Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. “U-net: Convolutional networks for biomedical image segmentation.” In International Conference on Medical image computing and computer-assisted intervention, pp. 234-241. Springer, Cham, 2015.
- Sharma, Atharva, Xiuwen Liu, Xiaojun Yang, and Di Shi. “A patch-based convolutional neural network for remote sensing image classification.” Neural Networks 95 (2017): 19-28.
- Tong, Xin-Yi, Gui-Song Xia, Qikai Lu, Huanfeng Shen, Shengyang Li, Shucheng You, and Liangpei Zhang. “Learning Transferable Deep Models for Land-Use Classification with High-Resolution Remote Sensing Images.” arXiv preprint arXiv:1807.05713 (2018).
- Kaggle Competition: Planet - Understanding the Amazon from Space:
- Kaggle Competition: Dstl Satellite Imagery Feature Detection:
- Land Use classification using Convolutional Neural Network:
- Airplane Image Classification using a Keras CNN: