Sharing data and computing in the cloud allows data users to focus on data analysis rather than data access. Open Data on AWS helps you discover and share public open datasets in the cloud. The Registry of Open Data on AWS hosts a large amount of public open data. The datasets range from genomics to climate to transportation information. They are well structured and easily accessible. Additionally, you can use these datasets in machine learning (ML) model development in the cloud.
In this post, we demonstrate how to extract buildings and roads from two large-scale geospatial datasets: SpaceNet satellite images and USGS 3DEP LiDAR data. Both datasets are hosted on the Registry of Open Data on AWS. We show you how to launch an Amazon SageMaker notebook instance and walk you through the tutorial notebooks at a high level. The notebooks reproduce winning algorithms from the SpaceNet challenges (which only use satellite images). In addition to the SpaceNet satellite images, we compare and combine the USGS 3D Elevation Program (3DEP) LiDAR data to extract the same.
This post demonstrates running ML services on AWS to extract features from large-scale geospatial data in the cloud. By following our examples, you can train the ML models on AWS, apply the models to other regions where satellite or LiDAR data is available, and experiment with new ideas to improve the performances. For the complete code and notebooks of this tutorial, see our GitHub repo.
In this section, we provide more detail about the datasets we use in this post.
SpaceNet launched in August 2016 as an open innovation project offering a repository of freely available imagery with co-registered map features. It’s a large corpus of labeled satellite imagery. The project has also launched a series of competitions ranging from automatic building extraction, road extraction, to recently published multi-temporal urban development analysis. The dataset covers 11 areas of interest (AOIs), including Rio de Janeiro, Las Vegas, and Paris. For this post, we use Las Vegas; the images in this AOI cover 216km2 areas with 151,367 building polygon labels and 3,685km road labels.
The following image is from DigitalGlobe’s SpaceNet Challenge Concludes First Round, Moves to Higher Resolution Challenges.
USGS 3DEP LiDAR dataset
Our second dataset comes from the USGS 3D Elevation Program (3DEP) in the form of LiDAR (Light Detection and Ranging) data. The program’s goal is to complete the acquisition of nationwide LiDAR to provide the first-ever national baseline of consistent high-resolution topographic elevation data, collected in a timeframe of less than a decade. LiDAR is a remote sensing method that emits hundreds of thousands of near-infrared light pulses each second
Source - Continue Reading: https://aws.amazon.com/blogs/machine-learning/extracting-buildings-and-roads-from-aws-open-data-using-amazon-sagemaker/