TY - GEN
T1 - Building and road detection from large aerial imagery
AU - Saito, Shunta
AU - Aoki, Yoshimitsu
PY - 2015/1/1
Y1 - 2015/1/1
N2 - Building and road detection from aerial imagery has many applications in a wide range of areas including urban design, real-estate management, and disaster relief. The extracting buildings and roads from aerial imagery has been performed by human experts manually, so that it has been very costly and time-consuming process. Our goal is to develop a system for automatically detecting buildings and roads directly from aerial imagery. Many attempts at automatic aerial imagery interpretation have been proposed in remote sensing literature, but much of early works use local features to classify each pixel or segment to an object label, so that these kind of approach needs some prior knowledge on object appearance or class-conditional distribution of pixel values. Furthermore, some works also need a segmentation step as pre-processing. Therefore, we use Convolutional Neural Networks(CNN) to learn mapping from raw pixel values in aerial imagery to three object labels (buildings, roads, and others), in other words, we generate three-channel maps from raw aerial imagery input. We take a patch-based semantic segmentation approach, so we firstly divide large aerial imagery into small patches and then train the CNN with those patches and corresponding three-channel map patches. Finally, we evaluate our system on a large-scale road and building detection datasets that is publicly available.
AB - Building and road detection from aerial imagery has many applications in a wide range of areas including urban design, real-estate management, and disaster relief. The extracting buildings and roads from aerial imagery has been performed by human experts manually, so that it has been very costly and time-consuming process. Our goal is to develop a system for automatically detecting buildings and roads directly from aerial imagery. Many attempts at automatic aerial imagery interpretation have been proposed in remote sensing literature, but much of early works use local features to classify each pixel or segment to an object label, so that these kind of approach needs some prior knowledge on object appearance or class-conditional distribution of pixel values. Furthermore, some works also need a segmentation step as pre-processing. Therefore, we use Convolutional Neural Networks(CNN) to learn mapping from raw pixel values in aerial imagery to three object labels (buildings, roads, and others), in other words, we generate three-channel maps from raw aerial imagery input. We take a patch-based semantic segmentation approach, so we firstly divide large aerial imagery into small patches and then train the CNN with those patches and corresponding three-channel map patches. Finally, we evaluate our system on a large-scale road and building detection datasets that is publicly available.
KW - Aerial imagery
KW - Building detection
KW - Convolutional neural networks
KW - Road detection
KW - Semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=84926617496&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84926617496&partnerID=8YFLogxK
U2 - 10.1117/12.2083273
DO - 10.1117/12.2083273
M3 - Conference contribution
AN - SCOPUS:84926617496
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - Proceedings of SPIE-IS and T Electronic Imaging - Image Processing
A2 - Niel, Kurt S.
A2 - Lam, Edmund Y.
PB - SPIE
T2 - Image Processing: Machine Vision Applications VIII
Y2 - 10 February 2015 through 11 February 2015
ER -