TY - GEN
T1 - Learning an optimisable semantic segmentation map with image conditioned variational autoencoder
AU - Zhuang, Pengcheng
AU - Sekikawa, Yusuke
AU - Hara, Kosuke
AU - Saito, Hideo
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
N2 - Recent semantic segmentation systems have improved significantly by training deep convolutional neural networks pixel-wise on hierarchical features. Because training usually requires pixel-level annotated images, and finely labeled data is difficult to obtain in sufficient quantity, training sets tend to be limited, often to only thousands of images. As a result, the top methods for a given dataset may be fine-tuned to that specific setting, which leaves their generalization ability unclear. In real-world applications such as self-driving systems, ambiguous regions or a lack of context information can cause errors in the predicted results, and resolving such ambiguities is crucial for subsequent operations to be performed safely. Inspired by CodeSLAM, which learns an optimizable pixel-wise depth representation, we modify its regression method to work on the pixel-wise classification problem. By training a variational autoencoder conditioned on a color image, we obtain a latent space that serves as a low-dimensional representation of the semantic segmentation and can be optimized efficiently. Consequently, our model can correct errors or ambiguities in its prediction during the inference phase when given useful scene information. We demonstrate the approach by providing partial scene truth and optimizing the latent variable.
KW - Optimization
KW - Semantic segmentation
KW - Variational autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85072901121&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072901121&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-30645-8_35
DO - 10.1007/978-3-030-30645-8_35
M3 - Conference contribution
AN - SCOPUS:85072901121
SN - 9783030306441
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 379
EP - 389
BT - Image Analysis and Processing – ICIAP 2019 - 20th International Conference, Proceedings
A2 - Ricci, Elisa
A2 - Sebe, Nicu
A2 - Rota Bulò, Samuel
A2 - Snoek, Cees
A2 - Lanz, Oswald
A2 - Messelodi, Stefano
PB - Springer Verlag
T2 - 20th International Conference on Image Analysis and Processing, ICIAP 2019
Y2 - 9 September 2019 through 13 September 2019
ER -