For the next level of robot intelligence and intuitive user interaction, maps need to extend beyond geometry and appearance — they need to contain semantics. We address this challenge by combining Convolutional Neural Networks (CNNs) with a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, to fuse semantic predictions from multiple viewpoints. This not only produces a useful semantic 3D map; we also show on the NYUv2 dataset that fusing multiple predictions improves 2D class-prediction accuracy.
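The multi-view fusion idea can be sketched as a recursive Bayesian update: each map element keeps a class distribution that is multiplied by every new per-pixel CNN prediction associated with it and then renormalised. The sketch below is illustrative only — the function name, class count, and probability values are assumptions, not the project's actual implementation.

```python
import numpy as np

def fuse_prediction(stored_probs, cnn_probs):
    """Multiply the stored class distribution by a new CNN
    prediction and renormalise (naive Bayesian fusion sketch)."""
    fused = stored_probs * cnn_probs
    return fused / fused.sum()

# Start one map element from a uniform prior over 3 hypothetical classes.
probs = np.full(3, 1.0 / 3.0)

# Fuse CNN class predictions observed from two different viewpoints.
for view_prediction in [np.array([0.7, 0.2, 0.1]),
                        np.array([0.6, 0.3, 0.1])]:
    probs = fuse_prediction(probs, view_prediction)
```

After both updates the distribution concentrates on the class that the views agree on, which is the intuition behind fusing predictions from multiple viewpoints rather than trusting any single frame.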
![](https://wp.doc.ic.ac.uk/bjm113/wp-content/uploads/sites/113/2017/07/combined_annotated_reconstruction-e1499685141696.png)