Open Access
Advanced Visual SLAM and Image Segmentation Techniques for Augmented Reality
1 Cranfield University
2 Cranfield University
3 Cranfield University
Abstract
Augmented reality uses computer vision techniques to enhance human perception, letting users experience a world in which virtual and real content are intertwined. However, basic techniques cannot handle complex large-scale scenes, deal with occlusion in real time, or render virtual objects within the augmented scene. This paper therefore studies potential solutions, such as visual SLAM and image segmentation, that can address these challenges in augmented reality visualization. It reviews advanced visual SLAM and image segmentation techniques for augmented reality. In addition, it presents applications of machine learning techniques for improving augmented reality.
Keywords
Augmented reality, computer vision, image segmentation, machine learning, visual SLAM
How to Cite
Jiang, Y., Tran, T. H., & Williams, L. (2026). Advanced Visual SLAM and Image Segmentation Techniques for Augmented Reality. Asia Journal of Social Innovation and Development, 2(1), 28. Retrieved from https://www.ajsid.org/index.php/pub/article/view/28
References
📄
AlapattD.MascagniP.VardazaryanA.GarciaA.OkamotoN.MutterD.MarescauxJ.CostamagnaG.DallemagneB. PadoyN. (2021). Temporally Constrained Neural Networks (TCNN): A framework for semi-supervised video semantic segmentation.
📄
Almalioglu, Y., Saputra, M. R. U., de Gusmao, P. P., Markham, A., & Trigoni, N. (2019). Ganvo: Unsupervised deep monocular visual odometry and depth estimation with generative adversarial networks. In International conference on robotics and automation, (pp. 5474-5480). IEEE.
📄
Alves, J., & Bernardino, A. (2020, April). A remote RGB-D VSLAM solution for low computational powered robots. In 2020 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), (pp. 214-220). IEEE. doi:10.1109/ICARSC49921.2020.9096074
📄
Alzahrani, N. M., & Alfouzan, F. A. (2022). Augmented Reality (AR) and Cyber-Security for Smart Cities—A Systematic Literature Review. Sensors (Basel), 22(7), 2792. doi:10.3390/s22072792 PMID:35408406
📄
Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., & Sivic, J. (2016). NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, (pp. 5297-5307). doi:10.1109/CVPR.2016.572
📄
Arshad, S., & Kim, G. W. (2021). Role of deep learning in loop closure detection for visual and lidar SLAM: A survey. Sensors (Basel), 21(4), 1243. https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=33578695&dopt=Abstract doi:10.3390/s21041243 PMID:33578695
📄
Azuma, R. T. (1997). A survey of augmented reality. Presence (Cambridge, Mass.), 6(4), 355–385. doi:10.1162/ pres.1997.6.4.355
📄
Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481–2495. https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=28060704&dopt=Abstract doi:10.1109/TPAMI.2016.2644615 PMID:28060704
📄
Besl, P. J., & McKay, N. D. (1992). Method for registration of 3-D shapes. In Sensor fusion IV: control paradigms and data structures (Vol. 1611, pp. 586–606). International Society for Optics and Photonics.doi:10.1117/12.57955
📄
Bian, X., Lim, S. N., & Zhou, N. (2016). Multiscale fully convolutional network with application to industrial inspection. In 2016 IEEE winter conference on applications of computer vision (WACV). IEEE.
📄
Billinghurst, M., Clark, A., & Lee, G. (2015). A survey of augmented reality.
📄
Bolya, D., Zhou, C., Xiao, F., & Lee, Y. J. (2020). Yolact++: Better real-time instance segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence. PMID:32755851
📄
Braud, T., Bijarbooneh, F. H., Chatzopoulos, D., & Hui, P. (2017). Future networking challenges: The case of mobile augmented reality.In International Conference on Distributed Computing Systems,(pp. 1796-1807).IEEE.
📄
Caruso, D., Engel,J., & Cremers, D. (2015). Large-scale directslam for omnidirectional cameras. In 2015 IEEE/ RSJ International Conference on Intelligent Robots and Systems (IROS), (pp. 141-148). IEEE. doi:10.1109/ IROS.2015.7353366
📄
Castle, R., Klein, G., & Murray, D. W. (2008). Video-rate localization in multiple maps for wearable augmented reality.In International Symposium on Wearable Computers,(pp. 15-22).IEEE. doi:10.1109/ISWC.2008.4911577
📄
Castle, R. O., Gawley, D. J., Klein, G., & Murray, D. W. (2007). Towards simultaneous recognition, localization and mapping for hand-held and wearable cameras. In Proceedings 2007 IEEE International Conference on Robotics and Automation, (pp. 4102-4107). IEEE. doi:10.1109/ROBOT.2007.364109
📄
ChenL. C.PapandreouG.KokkinosI.MurphyK.YuilleA. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs.
📄
Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848. https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=28463186&dopt=Abstract doi:10.1109/TPAMI.2017.2699184 PMID:28463186
📄
Chen, S., Wu, J., Lu, Q., Wang, Y., & Lin, Z. (2021). Cross-scene loop-closure detection with continual learning for visual simultaneous localization and mapping. International Journal of Advanced Robotic Systems, 18(5),17298814211050560. doi:10.1177/17298814211050560
📄
ChenZ.LamO.JacobsonA.MilfordM. (2014). Convolutional neural network-based place recognition.
📄
Cheng, J., Sun, Y., & Meng, M. Q. H. (2019). Improving monocular visual SLAM in dynamic environments:An optical-flow-based approach. Advanced Robotics, 33(12), 576–589. doi:10.1080/01691864.2019.1610060
📄
Civera, J., Gálvez-López, D., Riazuelo, L., Tardós, J. D., & Montiel, J. M. M. (2011). Towards semantic SLAM using a monocular camera. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, (pp.1277-1284). IEEE. doi:10.1109/IROS.2011.6094648
📄
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, (pp. 3213-3223). doi:10.1109/CVPR.2016.350
📄
Costa, G. D. M., Petry, M. R., & Moreira, A. P. (2022). Augmented Reality for Human–Robot Collaboration and Cooperation in Industrial Applications: A Systematic Literature Review. Sensors (Basel), 22(7), 2725. https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=35408339&dopt=Abstract doi:10.3390/s22072725 PMID:35408339
📄
Costante, G., Mancini, M., Valigi, P., & Ciarfuglia, T. A. (2015). Exploring representation learning with cnns for frame-to-frame ego-motion estimation. IEEE Robotics and Automation Letters, 1(1), 18–25. doi:10.1109/LRA.2015.2505717
📄
Criminisi, A., Cross, G., Blake, A., & Kolmogorov, V. (2006). Bilayer segmentation of live video. In Computer Society Conference on Computer Vision and Pattern Recognition, 1, (pp. 53-60). IEEE.
📄
Dai, W., Zhang, Y., Li, P., Fang, Z., & Scherer, S. (2020). Rgb-d slam in dynamic environments using point correlations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(1), 373–389. https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=32750826&dopt=Abstract doi:10.1109/TPAMI.2020.3010942 PMID:32750826
📄
Davison, A. J. (2003). Real-time simultaneous localisation and mapping with a single camera. In Computer Vision, 3, (pp. 1403-1403). IEEE Computer Society. doi:10.1109/ICCV.2003.1238654
📄
Davison, A. J., Reid, I. D., Molton, N. D., & Stasse, O. (2007). MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 1052–1067. https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=17431302&dopt=Abstract doi:10.1109/TPAMI.2007.1049 PMID:17431302
📄
Dey, A., Billinghurst, M., Lindeman, R. W., & Swan,J. II. (2018). A systematic review of 10 years of augmented reality usability studies: 2005 to 2014. Frontiers in Robotics and AI, 5, 37. https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=33500923&dopt=Abstract doi:10.3389/frobt.2018.00037 PMID:33500923
📄
Dou, M., Khamis, S., Degtyarev, Y., Davidson, P., Fanello, S. R., Kowdle, A., & Izadi, S. (2016). Fusion4d: Real-time performance capture of challenging scenes. ACM Transactions on Graphics, 35(4), 1–13.doi:10.1145/2897824.2925969
📄
Du, C., Chen, Y. L., Ye, M., & Ren, L. (2016). Edge snapping-based depth enhancement for dynamic occlusion handling in augmented reality. In 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), (pp. 54-62). IEEE. doi:10.1109/ISMAR.2016.17
📄
Duan, C.,Junginger, S., Huang,J.,Jin, K., & Thurow, K. (2019). Deep learning for visual SLAM in transportation robotics: A review. Transportation Safety and Environment, 1(3), 177–184. doi:10.1093/tse/tdz019
📄
Eade, E., & Drummond, T. (2006). Edge Landmarks in Monocular SLAM. In BMVC, (pp. 7-16). doi:10.5244/C.20.2
📄
Egger,J., & Masood, T. (2020). Augmented reality in support of intelligent manufacturing–a systematic literature review. Computers & Industrial Engineering, 140, 106195. doi:10.1016/j.cie.2019.106195
📄
Engel, J., Koltun, V., & Cremers, D. (2017). Directsparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3), 611–625. https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=28422651&dopt=Abstract doi:10.1109/TPAMI.2017.2658577 PMID:28422651
📄
Engel, J., Schöps, T., & Cremers, D. (2014). LSD-SLAM: Large-scale direct monocular SLAM. In European conference on computer vision (pp. 834-849). Springer, Cham.
📄
Engel, J., Stückler, J., & Cremers, D. (2015). Large-scale direct SLAM with stereo cameras. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 1935-1942). IEEE. doi:10.1109/IROS.2015.7353631
📄
Engel, J., Sturm, J., & Cremers, D. (2013). Semi-dense visual odometry for a monocular camera. In Proceedings of the IEEE international conference on computer vision (pp. 1449-1456). doi:10.1109/ICCV.2013.183
📄
Ess, A., Müller, T., Grabner, H., & Van Gool, L. (2009; Vol. 1). Segmentation-Based Urban Traffic Scene Understanding. In BMVC.
📄
Forster, C., Lynen, S., Kneip, L., & Scaramuzza, D. (2013). Collaborative monocular slam with multiple microaerial vehicles. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 3962-3970).IEEE.doi:10.1109/IROS.2013.6696923
📄
Forster, C., Pizzoli, M., & Scaramuzza, D. (2014). SVO: Fast semi-direct monocular visual odometry. In 2014 IEEE international conference on robotics and automation (ICRA). IEEE.
📄
Fukiage, T., Oishi, T., & Ikeuchi, K. (2014). Visibility-based blending for real-time applications. In 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 63-72). IEEE. doi:10.1109/ISMAR.2014.6948410
📄
Gao, G., Xu, G., Yu, Y., Xie, J., Yang, J., & Yue, D. (2021). MSCFNet: A lightweight network with multi-scale context fusion for real-time semantic segmentation. IEEE Transactions on Intelligent Transportation Systems, 1–11.doi:10.1109/TITS.2021.3098355
📄
Gao, X., & Zhang, T. (2015). Loop closure detection for visual slam systems using deep neural networks. In 2015 34th Chinese Control Conference (CCC) (pp. 5851-5856). IEEE. doi:10.1109/ChiCC.2015.7260555
📄
Gao, X., & Zhang, T. (2017). Unsupervised learning to detect loops using deep neural networksfor visual SLAM system. Autonomous Robots, 41(1), 1–18. doi:10.1007/s10514-015-9516-2
📄
Gattullo, M., Evangelista, A., Uva, A. E., Fiorentino, M., & Gabbard,J. L. (2020). What, how, and why are visual assets used in industrial augmented reality? A systematic review and classification in maintenance, assembly, and training (from 1997 to 2019). IEEE Transactions on Visualization and Computer Graphics, 28(2), 1443–1456.doi:10.1109/TVCG.2020.3014614 PMID:32759085
📄
Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition. IEEE.
📄
Geng, J. (2011). Structured-light 3D surface imaging: A tutorial. Advances in Optics and Photonics, 3(2),128–160. doi:10.1364/AOP.3.000128
📄
Goh, E. S., Sunar, M. S., & Ismail, A. W. (2019). 3D object manipulation techniques in handheld mobile augmented reality interface: A review. IEEE Access: Practical Innovations, Open Solutions, 7, 40581–40601.doi:10.1109/ACCESS.2019.2906394
📄
Handa, A., Bloesch, M., Pătrăucean, V., Stent, S., McCormac, J., & Davison, A. (2016). gvnn: Neural network library for geometric computer vision. In European Conference on Computer Vision (pp. 67-82). Springer,Cham.doi:10.1007/978-3-319-49409-8_9
📄
Hariharan, B., Arbeláez, P., Girshick, R., & Malik, J. (2014). Simultaneous detection and segmentation. In European conference on computer vision (pp. 297-312). Springer, Cham.
📄
Hasinoff, S. W., Kang, S. B., & Szeliski, R. (2006). Boundary matting for view synthesis. Computer Vision and Image Understanding, 103(1), 22–32. doi:10.1016/j.cviu.2006.02.005
📄
He, H., Yuan, Y., Yue, X., & Hu, H. (2022). MLSeg: Image and video segmentation as multi-label classification and selected-label pixel classification. arXiv preprint arXiv:2203.04187.
📄
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).
📄
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
📄
Hebborn, A. K., Höhner, N., & Müller, S. (2017). Occlusion matting: realistic occlusion handling for augmented reality applications. In 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp.62-71). IEEE.doi:10.1109/ISMAR.2017.23
📄
Hentout, A., Maoudj, A., Kaid-Youcef, N., Hebib, D., & Bouzouia, B. (2020). Distributed multi-agent biddingbased approach for the collaborative mapping of unknown indoor environments by a homogeneous mobile robot team. Journal of Intelligent Systems, 29(1), 84–99. doi:10.1515/jisys-2017-0255
📄
Hirose, K., & Saito, H. (2012). Fast line description for line-based slam. In 2012 23rd British Machine Vision Conference, BMVC 2012. doi:10.5244/C.26.83
📄
Hoff, W. A., Nguyen, K., & Lyon, T. (1996). Computer-vision-based registration techniques for augmented reality. In Intelligent Robots and Computer Vision XV: Algorithms, Techniques, Active Vision, and Materials Handling (Vol. 2904, pp. 538-548). International Society for Optics and Photonics.
📄
Hou,J., Dai, A., & Nießner, M. (2019). 3d-sis: 3d semantic instance segmentation of rgb-d scans. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4421-4430). doi:10.1109/CVPR.2019.00455
📄
Hou, Y., Zhang, H., & Zhou, S. (2015). Convolutional neural network-based image representation for visual loop closure detection. In 2015 IEEE international conference on information and automation. IEEE.
📄
Hu, P., Heilbron, F., Wang, O., Lin, Z., Sclaroff, S., & Perazzi, F. (2020). Temporally distributed networks for fast video semantic segmentation. arXiv preprint arXiv:2004.01800. 10.1109/CVPR42600.2020.00884
📄
Huang, J., & You, S. (2016) Point Cloud Labeling using 3D Convolutional Neural Network. In 2016 22th International Conference on Pattern Recognition (ICPR). IEEE.
📄
Huang, P., Zeng, L., Luo, K., Guo, J., Zhou, Z., & Chen, X. (2021, July). ColaSLAM: Real-Time Multi-Robot Collaborative Laser SLAM via EdgeComputing.In 2021 IEEE/CIC International Conference on Communications in China (ICCC)(pp. 242-247). IEEE. doi:10.1109/ICCC52777.2021.9580413
📄
Huang, Z., Hui, P., Peylo, C., & Chatzopoulos, D. (2013). Mobile augmented reality survey: a bottom-up approach. arXiv preprint arXiv:1309.4413.
📄
Hui, J., & Zhang, H. (2022). A semantic segmentation network based on multi-branch structures and multiscale modules.
📄
Innmann, M., Zollhöfer, M., Nießner, M., Theobalt, C., & Stamminger, M. (2016). Volumedeform: Real-time volumetric non-rigid reconstruction. In European Conference on Computer Vision (pp. 362-379). Springer, Cham.
📄
Irshad, S., & Rambli, D. R. A. (2017, November). Advances in mobile augmented reality from user experience perspective: a review of studies. In International Visual Informatics Conference (pp. 466-477). Springer, Cham.doi:10.1007/978-3-319-70010-6_43
📄
Irshad, S., & Rambli, D. R. B. A. (2014, September). User experience of mobile augmented reality: A review of studies. In 2014 3rd international conference on user science and engineering (i-USEr) (pp. 125-130). IEEE.doi:10.1109/IUSER.2014.7002689
📄
Jaderberg, M., Simonyan, K., & Zisserman, A. (2015). Spatial transformer networks. Advances in Neural Information Processing Systems, 28, 2017–2025.
📄
Jang, Y., Oh, C., Lee, Y., & Kim, H. J. (2021). Multirobot collaborative monocular SLAM utilizing rendezvous.IEEE Transactions on Robotics, 37(5), 1469–1486. doi:10.1109/TRO.2021.3058502
📄
Ji, J., Shi, R., Li, S., Chen, P., & Miao, Q. (2020). Encoder-decoder with cascaded CRFs for semantic segmentation. IEEE Transactions on Circuits and Systems for Video Technology, 31(5), 1926–1938. doi:10.1109/TCSVT.2020.3015866
📄
Jiao, L., Wang, D., Bai, Y., Chen, P., & Liu, F. (2021). Deep Learning in Visual Tracking: A Review.IEEE Transactions on Neural Networks and Learning Systems, 1–20. doi:10.1109/TNNLS.2021.3136907 PMID:34968181
📄
Kähler, O., Prisacariu, V. A., Ren, C. Y., Sun, X., Torr, P., & Murray, D. (2015). Very high frame rate volumetric integration of depth images on mobile devices. IEEE Transactions on Visualization and Computer Graphics, 21(11), 1241–1250. https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=26439825&dopt=Abstract doi:10.1109/TVCG.2015.2459891 PMID:26439825
📄
Kaiming, H., Georgia, G., Piotr, D., & Ross, G. S. (2017). Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).
📄
Kakuta, T., Vinh, L. B., Kawakami, R., Oishi, T., & Ikeuchi, K. (2008). Detection of moving objects and cast shadows using a spherical vision camera for outdoor mixed reality. In Proceedings of the 2008 ACM symposium on Virtual reality software and technology (pp. 219-222). doi:10.1145/1450579.1450626
📄
Kalkofen, D., Sandor, C., White, S., & Schmalstieg, D. (2011). Visualization techniques for augmented reality.In Handbook of augmented reality (pp. 65–98). Springer. doi:10.1007/978-1-4614-0064-6_3
📄
Kalogerakis, E., Averkiou, M., Maji, S., & Chaudhuri, S. (2017). 3D shape segmentation with projective convolutional networks. In proceedings of the IEEE conference on computer vision and pattern recognition(pp. 3779-3788).
📄
Kanbara, M., Okuma, T., Takemura, H., & Yokoya, N. (1999). Real-time composition ofstereo imagesfor video see-through augmented reality. In Proceedings IEEE International Conference on Multimedia Computing and Systems (Vol. 1, pp. 213-219). IEEE. doi:10.1109/MMCS.1999.779195
📄
Karrer, M., Schmuck, P., & Chli, M. (2018). CVI-SLAM—Collaborative visual-inertial SLAM. IEEE Robotics and Automation Letters, 3(4), 2762–2769. doi:10.1109/LRA.2018.2837226
📄
Kaygusuz, N., Mendez, O., & Bowden, R. (2021, September). Multi-Camera Sensor Fusion for Visual Odometry using Deep Uncertainty Estimation. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC) (pp. 2944-2949). IEEE. doi:10.1109/ITSC48978.2021.9565079
📄
Keivan, N., Patron-Perez, A., & Sibley, G. (2016). Asynchronous adaptive conditioning for visual-inertial SLAM.In Experimental Robotics (pp. 309–321). Springer. doi:10.1007/978-3-319-23778-7_21
📄
Keller, M., Lefloch, D., Lambers, M., Izadi, S., Weyrich, T., & Kolb, A. (2013). Real-time 3d reconstruction in dynamic scenes using point-based fusion. In 2013 International Conference on 3D Vision-3DV 2013 (pp.1-8). IEEE.
📄
Kim, H., Yang, S. J., & Sohn, K. (2003). 3d reconstruction of stereo images for interaction between real and virtual worlds. In The Second IEEE and ACM International Symposium on Mixed and Augmented Reality, 2003. Proceedings. (pp. 169-176). IEEE.
📄
Kim, T. H., Jung, H., Lee, K. M., & Lee, S. U. (2008). Segment-based foreground object disparity estimation using Zcam and multiple-view stereo. In 2008 International Conference on Intelligent Information Hiding and Multimedia Signal Processing (pp. 1251-1254). IEEE. doi:10.1109/IIH-MSP.2008.343
📄
Kirillov, A., He, K., Girshick, R., Rother, C., & Dollár, P. (2019). Panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9404-9413).
📄
Kiss-Illés, D., Barrado, C., & Salamí, E. (2019). GPS-SLAM: An augmentation of the ORB-SLAM algorithm.Sensors (Basel), 19(22), 4973. doi:10.3390/s19224973 PMID:31731624
📄
Klein, G., & Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. In 2007 6th IEEE and ACM international symposium on mixed and augmented reality (pp. 225-234). IEEE. doi:10.1109/ISMAR.2007.4538852
📄
Klein, G., & Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. In 2007 6th IEEE and ACM international symposium on mixed and augmented reality (pp. 225-234). IEEE. doi:10.1109/ISMAR.2007.4538852
📄
Klette, R., Koschan, A., & Schluns, K. (1998). Three-dimensional data from images. Springer-Verlag Singapore Pte. Ltd.
📄
Koh, Y. S., Goh, K. W., Dares, M., Yeong, C. F., Ming, E. S. L., Sunar, M. S., & Tey, Y. S. (2020). A review on augmented reality tracking methods for maintenance of robots. Jurnal Teknologi, 83(1), 37–43. doi:10.11113/jurnalteknologi.v83.14907
📄
Konda, K. R., & Memisevic, R. (2015). Learning visual odometry with a convolutional network. In VISAPP (1), (pp. 486-490). doi:10.5220/0005299304860490
📄
Krähenbühl, P., & Koltun, V. (2011). Efficient inference in fully connected crfs with gaussian edge potentials.Advances in Neural Information Processing Systems, 24, 109–117.
📄
Krähenbühl, P., & Koltun, V. (2013). Parameter learning and convergent inference for dense random fields. In International Conference on Machine Learning (pp. 513-521). PMLR.
📄
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
📄
Lee, L. H., Braud, T., Hosio, S., & Hui, P. (2021). Towards Augmented Reality Driven Human-City Interaction:Current Research on Mobile Headsets and Future Challenges. [CSUR]. ACM Computing Surveys, 54(8), 1–38.
📄
Li, P., Wang, D., Wang, L., & Lu, H. (2018). Deep visual tracking: Review and experimental comparison. Pattern Recognition, 76, 323–338. doi:10.1016/j.patcog.2017.11.007
📄
Li, X., Yi, W., Chi, H. L., Wang, X., & Chan, A. P. (2018). A critical review of virtual and augmented reality (VR/AR) applications in construction safety. Automation in Construction, 86, 150–162. doi:10.1016/j.autcon.2017.11.003
📄
Li, X., Zhang, L., & Zhu, Z. (2022). SnapshotNet: Self-supervised feature learning for point cloud data segmentation using minimal labeled data.Computer Vision and Image Understanding, 216, 103339. doi:10.1016/j.cviu.2021.103339
📄
Li, Y., Shi,J., & Lin, D. (2018). Low-latency video semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5997-6005).
📄
Li, Y., Sun, J., Tang, C. K., & Shum, H. Y. (2004). Lazy snapping. ACM Transactions on Graphics, 23(3),303–308. doi:10.1145/1015706.1015719
📄
Lim, L. A., & Keles, H. Y. (2020). Learning multi-scale features for foreground segmentation. Pattern Analysis & Applications, 23(3), 1369–1380. doi:10.1007/s10044-019-00845-9
📄
Lin, D., Ji, Y., Lischinski, D., Cohen-Or, D., & Huang, H. (2018). Multi-scale context intertwining for semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 603-619).
📄
Ling, H.(2017). Augmented reality in reality.IEEE MultiMedia, 24(3), 10–15. doi:10.1109/MMUL.2017.3051517
📄
Liu, R., Yang, J., Chen, Y., & Zhao, W. (2019, June). eslam: An energy-efficient accelerator for real-time orb-slam on fpga platform. In Proceedings of the 56th Annual Design Automation Conference 2019 (pp. 1-6). doi:10.1145/3316781.3317820
📄
Liu, Z., Suo, C., Zhou, S., Xu, F., Wei, H., Chen, W., & Liu, Y. H. et al. (2019, November). Seqlpd: Sequence matching enhanced loop-closure detection based on large-scale point cloud description forself-driving vehicles.In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 1218-1223). IEEE.doi:10.1109/IROS40897.2019.8967875
📄
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431-3440).
📄
Masood, T., & Egger, J. (2019). Augmented reality in support of Industry 4.0—Implementation challenges and successfactors. Robotics and Computer-integrated Manufacturing, 58, 181–195. doi:10.1016/j.rcim.2019.02.003
📄
McCormac, J., Handa, A., Davison, A., & Leutenegger, S. (2017). Semanticfusion: Dense 3d semantic mapping with convolutional neural networks. In 2017 IEEE International Conference on Robotics and automation (ICRA) (pp. 4628-4635). IEEE. doi:10.1109/ICRA.2017.7989538
📄
Mei, C., Sibley, G., Cummins, M., Newman, P. M., & Reid, I. (2009). A Constant-Time Efficient Stereo SLAM System. In BMVC (pp. 1-11). doi:10.5244/C.23.54
📄
Memon, A. R., Wang, H., & Hussain, A. (2020). Loop closure detection using supervised and unsupervised deep neural networksfor monocular SLAM systems. Robotics and Autonomous Systems, 126, 103470. doi:10.1016/j.robot.2020.103470
📄
Merrill, N., & Huang, G. (2019, November). CALC2. 0: Combining appearance, semantic and geometric information for robust and efficient visual loop closure. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 4554-4561). IEEE. doi:10.1109/IROS40897.2019.8968159
📄
Mur-Artal, R., & Tardós, J. D. (2014). ORB-SLAM: tracking and mapping recognizable features. In Workshop on Multi View Geometry in Robotics (MVIGRO)-RSS (Vol. 2014, p. 2).
📄
Mur-Artal, R., & Tardós, J. D. (2017). Orb-slam2: An open-source slam system for monocular,stereo, and rgb-d cameras. IEEE Transactions on Robotics, 33(5), 1255–1262. doi:10.1109/TRO.2017.2705103
📄
Newcombe, R. A., Fox, D., & Seitz, S. M. (2015). Dynamicfusion: Reconstruction and tracking of non-rigid scenes in real-time. In Proceedings of the IEEE conference on computer vision and pattern recognition, (pp.343-352). doi:10.1109/CVPR.2015.7298631
📄
Newcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A. J., . . . Fitzgibbon, A. (2011).Kinectfusion: Real-time dense surface mapping and tracking. In International symposium on mixed and augmented reality, (pp. 127-136). IEEE.
📄
Newcombe, R. A., Lovegrove, S. J., & Davison, A. J. (2011). DTAM: Dense tracking and mapping in real-time. In 2011 international conference on computer vision. IEEE.
📄
Nistér, D. (2004). An efficient solution to the five-point relative pose problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 756–770. doi:10.1109/TPAMI.2004.17 PMID:18579936
📄
Nistér, D., & Stewénius, H. (2007). A minimal solution to the generalised 3-point pose problem. Journal of Mathematical Imaging and Vision, 27(1), 67–79. doi:10.1007/s10851-006-0450-y OberwegerM.WohlhartP.LepetitV. (2015). Hands deep in deep learning for hand pose estimation.
📄
Okutomi, M., & Kanade, T. (1993). A multiple-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(4), 353–363. doi:10.1109/34.206955
📄
Ondrúška, P., Kohli, P., & Izadi, S. (2015). Mobilefusion: Real-time volumetric surface reconstruction and dense tracking on mobile phones. IEEE Transactions on Visualization and Computer Graphics, 21(11), 1251–1258.doi:10.1109/TVCG.2015.2459902 PMID:26439826
📄
Outahar, M., Moreau, G., & Normand, J. M. (2021). Direct and Indirect vSLAM Fusion for Augmented Reality.Journal of Imaging, 7(8), 141. https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=34460777&dopt=Abstract doi:10.3390/jimaging7080141 PMID:34460777
📄
Palmarini, R., Erkoyuncu, J. A., Roy, R., & Torabmostaedi, H. (2018). A systematic review of augmented reality applications in maintenance. Robotics and Computer-integrated Manufacturing, 49, 215–228. doi:10.1016/j.rcim.2017.06.002
📄
PaszkeA.ChaurasiaA.KimS.CulurcielloE. (2016). Enet: A deep neural network architecture for real-time semantic segmentation.
📄
Paul, M., Mayer, C., Gool, L., & Timofte, R. (2020) Efficient video semantic segmentation with labels propagation and refinement. In Winter Conference on Applications of Computer Vision (WACV) (pp. 2873-2882). IEEE.doi:10.1109/WACV45572.2020.9093520
📄
Perron, J. M., Huang, R., Thomas, J., Zhang, L., Tan, P., & Vaughan, R. T. (2015). Orbiting a moving target with multi-robot collaborative visual slam. In Workshop on Multi-View Geometry in Robotics (MVIGRO), (pp.1339-1344).
📄
PinheiroP. O.CollobertR.DollárP. (2015). Learning to segment object candidates.
📄
Qi, C. R., Liu, W., Wu, C., Su, H., & Guibas, L. J. (2018). Frustum point nets for 3d object detection from rgb-d data. In Proceedings of the IEEE conference on computer vision and pattern recognition, (pp. 918-927).
📄
Qin, X., Wang, B., Boegner, D., Gaitan, B., Zheng, Y., Du, X., & Chen, Y. (2021). Indoor localization of handheld OCT probe using visual odometry and real-time segmentation using deep learning. IEEE Transactions on Biomedical Engineering, 69(4), 1378–1385. doi:10.1109/TBME.2021.3116514 PMID:34587002
📄
Qiu, K., Ai, Y., Tian, B., Wang, B., & Cao, D. (2018). Siamese-ResNet: implementing loop closure detection based on siamese network. In 2018 IEEE Intelligent Vehicles Symposium (IV), (pp. 716-721). IEEE. doi:10.1109/IVS.2018.8500465
📄
Rabbi, I., & Ullah, S. (2013). A survey on augmented reality challenges and tracking. Acta graphica: znanstveni časopis za tiskarstvo i grafičke komunikacije, 24(1-2), 29-46.
📄
Rabbi, I., Ullah, S., & Khan, S. U. (2012). Augmented reality tracking techniques—A systematic literature.IOSR Journal of Computer Engineering, 2(2), 23–29. doi:10.9790/0661-0222329
📄
Raj, A., Maturana, D., & Scherer, S. (2015). Multi-scale convolutional architecture for semantic segmentation.Robotics Institute, Carnegie Mellon University, Tech. Rep. CMU-RITR-15-21.
📄
Riazuelo, L., Civera, J., & Montiel, J. M. (2014). C2tam: A cloud framework for cooperative tracking and mapping. Robotics and Autonomous Systems, 62(4), 401–413. doi:10.1016/j.robot.2013.11.007
📄
Rother, C., Kolmogorov, V., & Blake, A. (2004). ” GrabCut” interactive foreground extraction using iterated graph cuts. ACM transactions on graphics (TOG), 23(3), 309-314.
📄
Roxas, M., Hori, T., Fukiage, T., Okamoto, Y., & Oishi, T. (2018). Occlusion handling using semantic segmentation and visibility-based rendering for mixed reality. In Proceedings of the 24th ACM Symposium on Virtual Reality Software and Technology, (pp. 1-8). doi:10.1145/3281505.3281546
📄
Roy, A., & Todorovic, S. (2016). A multi-scale cnn for affordance segmentation in rgb images. In European conference on computer vision (pp. 186-201). Springer, Cham. doi:10.1007/978-3-319-46493-0_12
📄
Rudin, L. I., Osher, S., & Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D. Nonlinear Phenomena, 60(1-4), 259–268. doi:10.1016/0167-2789(92)90242-F
📄
Rünz, M., & Agapito, L. (2017). Co-fusion: Real-time segmentation, tracking and fusion of multiple objects. In International Conference on Robotics and Automation (ICRA), (pp. 4471-4478). IEEE. doi:10.1109/ICRA.2017.7989518
Runz, M., Buffier, M., & Agapito, L. (2018). MaskFusion: Real-time recognition, tracking and reconstruction of multiple moving objects. In International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 10-20). IEEE. doi:10.1109/ISMAR.2018.00024
Salas-Moreno, R. F., Glocker, B., Kelly, P. H., & Davison, A. J. (2014). Dense planar SLAM. In 2014 IEEE international symposium on mixed and augmented reality (ISMAR). IEEE.
Salas-Moreno, R. F., Newcombe, R. A., Strasdat, H., Kelly, P. H., & Davison, A. J. (2013). SLAM++: Simultaneous localisation and mapping at the level of objects. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1352-1359). doi:10.1109/CVPR.2013.178
Schmuck, P., & Chli, M. (2019). CCM-SLAM: Robust and efficient centralized collaborative monocular simultaneous localization and mapping for robotic teams. Journal of Field Robotics, 36(4), 763–781. doi:10.1002/rob.21854
Schmuck, P., Ziegler, T., Karrer, M., Perraudin, J., & Chli, M. (2021). COVINS: Visual-Inertial SLAM for Centralized Collaboration. In 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct) (pp. 171-176). IEEE. doi:10.1109/ISMAR-Adjunct54149.2021.00043
Schöps, T., Engel, J., & Cremers, D. (2014). Semi-dense visual odometry for AR on a smartphone. In 2014 IEEE international symposium on mixed and augmented reality (ISMAR). IEEE.
Shafi, M., Molisch, A. F., Smith, P. J., Haustein, T., Zhu, P., De Silva, P., Tufvesson, F., Benjebbour, A., & Wunder, G. (2017). 5G: A tutorial overview of standards, trials, challenges, deployment, and practice. IEEE Journal on Selected Areas in Communications, 35(6), 1201–1221. doi:10.1109/JSAC.2017.2692307
Shelhamer, E., Rakelly, K., Hoffman, J., & Darrell, T. (2016). Clockwork convnets for video semantic segmentation. In European Conference on Computer Vision, (pp. 852-868). Springer, Cham.
Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2009). TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. International Journal of Computer Vision, 81(1), 2–23. doi:10.1007/s11263-007-0109-1
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition.
SM, & Augasta, G. M. (2021). Review of recent advances in visual tracking techniques. Multimedia Tools and Applications, 80(16), 24185–24203. doi:10.1007/s11042-021-10848-6
Spittle, B., Frutos-Pascual, M., Creed, C., & Williams, I. (2022). A Review of Interaction Techniques for Immersive Environments. IEEE Transactions on Visualization and Computer Graphics, 1. doi:10.1109/TVCG.2022.3174805 PMID:35552136
Strasdat, H., Montiel, J. M., & Davison, A. J. (2012). Visual SLAM: Why filter? Image and Vision Computing, 30(2), 65–77. doi:10.1016/j.imavis.2012.02.009
Stühmer, J., Gumhold, S., & Cremers, D. (2010). Real-time dense geometry from a handheld camera. In Joint Pattern Recognition Symposium, (pp. 11-20). Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-15986-2_2
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9).
Taketomi, T., Uchiyama, H., & Ikeda, S. (2017). Visual SLAM algorithms: A survey from 2010 to 2016. IPSJ Transactions on Computer Vision and Applications, 9(1), 1–11. doi:10.1186/s41074-017-0027-2
Tang, X., Hu, X., Fu, C. W., & Cohen-Or, D. (2020). GrabAR: Occlusion-aware Grabbing Virtual Objects in AR. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, (pp. 697-708). doi:10.1145/3379337.3415835
Tateno, K., Tombari, F., Laina, I., & Navab, N. (2017). CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction. In Proceedings of the IEEE conference on computer vision and pattern recognition, (pp. 6243-6252). doi:10.1109/CVPR.2017.695
Tateno, K., Tombari, F., & Navab, N. (2016). When 2.5D is not enough: Simultaneous reconstruction, segmentation and recognition on dense SLAM. In 2016 IEEE international conference on robotics and automation (ICRA). IEEE.
Tian, Z., Shen, C., Wang, X., & Chen, H. (2021). BoxInst: High-performance instance segmentation with box annotations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 5443-5452). doi:10.1109/CVPR46437.2021.00540
Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2016). Deep end2end voxel2voxel prediction. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, (pp. 17-24).
Triputen, S., Gopal, A., Weber, T., Höfert, C., Rätsch, M., & Schreve, K. (2018, March). Methodology to analyze the accuracy of 3D objects reconstructed with collaborative robot based monocular LSD-SLAM. In 2018 International Conference on Intelligent Autonomous Systems (ICoIAS), (pp. 185-190). IEEE. doi:10.1109/ICoIAS.2018.8494109
Uhrig, J., Rehder, E., Fröhlich, B., Franke, U., & Brox, T. (2018, June). Box2Pix: Single-shot instance segmentation by assigning pixels to object boxes. In 2018 IEEE Intelligent Vehicles Symposium (IV) (pp. 292-299). IEEE. doi:10.1109/IVS.2018.8500621
Van Krevelen, D. W. F., & Poelman, R. (2010). A survey of augmented reality technologies, applications and limitations. The International Journal of Virtual Reality: a Multimedia Publication for Professionals, 9(2), 1–20. doi:10.20870/IJVR.2010.9.2.2767
Van Opdenbosch, D., & Steinbach, E. (2018). Collaborative visual SLAM using compressed feature exchange. IEEE Robotics and Automation Letters, 4(1), 57–64. doi:10.1109/LRA.2018.2878920
Wang, H., Wang, W., & Liu, J. (2021). Temporal memory attention for video semantic segmentation. doi:10.1109/ICIP42928.2021.9506731
Wang, K., Ma, S., Chen, J., Ren, F., & Lu, J. (2020). Approaches, challenges, and applications for deep visual odometry: Toward complicated and emerging areas. IEEE Transactions on Cognitive and Developmental Systems.
Wang, S., Clark, R., Wen, H., & Trigoni, N. (2018). End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks. The International Journal of Robotics Research, 37(4-5), 513–542. doi:10.1177/0278364917734298
Wang, X., Zhang, R., Kong, T., Li, L., & Shen, C. (2020). SOLOv2: Dynamic and fast instance segmentation. Advances in Neural Information Processing Systems, 33, 17721–17732.
Wang, Y., Wang, P., Luo, Z., & Yan, Y. (2022). A novel AR remote collaborative platform for sharing 2.5D gestures and gaze. International Journal of Advanced Manufacturing Technology, 1–9. PMID:35095164
Westphal, C. (2017). Challenges in networking to support augmented reality and virtual reality. IEEE ICNC.
Whelan, T., Leutenegger, S., Salas-Moreno, R., Glocker, B., & Davison, A. (2015). ElasticFusion: Dense SLAM without a pose graph. Robotics Science and Systems: Online Proceedings. doi:10.15607/RSS.2015.XI.001
Williams, B., Klein, G., & Reid, I. (2007). Real-time SLAM relocalisation. In international conference on computer vision, (pp. 1-8). IEEE. doi:10.1109/ICCV.2007.4409115
Wu, B., Zhou, X., Zhao, S., Yue, X., & Keutzer, K. (2019). SqueezeSegV2: Improved model structure and unsupervised domain adaptation for road-object segmentation from a lidar point cloud. In 2019 International Conference on Robotics and Automation (ICRA), (pp. 4376-4382). IEEE. doi:10.1109/ICRA.2019.8793495
Xu, J., Cao, H., Yang, Z., Shangguan, L., Zhang, J., He, X., & Liu, Y. (2022). SwarmMap: Scaling Up Real-time Collaborative Visual SLAM at the Edge. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22) (pp. 977-993).
Yao, E., Zhang, H., Xu, H., Song, H., & Zhang, G. (2018). Robust RGB-D visual odometry based on edges and points. Robotics and Autonomous Systems, 107, 209–220. doi:10.1016/j.robot.2018.06.009
Yu, F., & Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions.
Zagoruyko, S., Lerer, A., Lin, T. Y., Pinheiro, P. O., Gross, S., Chintala, S., & Dollár, P. (2016). A multipath network for object detection. doi:10.5244/C.30.15
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In European conference on computer vision, (pp. 818-833). Springer, Cham.
Zeiler, M. D., Taylor, G. W., & Fergus, R. (2011). Adaptive deconvolutional networks for mid and high level feature learning. In International Conference on Computer Vision, (pp. 2018-2025). IEEE. doi:10.1109/ICCV.2011.6126474
Zhang, H., Chen, X., Lu, H., & Xiao, J. (2018). Distributed and collaborative monocular simultaneous localization and mapping for multi-robot systems in large-scale environments. International Journal of Advanced Robotic Systems, 15(3), 1729881418780178. doi:10.1177/1729881418780178
Zhang, H., Jiang, K., Zhang, Y., Li, Q., Xia, C., & Chen, X. (2014). Discriminative feature learning for video semantic segmentation. In 2014 International Conference on Virtual Reality and Visualization (pp. 321-326). IEEE. doi:10.1109/ICVRV.2014.65
Zhang, H., Wang, K., Tian, Y., Gou, C., & Wang, F. Y. (2018). MFR-CNN: Incorporating multi-scale features and global information for traffic object detection. IEEE Transactions on Vehicular Technology, 67(9), 8019–8030. doi:10.1109/TVT.2018.2843394
Zhang, L., Wang, L., Zhang, X., Shen, P., Bennamoun, M., Zhu, G., Shah, S. A. A., & Song, J. (2018). Semantic scene completion with dense CRF from a single depth image. Neurocomputing, 318, 182–195. doi:10.1016/j.neucom.2018.08.052
Zhang, S., Lu, S., He, R., & Bao, Z. (2021). Stereo Visual Odometry Pose Correction through Unsupervised Deep Learning. Sensors (Basel), 21(14), 4735. doi:10.3390/s21144735 PMID:34300475
Zhang, T., Wei, S., & Ji, S. (2022). E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4443-4452).
Zhang, Z. (2012). Microsoft Kinect sensor and its effect. IEEE MultiMedia, 19(2), 4–10. doi:10.1109/MMUL.2012.24
Zhang, Z., & Zhang, K. (2020, May). Farsee-net: Real-time semantic segmentation by efficient multi-scale context aggregation and feature space super-resolution. In 2020 IEEE International Conference on Robotics and Automation (ICRA) (pp. 8411-8417). IEEE. doi:10.1109/ICRA40945.2020.9196599
Zhao, H., Qi, X., Shen, X., Shi, J., & Jia, J. (2018). ICNet for real-time semantic segmentation on high-resolution images. In Proceedings of the European conference on computer vision (ECCV) (pp. 405-420). doi:10.1007/978-3-030-01219-9_25
Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., & Torr, P. H. et al. (2015). Conditional random fields as recurrent neural networks. In Proceedings of the IEEE international conference on computer vision, (pp. 1529-1537). doi:10.1109/ICCV.2015.179
Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., & Torralba, A. (2017). Scene parsing through ADE20K dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 633-641).
Zhu, J., Wang, L., Yang, R., & Davis, J. (2008). Fusion of time-of-flight depth and stereo for high accuracy depth maps. In Conference on Computer Vision and Pattern Recognition, (pp. 1-8). IEEE.
Zhu, X. F., Xu, T., & Wu, X. J. (2022). Visual Object Tracking on Multi-modal RGB-D Videos: A Review.
Zollhöfer, M., Nießner, M., Izadi, S., Rehmann, C., Zach, C., Fisher, M., Wu, C., Fitzgibbon, A., Loop, C., Theobalt, C., & Stamminger, M. (2014). Real-time non-rigid reconstruction using an RGB-D camera. ACM Transactions on Graphics, 33(4), 1–12. doi:10.1145/2601097.2601165
Zou, D., & Tan, P. (2012). CoSLAM: Collaborative visual SLAM in dynamic environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2), 354–366. doi:10.1109/TPAMI.2012.104 PMID:22547430
Zou, D., Tan, P., & Yu, W. (2019). Collaborative visual SLAM for multiple agents: A brief survey. Virtual Reality & Intelligent Hardware, 1(5), 461–482. doi:10.1016/j.vrih.2019.09.002