Real-time human segmentation by BowtieNet and a SLAM-based human AR system
1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract
Keywords: Augmented reality; Moving object; Reconstruction and tracking; Camera pose; Human segmentation
Content
Segmentation processing steps for videos |
---|
Begin: |
(1) Segment the human over the whole image region of the first frame. |
While the video has not ended: |
(2) Capture a new frame. |
(3) Expand the bounding box of the previous frame's segmentation result and use it as the segmentation ROI in the current frame. If no human was found in the previous frame, use the whole current frame as the segmentation ROI. |
(4) Use BowtieNet to segment the ROI. |
(5) Generate the segmentation result of the current frame according to the ROI’s location and the ROI’s segmentation result. |
End |
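
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `segment_fn` is a hypothetical stand-in for BowtieNet (any callable that maps an image crop to a binary mask of the same height and width), and the relative `margin` used to expand the bounding box is an assumed parameter, since the source does not state how far the box is expanded.

```python
import numpy as np


def mask_bounding_box(mask):
    """Tight box (top, left, bottom, right) around nonzero pixels, or None if empty."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return (int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1)


def expand_box(box, margin, height, width):
    """Grow a box by a relative margin on every side, clipped to the frame.

    The margin value is an assumption; the source only says the box is expanded.
    """
    top, left, bottom, right = box
    dy = int(round((bottom - top) * margin))
    dx = int(round((right - left) * margin))
    return (max(0, top - dy), max(0, left - dx),
            min(height, bottom + dy), min(width, right + dx))


def segment_video(frames, segment_fn, margin=0.2):
    """Yield one full-frame boolean mask per frame, following steps (1)-(5).

    segment_fn is a hypothetical placeholder for the segmentation network.
    """
    prev_mask = None
    for frame in frames:                      # step (2): capture a new frame
        height, width = frame.shape[:2]
        box = None if prev_mask is None else mask_bounding_box(prev_mask)
        if box is None:
            # Step (1), or the fallback in step (3): segment the whole frame.
            top, left, bottom, right = 0, 0, height, width
        else:
            # Step (3): expanded previous bounding box becomes the ROI.
            top, left, bottom, right = expand_box(box, margin, height, width)
        # Step (4): segment only the ROI.
        roi_mask = np.asarray(segment_fn(frame[top:bottom, left:right]), dtype=bool)
        # Step (5): paste the ROI result back at the ROI's location.
        mask = np.zeros((height, width), dtype=bool)
        mask[top:bottom, left:right] = roi_mask
        prev_mask = mask
        yield mask
```

With a trivial threshold such as `lambda roi: roi > 0.5` in place of the network, the generator can be exercised on synthetic frames. Restricting inference to the ROI reduces the number of pixels the network processes, at the cost of relying on the subject staying inside the expanded box between consecutive frames.
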
Methods | Mean IoU (%), validation | Mean IoU (%), testing | Speed (fps) |
---|---|---|---|
Pixel-by-Pixel[18] | 86.70 | 86.83 | 0.033 |
VGG-seg-net[19] | 83.57 | - | 1000 |
DCAN[20] | 90.89 | - | - |
SegNet[8] | 90.12 | 90.00 | 49.5 |
DeepLab v2-VGG[10] | 91.55 | 91.37 | 43.7 |
U-Net[9] | 90.85 | 90.87 | 36.4 |
DeepLab v2-ResNet[10] | 92.66 | 92.64 | 22.2 |
DeepLab v3+[13] | 92.83 | 92.53 | 34.5 |
BowtieNet (ours) | 93.64 | 93.42 | 39.1 |
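
For readers unfamiliar with the accuracy metric in the table, the following sketch shows one common way to compute a mean IoU percentage for binary human masks. It assumes IoU is measured on the foreground (human) class and averaged per image; the source does not spell out the exact averaging protocol, so treat the function names and that convention as illustrative assumptions.

```python
import numpy as np


def iou(pred, target):
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks empty: counted as perfect agreement (an assumed convention).
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou_percent(preds, targets):
    """Average per-image IoU over a dataset, expressed as a percentage."""
    return 100.0 * float(np.mean([iou(p, t) for p, t in zip(preds, targets)]))
```

The speed column can be read the same way: a rate in frames per second corresponds to 1000 divided by that rate in milliseconds per frame.
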
References
[1] Klein G, Murray D. Parallel tracking and mapping for small AR workspaces. In: Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality. IEEE Computer Society, 2007, 1–10 DOI:10.1109/ismar.2007.4538852
[2] Mur-Artal R, Montiel J M M, Tardos J D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 2015, 31(5): 1147–1163 DOI:10.1109/tro.2015.2463671
[3] Mur-Artal R, Tardos J D. ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 2017, 33(5): 1255–1262 DOI:10.1109/tro.2017.2705103
[4] Park Y, Lepetit V, Woo W. Texture-less object tracking with online training using an RGB-D camera. In: 2011 10th IEEE International Symposium on Mixed and Augmented Reality. New York, USA, IEEE, 2011 DOI:10.1109/ismar.2011.6162879
[5] Ren C Y, Prisacariu V, Murray D, Reid I. STAR3D: simultaneous tracking and reconstruction of 3D objects using RGB-D data. In: 2013 IEEE International Conference on Computer Vision. Sydney, Australia. New York, USA, IEEE, 2013 DOI:10.1109/iccv.2013.197
[6] Feng Y J, Wu Y H, Fan L X. On-line object reconstruction and tracking for 3D interaction. In: 2012 IEEE International Conference on Multimedia and Expo. Melbourne, Australia, IEEE, 2012, 711–716 DOI:10.1109/icme.2012.144
[7] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640–651 DOI:10.1109/tpami.2016.2572683
[8] Badrinarayanan V, Kendall A, Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481–2495 DOI:10.1109/tpami.2016.2644615
[9] Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. In: Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015, 234–241 DOI:10.1007/978-3-319-24574-4_28
[10] Chen L C, Papandreou G, Kokkinos I, Murphy K, Yuille A L. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834–848 DOI:10.1109/tpami.2017.2699184
[11] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA, IEEE, 2014, 580–587 DOI:10.1109/cvpr.2014.81
[12] Mostajabi M, Yadollahpour P, Shakhnarovich G. Feedforward semantic segmentation with zoom-out features. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA, IEEE, 2015, 3376–3385 DOI:10.1109/cvpr.2015.7298959
[13] Chen L C, Zhu Y K, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Computer Vision–ECCV 2018. Cham: Springer International Publishing, 2018, 833–851 DOI:10.1007/978-3-030-01234-2_49
[14] Chen L C, Papandreou G, Schroff F, Adam H. Rethinking atrous convolution for semantic image segmentation. 2017, arXiv preprint arXiv:1706.05587
[15] Shi J, Tomasi C. Good features to track. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Seattle, WA, 1994, 593–600 DOI:10.1109/cvpr.1994.323794
[16] Hartley R, Zisserman A. Multiple view geometry in computer vision. Cambridge University Press, 2003
[17] Huber P J. Robust statistics. Springer Berlin Heidelberg, 2011
[18] Wu Z, Huang Y, Yu Y, Wang L, Tan T. Early hierarchical contexts learned by convolutional networks for image segmentation. In: Proceedings of the 22nd International Conference on Pattern Recognition. Stockholm, Sweden, 2014, 1538–1543 DOI:10.1109/icpr.2014.273
[19] Song C F, Huang Y Z, Wang Z Y, Wang L. 1000fps human segmentation with deep convolutional neural networks. In: Proceedings of the 3rd IAPR Asian Conference on Pattern Recognition. Kuala Lumpur, Malaysia, 2015, 474–478 DOI:10.1109/acpr.2015.7486548
[20] Tesema F B, Wu H, Zhu W. Human segmentation with deep contour-aware network. In: Proceedings of the 2018 International Conference on Computing and Artificial Intelligence. Medan, Indonesia, 2018, 98–103 DOI:10.1145/3194452.3194471
[21] Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T. Caffe: Convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia. Orlando, Florida, USA, ACM, 2014, 675–678 DOI:10.1145/2647868.2654889
[22] Milletari F, Navab N, Ahmadi S A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV). Stanford, CA, 2016, 565–571 DOI:10.1109/3DV.2016.79