Deepdive: a learning-based approach for virtual camera in immersive contents
1. Department of Computer Science, Oakland University, Michigan, USA
2. Digital Image Processing Laboratory, Islamia College, Peshawar, Pakistan
Abstract
Keywords: Virtual reality; Immersive contents; Deep learning; Aesthetic; Saliency
Content




Name of object | Precision (%) | Recall (%) | F1-score (%) |
---|---|---|---|
Animal | 85.71 | 75.00 | 80.00 |
Person | 65.22 | 66.67 | 65.93 |
Cartoon | 66.67 | 68.09 | 67.37 |
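The F1-score column is the harmonic mean of the precision and recall columns, F1 = 2PR/(P + R). The short Python sketch below recomputes it from the per-class values in the table as a consistency check. The function name `f1_score` and the `per_class` dictionary are illustrative only and are not part of the described method.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Per-class precision and recall (%) copied from the table.
per_class = {
    "Animal": (85.71, 75.00),
    "Person": (65.22, 66.67),
    "Cartoon": (66.67, 68.09),
}

for name, (p, r) in per_class.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}%")
```

Recomputing from the two-decimal values gives 80.00, 65.94 and 67.37. The last-digit difference for Person (65.93 in the table) presumably arises because the reported score was computed before precision and recall were rounded.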

Video No. | Video name | Focus point | Starting offset | FOV | Resolution | FPS |
---|---|---|---|---|---|---|
1 | 360° Degree Kitchen Home Tour | Persons | 0:01 | 2k | 1920×720 | 30 |
2 | Kitchen 360 test | Person | 0:01 | 2k | 1920×720 | 30 |
3 | 360° Camera England at Wembley, unlike you have seen before! | Persons | 0:05 | 2k | 1920×720 | 30 |
4 | Real Madrid vs. Juventus \| 2017 Champions League Final \| 360° Video \| FOX SOCCER | Persons | 0:10 | 2k | 1920×720 | 30 |
5 | Lions 360° National Geographic | Animal | 0:07 | 2k | 1920×720 | 30 |
6 | Clash of Clans 360°: Experience a Virtual Reality Raid | Cartoon | 0:04 | 2k | 1920×720 | 30 |
7 | 360° Underwater National Park National Geographic | Animal | 0:04 | 2k | 1920×720 | 30 |


