# A review on image-based rendering

1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China

2. Beijing Engineering Research Center for Virtual Simulation and Visualization, Beijing 100871, China

Abstract

Keywords: Image-based rendering; Virtual reality; Image interpolation; Panorama

Content

^{[1,2]}. The rendering part, which performs geometric cropping, illumination calculation, and element simplification of the reconstructed scene according to the user's viewpoint, generates the image observed from the current viewpoint. This traditional process based on 3D modeling and rendering can fully express the geometric information of the scene, so users can flexibly change external conditions such as illumination to obtain different rendering effects, and can easily roam the scene freely. However, the complexity of this modeling and rendering process is very high: the time required for modeling and rendering increases significantly as the scale of the scene grows and the surface details of the objects in it become richer. In addition, the results of this traditional 3D modeling and rendering pipeline are usually not photo-realistic unless a high computational cost is paid.

^{[3]}. However, users cannot view the scene from arbitrary viewpoints, because the image seen from an uncaptured viewpoint cannot be rendered, whereas this is straightforward with the traditional 3D modeling and rendering process. Therefore, how to use a finite set of discrete captured images to render continuous, arbitrary viewpoints becomes an important issue. This is called image-based rendering. More specifically, image-based rendering is a technique that generates the rendering result for an unknown viewpoint by interpolating between discrete input images or by re-projecting pixels from the input images. This rendering method uses images as the fundamental elements, requiring no complicated geometric modeling and rendering processes but only some analysis and processing of images. Its complexity does not increase with the complexity of object relationships and surface details in the scene, which can significantly save computing resources. Moreover, since the results are rendered directly from photos, they can be photorealistic.

^{[4]}. Likewise, we classify previous techniques into two categories, rendering with or without geometry, to introduce some classic approaches and some new algorithms. We also analyze the limitations and key issues of these approaches.

**Plenoptic function**

^{[5,6]}, defined by the observing location $({V}_{x},{V}_{y},{V}_{z})$ , the angle of incidence $(\theta ,\varphi )$ , the wavelength $\lambda $ and the time $t$ . When considering a static environment with a fixed lighting condition, we can drop two variables, time and wavelength. Then we get a 5-dimensional plenoptic function

^{[7]}:
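Using the symbols defined above, the full plenoptic function and its static-scene reduction can be written as:

```latex
% Full 7D plenoptic function: observer position, viewing direction,
% wavelength, and time
P_7 = P(V_x, V_y, V_z, \theta, \varphi, \lambda, t)

% Static scene with fixed illumination: drop t and \lambda,
% leaving the 5D plenoptic function
P_5 = P(V_x, V_y, V_z, \theta, \varphi)
```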

^{[8,9]}(Figure 1):

**Cylindrical panoramic image mosaic**

^{[10,11]}. Here, taking the cylindrical panorama as an example, we introduce image-based rendering approaches using panoramas.

^{[12]}, and they can be obtained by image stitching as well. As mentioned above, a plenoptic function with a fixed viewpoint describes a panorama. Therefore, we can obtain a complete plenoptic function by constructing a panorama through image stitching. If the camera's intrinsic matrix is fixed, we can project each regular image onto a cylindrical surface to obtain the cylindrical panorama (Figure 2). As shown in Figure 2, given the coordinate $(x,y)$ in a regular image, we can directly obtain the corresponding $(u,v)$ parameters in the cylindrical environment map. In practice, there may be distortion because of accumulated registration errors. To solve this problem, global and local alignment are combined, which significantly improves the quality of the image mosaics

^{[13]}, to obtain a better panorama (Figure 3).
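The forward mapping from regular image coordinates $(x,y)$ to cylindrical coordinates $(u,v)$ can be sketched as below. This is the standard cylindrical warp rather than any specific paper's implementation; the focal length `f` (in pixels) and principal point `(cx, cy)` are assumed known, e.g. from calibration.

```python
import math

def to_cylindrical(x, y, f, cx, cy):
    """Map a pixel (x, y) of a regular (pinhole) image to cylindrical
    coordinates (u, v), given focal length f in pixels and principal
    point (cx, cy). This is the standard warp used when stitching
    rotationally captured images into a cylindrical panorama."""
    theta = math.atan2(x - cx, f)          # angle around the cylinder axis
    h = (y - cy) / math.hypot(x - cx, f)   # normalized height on the cylinder
    u = f * theta + cx                     # scale back to pixel units
    v = f * h + cy
    return u, v
```

A pixel at the principal point maps to itself, and pixels to the right map to larger `u` with compression toward the image edges, which is the behavior that makes rotated views line up on the cylinder.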

^{[3]}.

**Image mosaic through camera rotation**

^{[14]}, where the camera motion is constrained along concentric circles on a plane. Concentric mosaics can be created by compositing slit images taken at different locations along each circle. Here, a slit image is a single vertical-line image, which can be simulated with a regular camera by keeping only the central column of pixels of each frame. One possible concentric mosaic capture setup mounts a number of cameras on a rotating horizontal beam supported by a tripod. Each camera is constrained to move continuously and uniformly along its circle. We take the central column of pixels of the image captured by each camera as the slit image, and put the slits together to obtain the concentric mosaic. By moving the camera further away from the rotation axis, we obtain a collection of concentric mosaics (Figure 4). For each circle, we would like to capture both viewing directions along each tangent line. This can be done, for example, with two cameras facing opposite directions.
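The slit-image compositing step for one circle can be sketched as follows; frames are represented as plain nested lists here purely for illustration, not as the original capture pipeline.

```python
def concentric_mosaic(frames):
    """Composite a concentric mosaic from frames captured along one circle.
    Each frame is a 2D list (rows x cols) of pixel values; we keep only
    the central column of every frame (the 'slit image') and place the
    slits side by side, one per rotation angle."""
    rows = len(frames[0])
    mosaic = [[None] * len(frames) for _ in range(rows)]
    for k, frame in enumerate(frames):
        mid = len(frame[0]) // 2          # index of the central column
        for r in range(rows):
            mosaic[r][k] = frame[r][mid]
    return mosaic
```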

^{[15]}. It records outward-looking spherical light fields using a fisheye camera on a programmable pan/tilt head. The system shoots 72 images horizontally by 24 images vertically, spaced 5° apart and centered around the equator, resulting in a set of 1728 images (Figure 5). Since the fisheye lens covers a 180° hemisphere, this system provides the data necessary to render views of the scene from any position within the sphere outlined by the cameras. When rendering on an HMD, the images are first resized to a resolution appropriate for the device, and then two views are rendered, one for each eye.

^{[16,17,18]}, dense/sparse point cloud reconstructed from multiple images

^{[19,20]}, depth maps or depth distributions

^{[21,22,23]}, or even only correspondences

^{[24,25,26]}.

^{[20,25,26]}, and some algorithms deal with inconsistencies in the three-dimensional information inside one image and between different images

^{[20,23]}.

^{[27,28,29]}. Next, some typical image-based rendering methods based on various key problems and using different three-dimensional information forms will be introduced and analyzed.

**View interpolation based on image correspondence**

^{[24]}. This method gets better results when the two input images are relatively close, because when the viewpoints are close, the visibility of objects in the scene has less influence on the interpolated image. However, when the difference between the input images is large, the overlapping area of the two images shrinks, and the result of interpolation by this method deteriorates. Figure 7 shows that this method performs well for close images.
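The core of correspondence-based view interpolation can be illustrated with a toy sketch in the spirit of Chen and Williams (1993): each pixel moves along its correspondence vector by a fraction `t`, and the colors are blended. This ignores the visibility and hole-filling handling a real implementation needs.

```python
def interpolate_views(corrs, t):
    """Linearly interpolate a novel view between two close input views.
    `corrs` is a list of ((x0, y0, c0), (x1, y1, c1)) correspondence
    pairs (position and scalar color in each view); t in [0, 1] selects
    the novel viewpoint between them. Toy sketch: no occlusion handling."""
    out = []
    for (x0, y0, c0), (x1, y1, c1) in corrs:
        x = (1 - t) * x0 + t * x1   # move along the correspondence vector
        y = (1 - t) * y0 + t * y1
        c = (1 - t) * c0 + t * c1   # blend the colors
        out.append((x, y, c))
    return out
```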

^{[30,31]}. Moreover, even for image pairs where complete dense correspondences are difficult to find directly, the interpolation problem between very different images can be handled by completing the incomplete correspondences. Nie et al. proposed a method for the case where the correspondences between image pairs with large differences are not complete enough to interpolate new viewpoints

^{[26]}. The approach first over-segments the image into super-pixels

^{[32]}, ensuring that each super-pixel represents a single plane. It then uses a correspondence estimation method to calculate a homography for each super-pixel, characterizing the motion of that super-pixel between the two images. Due to the incompleteness of the correspondences caused by the large viewpoint difference, homographies can be fitted by the RANSAC algorithm only for super-pixels containing enough correspondences

^{[33]}. For super-pixels without sufficient matches or with poor matching consistency, homographies are obtained by propagation from similar nearby super-pixels.
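The propagation step can be sketched as below. This is a simplification of Nie et al.'s scheme: super-pixels whose homography could not be fitted inherit the homography of the most similar neighbor that already has one; the names `neighbors` and `similarity` are illustrative, and the original method additionally re-optimizes the propagated homographies.

```python
def propagate_homographies(H, neighbors, similarity):
    """Fill in missing per-super-pixel homographies by propagation.
    H[i] is a fitted homography (any matrix-like object) or None;
    neighbors[i] lists adjacent super-pixel indices; similarity(i, j)
    scores appearance similarity. Missing entries repeatedly copy the
    homography of their most similar already-solved neighbor."""
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(H):
            if h is not None:
                continue
            solved = [j for j in neighbors[i] if H[j] is not None]
            if solved:
                best = max(solved, key=lambda j: similarity(i, j))
                H[i] = H[best]
                changed = True
    return H
```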

**Rendering methods with incomplete depth map**

^{[34]}, while some methods obtain a dense depth map by performing piecewise plane fitting on the point cloud

^{[35,36,37]}. Some of these depth synthesis methods are well suited for image-based rendering problems.

^{[19]}, in which a depth synthesis algorithm is proposed to provide depth samples for areas of the image where depth recovery is poor. A rendering algorithm based on local shape-preserving warping is also proposed to deal with the inconsistency caused by depth inaccuracy, in order to obtain a better visual result.

^{[1]}, and the 3D point cloud can be reconstructed

^{[2]}. After the reconstruction result is obtained, the point cloud is re-projected back onto each input image to obtain depth value samples for that image. Since there is no guarantee that every region of the image corresponds to a part of the 3D point cloud, only some of the pixels are covered by re-projected points, so only some pixels of each image have depth values (Figure 8).

^{[32]}. Then, using the hypothesis that pixels in the same super-pixel have consistent depth, it constructs a weighted undirected graph with super-pixels as nodes, and uses super-pixels containing sufficient depth samples to synthesize the depth of super-pixels without sufficient samples, yielding a complete depth map. In addition, to deal with rendering errors induced by point cloud noise, the algorithm also performs a shape-preserving warp for each super-pixel. It finds the axis-aligned bounding box of each super-pixel and constructs a triangle mesh inside the box. It then uses the constraints of the mesh to warp the super-pixel, obtaining a visually better rendering result.
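The graph-based synthesis step can be sketched as follows, loosely following the idea in Chaurasia et al.: super-pixels with enough reprojected samples take a robust (median) depth, and empty super-pixels receive a weight-averaged depth from already-solved graph neighbors. The data layout (`samples`, `edges`) is illustrative, not the paper's.

```python
def synthesize_depth(samples, edges):
    """Depth synthesis over a super-pixel graph. samples[i] is the list
    of depth samples reprojected into super-pixel i (possibly empty);
    edges[i] is a list of (j, weight) pairs for its graph neighbors.
    Super-pixels with samples take their median; the rest are filled by
    weighted averaging over solved neighbors until no progress is made."""
    depth = [sorted(s)[len(s) // 2] if s else None for s in samples]
    while any(d is None for d in depth):
        progressed = False
        for i, d in enumerate(depth):
            if d is not None:
                continue
            known = [(j, w) for j, w in edges[i] if depth[j] is not None]
            if known:
                total = sum(w for _, w in known)
                depth[i] = sum(depth[j] * w for j, w in known) / total
                progressed = True
        if not progressed:        # disconnected super-pixels: give up
            break
    return depth
```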

**Rendering based on geometry consistency**

**Rendering using dense correspondence and depth map**

^{[20]}. They apply image morphing in three-dimensional space, and compensate for inaccuracies and invalid assumptions made in reconstruction by aligning image regions according to bidirectional correspondence maps.

^{[1,2]}. For every pixel $x$ in a reference image $\mathrm{A}$ , we query the 3D scene structure for a point that projects to it. If such a point exists, we can derive a correspondence vector estimate ${\tilde{w}}_{AB}(x)$ for any neighboring view $\mathrm{B}$ . We call ${\tilde{w}}_{AB}(x)$ the geometry-guided correspondence. The geometric information can be exploited to constrain the correspondence search, which is based on an optimization scheme that minimizes an energy term. Different from traditional optimization schemes

^{[38,39]}, they add an additional geometric constraint that enforces consistency between the geometric information and the estimated correspondences. After obtaining the dense correspondence, they triangulate an actual world-space position whenever at least one valid corresponding pixel in another image is known, by intersecting the two viewing rays. A correct depth value cannot be obtained for invalid pixels, so they segment each image into super-pixels and fit a plane to each super-pixel to obtain a robust result. After the depth map is calculated, the geometry-guided correspondence can be updated, and the dense correspondence can be optimized again.
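The triangulation step (intersecting the two viewing rays of a correspondence) can be sketched as below; since two rays from noisy correspondences are generally skew, a common choice, used here, is the midpoint of their shortest connecting segment. Plain 3-tuples stand in for a linear-algebra library.

```python
def triangulate(o1, d1, o2, d2):
    """Recover a world-space point from one pixel correspondence by
    intersecting the two viewing rays p_i(t) = o_i + t * d_i. Returns
    the midpoint of the closest approach between the (generally skew)
    rays, from the closed-form least-squares solution."""
    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    def sub(a, b): return tuple(x - y for x, y in zip(a, b))
    def add(a, b): return tuple(x + y for x, y in zip(a, b))
    def scale(a, s): return tuple(x * s for x in a)

    r = sub(o1, o2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, r), dot(d2, r)
    denom = a * c - b * b               # ~0 when the rays are parallel
    t1 = (b * e - c * d) / denom        # parameter on ray 1
    t2 = (a * e - b * d) / denom        # parameter on ray 2
    p1 = add(o1, scale(d1, t1))
    p2 = add(o2, scale(d2, t2))
    return scale(add(p1, p2), 0.5)      # midpoint of closest approach
```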

**Soft 3D reconstruction for image-based rendering**

^{[23]}, which also takes this inconsistency into account, that is, it performs a soft reconstruction of the scene. Unlike a depth map, which gives a fixed depth value to each pixel, or a point cloud, which reconstructs fixed positions by multi-view geometry, soft 3D reconstruction constructs a distribution of depth values for each pixel. Based on this expression of the three-dimensional scene, a similar form of visibility function can be obtained, which is used to assist the synthesis of novel-viewpoint images and can also be used for iterative optimization of depth maps.

^{[40]}. Then, for each input viewpoint, it constructs a number of depth planes parallel to the image plane to represent different depth values, obtaining a three-dimensional structure in space that expresses the depth distribution of this image. Each pixel $(x,y)$ records a vote-value at depth plane $z$ as the likelihood that an object exists at the spatial position of depth $z$ observed from $(x,y)$ , along with another value, the vote-confidence, representing the credibility of the vote-value. The initial vote-value and vote-confidence can be obtained simply from the initial depth map. Then the depth values of the adjacent viewpoints are used to calculate a consistent depth distribution $Consensus(x,y,z)$ for each viewpoint, taking the adjacent-viewpoint information into account. After obtaining the consistent depth distribution, we can calculate the visibility information of each point.
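A toy sketch of the consensus step for a single pixel is given below. The names and the confidence-weighted averaging are illustrative assumptions, not Penner and Zhang's exact formula; the point is that votes from neighboring viewpoints are combined per depth plane instead of committing to one depth.

```python
def consensus_volume(vote, conf, neighbor_vols):
    """Combine one pixel's depth-plane votes with votes cast by nearby
    viewpoints at the same depths. vote[z]/conf[z] are the reference
    view's vote-value and vote-confidence per depth plane z;
    neighbor_vols is a list of (vote, conf) lists from nearby views.
    Returns a confidence-weighted average per depth plane."""
    out = []
    for z in range(len(vote)):
        num = vote[z] * conf[z]
        den = conf[z]
        for nvote, nconf in neighbor_vols:
            num += nvote[z] * nconf[z]
            den += nconf[z]
        out.append(num / den if den > 0 else 0.0)
    return out
```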

^{[41,42]}, and the uncertainty across different views is incorporated using the soft reconstruction structure. With this soft 3D reconstruction, a fixed depth value is not directly assigned to each pixel. Instead, the depth information of adjacent pixels and adjacent viewpoints is combined, and the inconsistency of depth information is well accounted for, so good rendering results are obtained. In addition, the iterative optimization of visibility and depth greatly improves the accuracy of depth values at object boundaries in the image. Figure 10 shows the difference between rendering with a single fixed depth value and rendering with soft 3D reconstruction. It can be seen that this method deals with the problems caused by inaccurate depth values very well.

**Rendering for casual photography**

^{[43]}. This allows the data acquisition process to be independent of professional equipment and trained professionals; even inexperienced users can capture photos for 3D panoramic photo reconstruction with inexpensive cameras (Figure 11). Since the shooting is non-professional, there may be geometric distortion of objects in the scene or slight movement of objects between the acquired images, resulting in inconsistency of the three-dimensional information.

^{[44]}. Then complete depth maps need to be calculated using multi-view stereo. Because the photos are captured while moving a hand-held camera on a sphere of roughly half an arm's length in radius, traditional depth estimation methods cannot obtain high-quality depth maps due to the inconsistency. So they proposed a novel prior called the near envelope, which constrains the depth using a conservatively but tightly estimated lower bound and greatly improves the reconstruction. The obtained depth images are then warped into the panoramic domain. For different pixels projected to the same position, the front point is selected by a depth test to form the front surface of the panorama. Since the occluded content is important too, the back surface of the panorama is obtained by inverting the depth test. Finally, the front and back surfaces are merged into a uniform two-layer representation, resulting in a lightweight representation of the scene that users can explore freely. In addition, the method estimates the normals of the scene to enable some lighting effects.
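The front/back selection rule described above can be sketched minimally: among all points landing on the same panorama pixel, the normal depth test keeps the nearest point (front surface) and the inverted test keeps the farthest (back surface, i.e., occluded content). The dictionary layout is illustrative only.

```python
def two_layer_merge(projected):
    """Build the front and back surfaces of a two-layer panorama.
    projected[p] is the list of (depth, color) candidates that project
    onto panorama pixel p; the nearest candidate wins the front surface
    and the farthest wins the back surface."""
    front, back = {}, {}
    for p, candidates in projected.items():
        front[p] = min(candidates)[1]   # nearest point: visible surface
        back[p] = max(candidates)[1]    # farthest point: occluded content
    return front, back
```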

^{[45]}. They jointly estimate the camera poses as well as spatially-varying adjustment maps that are applied to deform the depth maps and bring them into good alignment. They remove the need for label smoothness optimization, and replace it with independently optimizing every pixel label after filtering the data term in a depth-guided edge-aware manner, achieving two orders of magnitude speedup. Finally, they convert the 3D panorama into a multi-layered 3D mesh that can be rendered with standard graphics engines.

**Rendering combining global and local geometry**

^{[46]}, combining the global and local information of the scene, to obtain a more realistic rendering effect (Figure 12).

^{[24]}, while the method of Nie et al. is proposed to deal with the wide baseline case

^{[26]}.

| Method | Geometric information | Constraint on the input viewpoints | Characteristic |
|---|---|---|---|
| Cylindrical panorama | None | Strict | Low computational complexity; only for views from a single point |
| Concentric mosaic | None | Strict | A specific region can be explored freely |
| Spherical light field | None | Strict | A specific region can be explored freely |
| Chen et al. 1993 | Implicit | Free | Weak constraint on input views |
| Nie et al. 2017 | Implicit | Free | Handles wide-baseline images |
| Chaurasia et al. 2013 | Explicit | Free | Tolerant of inaccurate 3D information |
| Lipski et al. 2014 | Explicit | Free | Tolerant of inconsistent 3D information |
| Penner et al. 2017 | Explicit | Free | Models the uncertainty of 3D information |
| Hedman et al. 2017 | Explicit | Free | Tolerant of rough input images with inconsistent 3D information |
| Hedman et al. 2018 | Explicit | Free | Tolerant of rough input images; high computational efficiency |
| Hedman et al. 2016 | Explicit | Free | Models the uncertainty; suitable for indoor scenes |

^{[47]}. When rendering a novel-view image, the colors of its pixels often come from different input images, which means we should blend them for the final rendering result

^{[48]}.
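The blending of candidate colors drawn from different input images reduces to a normalized weighted average per output pixel; in unstructured-lumigraph-style rendering the weights typically penalize, e.g., the angular distance between the input ray and the novel-view ray. A minimal sketch of the blending step only, with the weights assumed given:

```python
def blend(candidates):
    """Blend candidate colors for one novel-view pixel. `candidates`
    is a list of (weight, color) pairs, one per contributing input
    image; returns the normalized weighted average color."""
    total = sum(w for w, _ in candidates)
    return sum(w * c for w, c in candidates) / total
```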

^{[49,50]}. Unlike the methods mentioned above, it provides an end-to-end rendering process, that is, it directly renders the novel-view image from the discretely sampled input images. For example, DeepStereo uses a large number of real scene photos as a training set to train a depth prediction network and a color prediction network, so that rendering results can be obtained directly from input pixels

^{[50]}. This method can deal with the wide-baseline case, and thanks to the ability of deep networks to fit very complex nonlinear functions, some problems that traditional methods find difficult to handle can be processed better.

Reference

Snavely N, Seitz S M, Szeliski R. Photo Tourism: Exploring image collections in 3D. ACM Transactions on Graphics, 2006, 25: 835–846 DOI:10.1145/1141911.1141964

Furukawa Y, Ponce J. Accurate, dense, and robust multi-view stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(8): 1362–1376 DOI:10.1109/TPAMI.2009.161

Google Street View. https://www.google.com/streetview/

Shum H Y, Kang S B. A review of image-based rendering techniques. In: Ngan K N, Sikora T, Sun M T, eds. Proceedings of IEEE/SPIE Visual Communications and Image Processing (VCIP), Perth, Australia, 2000, 2–13

Adelson E H, Bergen J R. The plenoptic function and the elements of early vision. In: Landy M S, Movshon J A, eds. Computational Models of Visual Processing. Cambridge: MIT Press, 1991

Wong T T, Heng P A, Or S H, Ng W Y. Image-based rendering with controllable illumination. In: Dorsey J, Slusallek P, eds. Proceedings of the 8-th Eurographics Workshop on Rendering. Berlin: Springer-Verlag, 1997, 13–22

McMillan L, Bishop G. Plenoptic modeling: An image-based rendering system. In: Mair SG, Cook R, eds. Proceedings of the 22nd annual conference on Computer graphics and interactive techniques. New York: ACM, 1995, 39–46

Levoy M, Hanrahan P. Light field rendering. In: Fujii J, eds. Proceedings of the 23rd annual conference on Computer graphics and interactive techniques. New York: ACM, 1996, 31–42 DOI:10.1145/237170.237199

Shum H Y, He L W. Rendering with concentric mosaics. In: Waggenspack W, eds. Proceedings of the 26th annual conference on Computer graphics and interactive techniques. New York: ACM Press/Addison-Wesley Publishing Co., 1999, 299–306 DOI:10.1145/311535.311573

Szeliski R, Shum H Y. Creating full view panoramic image mosaics and texture-mapped models. In: Owen G S, Whitted T, Mones-Hattal B, eds. Proceedings of the 24th annual conference on Computer graphics and interactive techniques. New York: ACM Press/Addison-Wesley Publishing Co., 1997, 251–258 DOI:10.1145/258734.258861

Chen S E. Quick Time VR–an image-based approach to virtual environment navigation. In: Mair SG, Cook R, eds. Proceedings of the 22nd annual conference on Computer graphics and interactive techniques. New York: ACM, 1995, 29–38 DOI:10.1145/218380.218395

Roundshot. http://www.roundshot.com

Shum H Y, Szeliski R. Construction and refinement of panoramic mosaics with global and local alignment. In: Sixth International Conference on Computer Vision (ICCV’98). Bombay, 1998, 953–958 DOI:10.1109/ICCV.1998.710831

Gortler S J, Grzeszczuk R, Szeliski R, Cohen M F. The lumigraph. In: Fujii J, eds. Proceedings of the 23rd annual conference on Computer graphics and interactive techniques. New York: ACM, 1996, 43–54 DOI:10.1145/237170.237200

Debevec P, Downing G, Bolas M, Peng H-Y, Urbach J. Spherical light field environment capture for virtual reality using a motorized pan/tilt head and offset camera. In: ACM Siggraph 2015 Posters. Los Angeles, California, 2015, 1–1 DOI:10.1145/2787626.2787648

Buehler C, Bosse M, McMillan L, Gortler S, Cohen M. Unstructured lumigraph rendering. In: Pocock L, eds. Proceedings of the 28th annual conference on Computer graphics and interactive techniques. New York: ACM, 2001, 425–432 DOI:10.1145/383259.383309

Rademacher P. View-dependent geometry. In: Waggenspack W, eds. Proceedings of the 26th annual conference on Computer graphics and interactive techniques. New York: ACM Press/Addison-Wesley Publishing Co., 1999, 439–446 DOI:10.1145/311535.311612

Vedula S, Baker S, Kanade T. Spatio-temporal view interpolation. In: Gibson S, Debevec P, eds. Proceedings of the 13th Eurographics Workshop on Rendering. Aire-la-Ville: Eurographics Association, 2002

Chaurasia G, Duchene S, Sorkine-Hornung O, Drettakis G. Depth synthesis and local warps for plausible image-based navigation. ACM Transactions on Graphics, 2013, 32(3): 1–12 DOI:10.1145/2487228.2487238

Lipski C, Klose F, Magnor M. Correspondence and depth-image based rendering a hybrid approach for free-viewpoint video. IEEE Transactions on Circuits and Systems for Video Technology, 2014, 24(6): 942–951 DOI:10.1109/TCSVT.2014.2302379

Shade J, Gortler S, He L W, Szeliski R. Layered depth images. In: Cunningham S, Bransford W, Cohen M F, eds. Proceedings of the 25th annual conference on Computer graphics and interactive techniques. New York: ACM, 1998, 231–242 DOI:10.1145/280814.280882

Chang C, Bishop G, Lastra A. LDI tree: A hierarchical representation for image-based rendering. In: Waggenspack W, eds. Proceedings of the 26th annual conference on Computer graphics and interactive techniques. New York: ACM Press/Addison-Wesley Publishing Co., 1999: 291–298 DOI:10.1145/311535.311571

Penner E, Zhang L. Soft 3D reconstruction for view synthesis. ACM Transactions on Graphics, 2017, 36(6): 1–11 DOI:10.1145/3130800.3130855

Chen S, Williams L. View interpolation for image synthesis. In: Whitton MC, eds. Proceedings of the 20th annual conference on Computer graphics and interactive techniques. New York: ACM, 1993, 279–288 DOI:10.1145/166117.166153

Seitz S M, Dyer C M. View morphing. In: Fujii J, eds. Proceedings of the 23rd annual conference on Computer graphics and interactive techniques. New York: ACM, 1996, 21–30 DOI:10.1145/237170.237196

Nie Y W, Zhang Z S, Sun H Q, Su T, Li G. Homography propagation and optimization for wide-baseline street image interpolation. IEEE Transactions on Visualization and Computer Graphics, 2017, 23(10): 2328–2341 DOI:10.1109/TVCG.2016.2618878

Kopf J, Langguth F, Scharstein D, Szeliski R, Goesele M. Image-based rendering in the gradient domain. ACM Transactions on Graphics, 2013, 32(6): 1–9 DOI:10.1145/2508363.2508369

Sinha S N, Kopf J, Goesele M, Scharstein D. Image-based rendering for scenes with reflections. ACM Transactions on Graphics, 2012, 31(4): 1–10

Thonat T, Djelouah A, Durand F, Drettakis G. Thin structures in image based rendering. Computer Graphics Forum, 2018, 37: 107–118

Yang H S, Lin W Y, Lu J B. DAISY Filter Flow: A generalized discrete approach to dense correspondences. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, 2014, 3406–3413

Bao L C, Yang Q X, Jin H L. Fast edge-preserving PatchMatch for large displacement optical flow. IEEE Transactions on Image Processing, 2014, 23: 4996–5006 DOI:10.1109/CVPR.2014.452

Achanta R, Shaji A, Smith K, Lucchi A, Fua P, Süsstrunk S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(11): 2274–2282 DOI:10.1109/TPAMI.2012.120

Fischler M A, Bolles R C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM, 1981, 24(6): 381–395 DOI:10.1145/358669.358692

Hawe S, Kleinsteuber M, Diepold K. Dense disparity maps from sparse disparity measurements. In: 2011 International Conference on Computer Vision. Barcelona, 2011, 2126–2133 DOI:10.1109/ICCV.2011.6126488

Sinha S N, Steedly D, Szeliski R. Piecewise planar stereo for image-based rendering. In: 2009 IEEE 12th International Conference on Computer Vision. Kyoto, 2009, 1881–1888

Gallup D, Frahm J, Pollefeys M. Piecewise planar and non-planar stereo for urban scene reconstruction. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, 2010, 1418–1425

Furukawa Y, Curless B, Seitz S M, Szeliski R. Manhattan-world stereo. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, 2009, 1422–1429 DOI:10.1109/CVPR.2009.5206867

Lipski C, Linz C, Neumann T, Wacker M, Magnor M. High resolution image correspondences for video post-production. In: 2010 Conference on Visual Media Production. London, 2010, 33–39 DOI:10.1109/CVMP.2010.12

Liu C, Yuen J, Torralba A. SIFT flow: Dense correspondence across different scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(5): 978–994 DOI:10.1109/TPAMI.2010.147

Hosni A, Bleyer M, Rhemann C, Gelautz M, Rother C. Real-time local stereo matching using guided image filtering. In: 2011 IEEE International Conference on Multimedia and Expo. Barcelona, 2011, 1–6 DOI:10.1109/ICME.2011.6012131

Ma Z Y, He K M, Wei Y C. Constant time weighted median filtering for stereo matching and beyond. In: 2013 IEEE International Conference on Computer Vision. Sydney, NSW, 2013, 49–56

He K M, Sun J, Tang X O. Guided image filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2013, 35: 1397–1409 DOI:10.1109/TPAMI.2012.213

Hedman P, Alsisan S, Szeliski R, Kopf J. Casual 3D photography. ACM Transactions on Graphics, 2017, 36: 234: 1–15

Schönberger J L, Zheng E L, Frahm J M. Pixelwise view selection for unstructured multi-view stereo. In: European Conference on Computer Vision. Amsterdam, the Netherlands, 2016

Hedman P, Kopf J. Instant 3D photography. ACM Transactions on Graphics, 2017, 36(6): 1–15 DOI:10.1145/3130800.3130828

Hedman P, Ritschel T, Drettakis G, Brostow G. Scalable inside-out image-based rendering. ACM Transactions on Graphics, 2016, 35(6): 1–11. DOI:10.1145/2980179.2982420

Pérez P, Gangnet M, Blake A. Poisson image editing. ACM Transactions on Graphics, 2003, 22: 313–318 DOI:10.1145/1201775.882269

Burt P J, Adelson E H. A multiresolution spline with application to image mosaics. ACM Transactions on Graphics, 1983, 2: 217–236 DOI:10.1145/245.247

Kalantari N K, Wang T C, Ramamoorthi R. Learning based view synthesis for light field cameras. ACM Transactions on Graphics, 2016, 35(6): 1–10 DOI:10.1145/2980179.2980251

Flynn J, Neulander I, Philbin J, Snavely N. DeepStereo: Learning to predict new views from the world's imagery. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, 2016