
3D visual processing and reconstruction

3D sensing is the main channel through which humans, and robotic agents, understand and interact with each other and with the real world. Accordingly, many 3D acquisition technologies and devices have been developed and applied in emerging applications such as autonomous systems, augmented reality, and digital production. A typical 3D visual system takes RGB and/or range images of an object or scene and generates 3D geometry. Several classic solutions exist for different settings, for example, structure from motion (SfM) for unordered image collections and simultaneous localization and mapping (SLAM) for temporally ordered image sequences. A recent research trend has been to introduce deep learning into many conventional operations, such as pose estimation, spatial computation, and scene recognition. Besides 3D geometry, the modeling of texture, material, and lighting properties is also part of 3D visual processing.
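
The two-view geometry at the core of SfM can be illustrated with a minimal direct linear transform (DLT) triangulation: given two camera projection matrices and a pair of matched image points, the 3D point is the null vector of a small linear system. The cameras and the point below are illustrative assumptions, not taken from any paper in this issue; this is a sketch, not a production implementation:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of a single point from two views.
    P1, P2: 3x4 camera projection matrices; x1, x2: 2D image points."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # null vector of A = homogeneous solution
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with camera matrix P and dehomogenize."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: identity camera and a second camera translated along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.allclose(X_est, X_true))  # True
```

Real SfM pipelines wrap this step in feature matching, robust estimation, and bundle adjustment; the DLT solve above is only the geometric kernel.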


3D visual processing and reconstruction

2020, 2(3): 1-2




Urban 3D modeling using mobile laser scanning: a review

2020, 2(3): 175-212


Mobile laser scanning (MLS) systems mainly comprise laser scanners and mobile mapping platforms. Typical MLS systems can acquire three-dimensional point clouds with 1-10 cm point spacing at normal driving or walking speed in streets or indoor environments. The efficiency and stability of these systems make them extremely useful for three-dimensional urban modeling. This paper reviews the latest advances in 3D modeling from LiDAR-based mobile mapping system (MMS) point clouds, including LiDAR simultaneous localization and mapping, point cloud registration, feature extraction, object extraction, semantic segmentation, and deep learning based processing. Furthermore, typical urban modeling applications based on MMS are also discussed.

Summary study of data-driven photometric stereo methods

2020, 2(3): 213-221


A photometric stereo method aims to recover the surface normals of a 3D object observed under varying light directions. The problem is ill-posed because the general reflectance properties of the surface are unknown.
This paper reviews existing data-driven methods, with a focus on their technical insights into the photometric stereo problem. We divide these methods into two categories, per-pixel and all-pixel, according to how they process an image. We discuss the differences and relationships between these methods from the perspective of inputs, networks, and data, which are key factors in designing a deep learning approach.
We demonstrate the performance of the models using a popular benchmark dataset.
Data-driven photometric stereo methods have shown superior performance over traditional methods. However, they still suffer from various limitations, such as limited generalization capability. Finally, this study suggests directions for future research.
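
The classical Lambertian baseline that these data-driven methods improve upon fits in a few lines: with known light directions and no shadows or specularities, normals follow from least squares. The light directions, normal, and albedo below are illustrative assumptions for a synthetic check:

```python
import numpy as np

def lambertian_ps(L, I):
    """Classic least-squares photometric stereo for a Lambertian surface.
    L: (k, 3) unit light directions; I: (k, p) intensities for p pixels.
    Returns (p, 3) unit surface normals (albedo divided out)."""
    G, *_ = np.linalg.lstsq(L, I, rcond=None)  # G = albedo * normal, (3, p)
    N = G / np.linalg.norm(G, axis=0, keepdims=True)
    return N.T

# Synthetic check: one pixel with a known normal and albedo, three lights
n = np.array([0.0, 0.6, 0.8])
L = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1]], dtype=float)
L /= np.linalg.norm(L, axis=1, keepdims=True)
I = 0.7 * (L @ n)[:, None]           # albedo 0.7, all lights unshadowed
print(lambertian_ps(L, I)[0])        # ≈ [0, 0.6, 0.8]
```

The per-pixel versus all-pixel distinction drawn in the paper concerns exactly how this per-pixel solve is replaced or generalized by a network.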

Deep learning based point cloud registration: an overview

2020, 2(3): 222-246


Point cloud registration aims to find a rigid transformation that aligns one point cloud to another. Such registration is a fundamental problem in computer vision and robotics and has been widely used in various applications, including 3D reconstruction, simultaneous localization and mapping, and autonomous driving. Over the last few decades, numerous researchers have devoted themselves to tackling this challenging problem. The success of deep learning in high-level vision tasks has recently been extended to geometric vision tasks, and various types of deep learning based point cloud registration methods have been proposed to exploit different aspects of the problem. However, a comprehensive overview of these approaches remains missing. To this end, this paper summarizes recent progress and presents a comprehensive overview of deep learning based point cloud registration. We classify the popular approaches into categories such as correspondence-based and correspondence-free approaches, together with their constituent modules: feature extraction, matching, outlier rejection, and motion estimation. Furthermore, we discuss the merits and demerits of these approaches in detail. Finally, we provide a systematic and compact framework for currently proposed methods and discuss directions for future research.
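
Once correspondences are matched and outliers rejected, the motion estimation module mentioned above commonly reduces to the closed-form Kabsch/Procrustes solution via SVD. A minimal sketch on synthetic correspondences (rotation, translation, and point values are all illustrative):

```python
import numpy as np

def kabsch(src, dst):
    """Closed-form rigid motion (R, t) minimizing ||R @ src_i + t - dst_i||
    given known one-to-one correspondences, via SVD (Kabsch algorithm)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Synthetic correspondences: rotate about z by 0.3 rad and translate
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
t_true = np.array([0.5, -1.0, 2.0])
dst = src @ R_true.T + t_true
R, t = kabsch(src, dst)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # True True
```

Iterating this solve with nearest-neighbor matching gives classical ICP; the learned methods surveyed in the paper replace the matching and outlier-rejection stages while often keeping this closed-form step.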


Interactive free-viewpoint video generation

2020, 2(3): 247-260


Free-viewpoint video (FVV) is processed video content in which viewers can freely select the viewing position and angle. FVV delivers an improved visual experience and can also help synthesize special effects and virtual reality content. In this paper, a complete FVV system is proposed that lets viewers interactively control the viewpoint of relayed video programs through multimedia terminals such as computers and tablets.
The hardware of the FVV generation system is a set of synchronously controlled cameras, and the software generates videos at novel viewpoints from the captured video using view interpolation. The interactive interface visualizes the generated video at novel viewpoints and enables the viewpoint to be changed interactively.
Experiments show that our system can synthesize plausible videos at intermediate viewpoints over a view range of up to 180°.

Visual perception driven 3D building structure representation from airborne laser scanning point cloud

2020, 2(3): 261-275


Three-dimensional (3D) building models with unambiguous roof plane geometry parameters, roof structure units, and linked topology provide essential data for many applications related to human activities in urban environments. The task of 3D reconstruction from point clouds is still in the development phase, especially the recognition and interpretation of roof topological structures.
This study proposes a novel visual perception based approach that automatically decomposes and reconstructs building point clouds into meaningful and simple parametric structures, while the mutual relationships between roof plane geometry and roof structure units are expressed by a hierarchical topology tree. First, roof planes are extracted using a multi-label graph-cut energy optimization framework, and a roof structure graph (RSG) model is then constructed to describe the roof topology with common adjacency, symmetry, and convexity rules. Next, progressive roof decomposition and refinement are performed, generating a hierarchical representation of the 3D roof structure model. Finally, a constraint process based on plane-fitting residuals or facet area is adopted to generate RSG models at different levels of detail.
Two airborne laser scanning datasets with different point densities and roof styles were tested, and performance was evaluated using the metrics of the International Society for Photogrammetry and Remote Sensing, achieving a correctness of 97.7% and an accuracy of 0.29 m.
The standardized assessment results demonstrate the effectiveness and robustness of the proposed approach, showing its ability to generate a variety of structural models, even with missing data.
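
Roof plane extraction of the kind described above ultimately rests on fitting planes to point patches; a minimal total-least-squares plane fit via SVD conveys the idea (the synthetic roof facet below is an illustrative assumption, not data from the paper):

```python
import numpy as np

def fit_plane(pts):
    """Total-least-squares plane fit to an (n, 3) point patch.
    Returns (unit normal n, centroid c) with the plane n . (x - c) = 0."""
    c = pts.mean(axis=0)
    # Normal = direction of least variance = last right singular vector
    _, _, Vt = np.linalg.svd(pts - c)
    return Vt[-1], c

# Synthetic roof facet: points on z = 0.5x + 1 plus small scanner noise
rng = np.random.default_rng(1)
xy = rng.uniform(-5, 5, size=(200, 2))
z = 0.5 * xy[:, 0] + 1.0 + rng.normal(scale=0.01, size=200)
pts = np.column_stack([xy, z])
n, c = fit_plane(pts)
n = n if n[2] > 0 else -n   # fix sign ambiguity (normal points upward)
print(np.round(n, 2))       # close to the unit normal of z = 0.5x + 1
```

The paper's multi-label graph-cut framework assigns points to competing plane hypotheses jointly; this single-patch fit is only the underlying geometric primitive.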

Neural hand reconstruction using an RGB image

2020, 2(3): 276-289


This study presents a neural hand reconstruction method for monocular 3D hand pose and shape estimation.
As an alternative to directly representing the hand with 3D data, a novel UV position map is used to represent the hand pose and shape with 2D data, mapping 3D hand surface points to 2D image space. Furthermore, an encoder-decoder neural network is proposed to infer such a UV position map from a single image. To train this network despite the scarcity of ground-truth training pairs, we propose a novel MANOReg module that employs the MANO model as a shape prior to constrain the high-dimensional space of the UV position map.
Quantitative and qualitative experiments demonstrate the effectiveness of our UV position map representation and MANOReg module.