
Triplanar convolution with shared 2D kernels for 3D classification and shape retrieval

Background

Increasing the depth of Convolutional Neural Networks (CNNs) is widely recognized to improve generalization performance. However, in the case of 3D CNNs, stacking layers increases the number of learnable parameters linearly, making the network more prone to learning redundant features.

Our Contribution

We present a novel 3D CNN structure, termed S3PNet, that learns shared 2D triplanar features viewed from three orthogonal planes. Owing to the reduced dimension of its convolutions, the proposed S3PNet learns 3D representations with substantially fewer learnable parameters.

Figure 1. Overview of the triplanar CNN (S3PNet) for 3D shape recognition. The 3D data is first converted into a binary voxel grid, which is then passed through S3PNet, a series of triplanar modules. Each module consists of three branches of 2D convolutions that are applied iteratively to every cross-section of the 3D input volume from a different view.
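
The voxelization step in Figure 1 can be sketched as follows (a generic illustration, not the authors' preprocessing code; the 32-cell grid size, the point-cloud input format, and the function name are our assumptions):

import numpy as np

def voxelize(points, grid=32):
    # Convert an (N, 3) point cloud into a binary occupancy grid (generic sketch).
    # Normalize points into the unit cube [0, 1).
    mins, maxs = points.min(axis=0), points.max(axis=0)
    normed = (points - mins) / (maxs - mins + 1e-9)
    # Map each point to a voxel index and mark that voxel as occupied.
    idx = np.clip((normed * grid).astype(int), 0, grid - 1)
    vox = np.zeros((grid, grid, grid), dtype=np.float32)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vox

# Example: voxelize(np.random.rand(1024, 3)) returns a (32, 32, 32) binary grid.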

Figure 2. The architecture of a single triplanar module. A 3D input volume is first decomposed into its cross-sections, which are then passed to each branch of the triplanar convolution. (a) 2D convolutions are applied to every cross-section of the input volume on three orthogonal views, namely the 𝑥𝑦-plane, 𝑦𝑧-plane, and 𝑧𝑥-plane. Colored blocks represent the output feature-maps in each plane: red for the 𝑥𝑦-plane, blue for the 𝑦𝑧-plane, and orange for the 𝑧𝑥-plane. (b) The generated output feature-maps are then aggregated to form a single combined representation. After aggregating the feature-maps, a 1 × 1 convolution is applied to reduce feature redundancy and increase the compactness of the model.
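
A minimal PyTorch-style sketch of one triplanar module, based on our reading of Figure 2 (the class and parameter names, the ReLU placement, and the use of channel concatenation as the aggregation are assumptions, not the published implementation):

import torch
import torch.nn as nn

class TriplanarModule(nn.Module):
    # A Conv3d with a 1 x k x k kernel applies one shared 2D kernel to every
    # cross-section along the corresponding axis, which is how the three
    # 2D branches are modeled here.
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        p = k // 2
        self.conv_xy = nn.Conv3d(in_ch, out_ch, (1, k, k), padding=(0, p, p))  # slices along z
        self.conv_yz = nn.Conv3d(in_ch, out_ch, (k, k, 1), padding=(p, p, 0))  # slices along x
        self.conv_zx = nn.Conv3d(in_ch, out_ch, (k, 1, k), padding=(p, 0, p))  # slices along y
        # 1 x 1 convolution fuses the aggregated feature-maps and reduces redundancy.
        self.fuse = nn.Conv3d(3 * out_ch, out_ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (B, C, D, H, W) voxel volume
        feats = [self.act(conv(x)) for conv in (self.conv_xy, self.conv_yz, self.conv_zx)]
        # Aggregation is assumed to be channel concatenation; the paper may combine differently.
        return self.act(self.fuse(torch.cat(feats, dim=1)))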

Table 1. Architecture of S3PNet. Triplanar modules (TM) and their respective components are shown in detail. Unless otherwise specified, the stride of each convolution is set to 1. We use 3D max pooling to reduce the resolution of the entire feature volume. The numbers of output feature-maps of the three triplanar modules are 32, 32, and 128, respectively. The last fully-connected layer is replaced by a 1 × 1 convolution in the experiments. Our final model is composed of three triplanar modules and a prediction module.
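
Building on the TriplanarModule sketch above, the overall layout of Table 1 could be assembled roughly as follows (the pooling positions, the global pooling in the prediction head, and the class name are our assumptions):

class S3PNetSketch(nn.Module):
    def __init__(self, num_classes, in_ch=1):
        super().__init__()
        widths = (32, 32, 128)  # output feature-maps of the three triplanar modules (Table 1)
        layers, ch = [], in_ch
        for w in widths:
            layers += [TriplanarModule(ch, w), nn.MaxPool3d(2)]  # 3D max pooling halves the resolution
            ch = w
        self.features = nn.Sequential(*layers)
        # Prediction module: global pooling, then a 1 x 1 convolution in place of the last FC layer.
        self.head = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Conv3d(ch, num_classes, 1))

    def forward(self, x):  # x: (B, 1, D, H, W) binary voxel grid
        return self.head(self.features(x)).flatten(1)  # (B, num_classes) logits

# Example: S3PNetSketch(num_classes=40)(torch.zeros(2, 1, 32, 32, 32)) has shape (2, 40).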

Results

Experimental evaluations show that the combination of 2D representations learned by S3PNet from the different orthogonal views is a sufficient and effective 3D representation, outperforming current methods based on fully 3D CNNs.

Table 2. Comparison with state-of-the-art methods on three classification benchmarks. We report overall classification accuracy on all datasets, except for the Sydney Urban dataset, where we report the weighted average F1 score. The numbers marked with † were calculated by considering only the convolutional layers, because the sizes of the fully-connected layers were unspecified or an SVM was used as the classifier.

Table 3. ShapeNetCore55 shape retrieval. Comparison with state-of-the-art methods on five evaluation metrics. All metrics are reported as micro-averaged and macro-averaged scores on the (a) ‘‘normal’’ and (b) ‘‘perturbed’’ test sets. Only published papers are cited; please refer to the full track report of Large-Scale 3D Shape Retrieval from ShapeNet Core55 (SHREC’17) for details of the other approaches.

Table 4. LUNA16. Scores of the top five entries, including S3PNet and the baseline 3D CNN, CUMedVis.

Acknowledgement

This work was supported by the Interdisciplinary Research Initiatives Program from College of Engineering and College of Medicine, Seoul National University (grant no. 800-20170169) and the National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) (No. NRF-2017R1A2B2011862).

Paper

Triplanar convolution with shared 2D kernels for 3D classification and shape retrieval

Kim, E.Y.; Shin, S.Y.; Lee, S.; Lee, K.J.; Lee, K.H.; Lee, K.M.

Computer Vision and Image Understanding, Volume 193, April 2020, 102901.

[link] [pdf] [Bibtex]

Collaboration


KOOKMIN UNIVERSITY, MIRAE HALL ROOM 504-2,

77 JEONGNEUNG-RO, SEONGBUK-GU, SEOUL, 02707, KOREA
