6-DoF Object Pose from Semantic Keypoints
Georgios Pavlakos, Xiaowei Zhou, Aaron Chan, Konstantinos G. Derpanis, Kostas Daniilidis
International Conference on Robotics and Automation (ICRA), 2017
In this paper we propose a global optimization-based approach to jointly matching a set of images. The estimated correspondences simultaneously maximize pairwise feature affinities and cycle consistency across multiple images. Unlike previous convex methods relying on semidefinite programming, we formulate the problem as a low-rank matrix recovery problem and show that the desired semidefiniteness of a solution can be spontaneously fulfilled. The low-rank formulation enables us to derive a fast alternating minimization algorithm in order to handle practical problems with thousands of features. Both simulation and real experiments demonstrate that the proposed algorithm can achieve a competitive performance with an order of magnitude speedup compared to the state-of-the-art algorithm. In the end, we demonstrate the applicability of the proposed method to match the images of different object instances and as a result the potential to reconstruct category-specific object models from those images.
Multi-Image Matching via Fast Alternating Minimization.
X. Zhou, M. Zhu, K. Daniilidis.
International Conference on Computer Vision (ICCV), 2015.
Supplementary material: PDF
The MATLAB code for Algorithm 1 in the paper.
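The link between cycle consistency and the low-rank, positive semidefinite structure mentioned in the abstract above can be illustrated with a small sketch. It is not the paper's Algorithm 1. It assumes that every image's features map into a shared "universe" of features, and the toy data and the helper name `partial_permutation` are hypothetical.

```python
import numpy as np

def partial_permutation(indices, universe_size):
    """Binary map from local features to universe features.

    Row i has a single 1 in column indices[i].
    """
    A = np.zeros((len(indices), universe_size))
    A[np.arange(len(indices)), indices] = 1.0
    return A

# Hypothetical toy data: 3 images observing features from a universe of 4.
universe_size = 4
observed = [[0, 1, 2], [2, 0, 3], [1, 3, 0, 2]]
maps = [partial_permutation(idx, universe_size) for idx in observed]

# Stack the image-to-universe maps; X contains every pairwise map X_ij = A_i A_j^T.
A = np.vstack(maps)
X = A @ A.T

# X is positive semidefinite with rank at most the universe size.
rank = np.linalg.matrix_rank(X)
min_eig = np.linalg.eigvalsh(X).min()

# Cycle consistency through image 3, which sees every universe feature once:
# composing the maps 1 -> 3 and 3 -> 2 reproduces the direct map 1 -> 2.
X12, X13, X32 = X[0:3, 3:6], X[0:3, 6:10], X[6:10, 3:6]
consistent = np.allclose(X13 @ X32, X12)
```

In this construction the stacked matrix of pairwise maps factors as `A @ A.T`, which is why a low-rank factorization yields a positive semidefinite solution without an explicit semidefinite constraint.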
We introduce a new approach for estimating a fine grained 3D shape and continuous pose of an object from a single image. Given a training set of view exemplars, we learn and select appearance-based discriminative parts which are mapped onto the 3D model through a facility location optimization. The training set of 3D models is summarized into a set of basis shapes from which we can generalize by linear combination. Given a test image, we detect hypotheses for each part. The main challenge is to select from these hypotheses and compute the 3D pose and shape coefficients at the same time. To achieve this, we optimize a function that considers simultaneously the appearance matching of the parts as well as the geometric reprojection error. We apply the alternating direction method of multipliers (ADMM) to minimize the resulting convex function. Our main and novel contribution is the simultaneous solution for part localization and detailed 3D geometry estimation by maximizing both appearance and geometric compatibility with convex relaxation.
Single Image Pop-Up from Discriminatively Learned Parts.
M. Zhu*, X. Zhou*, K. Daniilidis.
International Conference on Computer Vision (ICCV), 2015.
This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence. Here, two cases are considered: (i) the image locations of the human joints are provided and (ii) the image locations of joints are unknown. In the former case, a novel approach is introduced that integrates a sparsity-driven 3D geometric prior and temporal smoothness. In the latter case, the former case is extended by treating the image locations of the joints as latent variables. A deep fully convolutional network is trained to predict the uncertainty maps of the 2D joint locations. The 3D pose estimates are realized via an Expectation-Maximization algorithm over the entire sequence, where it is shown that the 2D joint location uncertainties can be conveniently marginalized out during inference. Empirical evaluation on the Human3.6M dataset shows that the proposed approaches achieve greater 3D pose estimation accuracy over state-of-the-art baselines. Further, the proposed approach outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.
Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video.
X. Zhou, M. Zhu, S. Leonardos, K. Derpanis, K. Daniilidis.
Updated package that includes the whole pipeline for reconstructing 3D human poses from an image sequence, including the proposed reconstruction algorithm and the “Stacked Hourglass Network” for 2D pose detection.
We investigate the problem of estimating the 3D structure of an object defined by a set of 3D landmarks, given their 2D correspondences in a single image. To alleviate the reconstruction ambiguity, a widely used approach is to model the unknown structure as a linear combination of predefined basis shapes, usually with a sparse representation to capture complex shape variability. While this approach has proven successful in many applications, a challenging issue remains: the joint estimation of structure and viewpoint requires solving a nonconvex optimization problem. Previous methods often adopt an alternating minimization scheme that updates the structure and viewpoint in turn, so the solution depends on initialization and may get stuck at a local optimum. In this paper, we propose a convex approach to addressing this issue and develop an efficient algorithm to solve the proposed convex program. Moreover, we propose a robust model to handle gross errors in the 2D correspondences. We demonstrate the exact recovery property of the proposed model, its merits compared to alternative methods, and its applicability to recovering 3D human poses and car shapes from real images.
Sparse Representation for 3D Shape Estimation: A Convex Relaxation Approach.
X. Zhou, M. Zhu, S. Leonardos, K. Daniilidis.
3D Shape Estimation from 2D Landmarks: A Convex Relaxation Approach.
X. Zhou, S. Leonardos, X. Hu, K. Daniilidis.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
MATLAB code: the MATLAB implementation of the algorithms introduced in the journal version of our work, with several demonstration examples.
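The shape model described in the abstract above, a 3D shape written as a linear combination of basis shapes and observed through an unknown viewpoint, can be sketched as follows. This is only the forward model plus the easy half of the problem (coefficients with the viewpoint fixed), not the convex relaxation the papers propose. The orthographic camera, the random basis, and the names `rotation_z` and `project` are assumptions made for illustration.

```python
import numpy as np

def rotation_z(theta):
    """Rotation about the z-axis, standing in for an arbitrary viewpoint."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def project(basis, coeffs, R):
    """Orthographic projection of the shape S = sum_i coeffs[i] * basis[i]."""
    S = np.tensordot(coeffs, basis, axes=1)  # (3, p) landmark coordinates
    return R[:2] @ S                         # (2, p) image coordinates

rng = np.random.default_rng(0)
basis = rng.standard_normal((3, 3, 5))  # k = 3 basis shapes, each 3 x p with p = 5
coeffs = np.array([0.7, 0.0, 0.3])      # sparse combination weights
R = rotation_z(0.4)
W = project(basis, coeffs, R)           # observed 2D landmarks

# With the viewpoint fixed, the coefficients enter linearly and can be
# recovered by ordinary least squares.
design = np.stack([(R[:2] @ B).ravel() for B in basis], axis=1)  # (2p, k)
recovered, *_ = np.linalg.lstsq(design, W.ravel(), rcond=None)
```

The coefficients and the rotation multiply each other in `project`, so the problem is linear in each with the other fixed but nonconvex in both jointly. That is why alternating schemes depend on initialization.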
Low-rank modeling generally refers to a class of methods that solves problems by representing variables of interest as low-rank matrices. It has achieved great success in various fields including computer vision, data mining, signal processing, and bioinformatics. Recently, much progress has been made in theories, algorithms, and applications of low-rank modeling, such as exact low-rank matrix recovery via convex programming and matrix completion applied to collaborative filtering. These advances have brought more and more attention to this topic. In this article, we review the recent advances of low-rank modeling, the state-of-the-art algorithms, and the related applications in image analysis. We first give an overview of the concept of low-rank modeling and the challenging problems in this area. Then, we summarize the models and algorithms for low-rank matrix recovery and illustrate their advantages and limitations with numerical experiments. Next, we introduce a few applications of low-rank modeling in the context of image analysis. Finally, we conclude this article with some discussions.
A comparison of some matrix completion solvers (distance to ground truth vs. time).
Low-Rank Modeling and its Applications in Image Analysis.
X. Zhou, C. Yang, H. Zhao, W. Yu.
ACM Computing Surveys, 47(2): 36, 2014.
The MATLAB code that generates the figures in the paper is available through this link.
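A basic building block of many low-rank recovery solvers of the kind this survey covers is singular value thresholding, the proximal operator of the nuclear norm. The sketch below is a generic NumPy illustration, not code from the paper's package. The function name `svt`, the matrix sizes, the noise level, and the threshold are all illustrative assumptions.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink each singular value of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

# Hypothetical example: a rank-2 matrix corrupted by small dense noise.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
noisy = low_rank + 0.01 * rng.standard_normal((20, 15))

# Thresholding removes the small singular values introduced by the noise.
denoised = svt(noisy, tau=1.0)
estimated_rank = np.linalg.matrix_rank(denoised, tol=1e-8)
```

The shrinkage also biases the retained singular values downward by `tau`, so the output is a low-rank estimate rather than an exact reconstruction of the clean matrix.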
Active contours are widely used in image segmentation. To cope with missing or misleading features in images, researchers have introduced various ways to model the prior of shapes and use the prior to constrain active contours. However, the shape prior is usually learnt from a large set of annotated data, which is not always accessible in practice. Moreover, it is often doubted that the existing shapes in the training set will be sufficient to model the new instance in the testing image. In this paper, we propose to use the group similarity of object shapes in multiple images as a prior to aid segmentation, which can be interpreted as an unsupervised approach of shape prior modeling. We show that the rank of the matrix consisting of multiple shapes is a good measure of the group similarity of the shapes, and the nuclear norm minimization is a simple and effective way to impose the proposed constraint on existing active contour models. Moreover, we develop a fast algorithm to solve the proposed model by using the accelerated proximal method. Experiments using echocardiographic image sequences acquired from acute canine experiments demonstrate that the proposed method can consistently improve the performance of active contour models and increase the robustness against image defects such as missing boundaries.
Top: without the shape constraint. Bottom: with the shape constraint.
Active Contours with Group Similarity.
X. Zhou, X. Huang, J.S. Duncan, W. Yu.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
The MATLAB code can be found here.
Object detection is a fundamental step for automated video analysis in many vision applications. Object detection in a video is usually performed by object detectors or background subtraction techniques. Often, an object detector requires manually labeled examples to train a binary classifier, while background subtraction needs a training sequence that contains no objects to build a background model. To automate the analysis, object detection without a separate training phase becomes a critical task. People have tried to tackle this task by using motion information. But existing motion-based methods are usually limited when coping with complex scenarios such as nonrigid motion and dynamic background. In this paper, we show that the above challenges can be addressed in a unified framework named DEtecting Contiguous Outliers in the LOw-rank Representation (DECOLOR). This formulation integrates object detection and background learning into a single process of optimization, which can be solved by an alternating algorithm efficiently. We explain the relations between DECOLOR and other sparsity-based methods. Experiments on both simulated data and real sequences demonstrate that DECOLOR outperforms the state-of-the-art approaches and it can work effectively on a wide range of complex scenarios.
From top to bottom: original image, background, segmentation
Moving Object Detection by Detecting Contiguous Outliers in the Low-Rank Representation.
X. Zhou, C. Yang, W. Yu.
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2013.
Updated on May 25, 2016.
The newest version of the GCO toolbox is included to solve the compatibility issue with the new versions of MATLAB.
Only the mex file for Win64 is included. Please compile the GCO toolbox if you are using another system.